* [PATCH 0/8] Follow-up for speed up page cache truncation v2
@ 2017-10-18  7:59 ` Mel Gorman
  0 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-18  7:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner, Mel Gorman

Changelog since v1
o Remove "private" parameter from radix tree callbacks
o Rebase on v4.14-rc5 + Jan's v2 series
o Collected Acked-by/Reviewed-by tags

This series is a follow-on to Jan Kara's "Speed up page cache
truncation" series. We both ended up looking at the same problem but
saw different issues in the same data. This series builds upon his work.

A variety of workloads were compared on four separate machines and each
machine showed gains, albeit at different levels. Minimally, some of the
differences are due to NUMA, where truncating data from a remote node is
slower than from a local node. The workloads checked were

o sparse truncate microbenchmark, tiny
o sparse truncate microbenchmark, large
o reaim-io disk workfile
o dbench4 (modified by mmtests to produce more stable results)
o filebench varmail configuration for small memory size
o bonnie, directory operations, working set size 2*RAM

reaim-io, dbench and filebench all showed minor gains. Truncation does
not dominate those workloads, but they were tested to ensure no other
regressions. They will not be reported further.

The sparse truncate microbench was written by Jan. It creates a number of
files and then times how long it takes to truncate each one. The "tiny"
configuration creates a number of files that easily fit in memory and
times how long it takes to truncate files with resident page cache. The
large configuration uses enough files to have data that is twice the size
of memory, so timings there include truncating both page cache and
working set shadow entries in the radix tree.
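
Jan's benchmark itself is not reproduced here, but the timing loop has
roughly the following shape. This is a hedged userspace sketch, not the
actual benchmark; the file count, file size and output format are
illustrative placeholders.

/* sparsetruncate-like sketch: create files, then time ftruncate() */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

#define NR_FILES 1000			/* placeholder, not Jan's value */

int main(void)
{
	struct timespec ts_start, ts_end;
	char name[64];
	long total_ns = 0;
	int i, fd;

	for (i = 0; i < NR_FILES; i++) {
		snprintf(name, sizeof(name), "sparse-%d", i);
		fd = open(name, O_CREAT | O_RDWR, 0600);
		if (fd < 0)
			exit(1);

		/* Populate page cache so truncation has real work to do */
		if (write(fd, name, sizeof(name)) != sizeof(name))
			exit(1);

		clock_gettime(CLOCK_MONOTONIC, &ts_start);
		if (ftruncate(fd, 0) < 0)
			exit(1);
		clock_gettime(CLOCK_MONOTONIC, &ts_end);

		total_ns += (ts_end.tv_sec - ts_start.tv_sec) * 1000000000L +
			    (ts_end.tv_nsec - ts_start.tv_nsec);
		close(fd);
		unlink(name);
	}

	printf("avg truncate: %ld ns\n", total_ns / NR_FILES);
	return 0;
}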

Patches 1-4 are the most relevant parts of this series. Patches 5-8 are
optional as they delete code that is essentially useless; removing it has
a negligible performance impact.

The changelogs have more information on performance, but for the bonnie
delete operations alone, the main comparison is

bonnie
                                      4.14.0-rc5             4.14.0-rc5             4.14.0-rc5
                                          jan-v2                vanilla                 mel-v2
Hmean     SeqCreate ops         76.20 (   0.00%)       75.80 (  -0.53%)       76.80 (   0.79%)
Hmean     SeqCreate read        85.00 (   0.00%)       85.00 (   0.00%)       85.00 (   0.00%)
Hmean     SeqCreate del      13752.31 (   0.00%)    12090.23 ( -12.09%)    15304.84 (  11.29%)
Hmean     RandCreate ops        76.00 (   0.00%)       75.60 (  -0.53%)       77.00 (   1.32%)
Hmean     RandCreate read       96.80 (   0.00%)       96.80 (   0.00%)       97.00 (   0.21%)
Hmean     RandCreate del     13233.75 (   0.00%)    11525.35 ( -12.91%)    14446.61 (   9.16%)

Jan's series is the baseline: the vanilla kernel is 12% slower, whereas
this series on top gains another 11%. This is from a different machine
than the data in the changelogs, but the detailed data was not collected
again as there was no substantial change in v2.

 arch/powerpc/mm/mmu_context_book3s64.c             |  2 +-
 arch/powerpc/mm/pgtable_64.c                       |  2 +-
 arch/sparc/mm/init_64.c                            |  2 +-
 arch/tile/mm/homecache.c                           |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c             |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c            |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c            |  2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c              |  6 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c                |  2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c            |  4 +-
 drivers/gpu/drm/radeon/radeon_ttm.c                |  2 +-
 drivers/net/ethernet/amazon/ena/ena_netdev.c       |  2 +-
 drivers/net/ethernet/amd/xgbe/xgbe-desc.c          |  2 +-
 drivers/net/ethernet/aquantia/atlantic/aq_ring.c   |  3 +-
 .../net/ethernet/cavium/liquidio/octeon_network.h  |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |  5 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |  4 +-
 drivers/net/ethernet/qlogic/qlge/qlge_main.c       |  3 +-
 drivers/net/ethernet/sfc/falcon/rx.c               |  2 +-
 drivers/net/ethernet/sfc/rx.c                      |  2 +-
 drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c    |  2 +-
 drivers/net/ethernet/ti/netcp_core.c               |  2 +-
 drivers/net/virtio_net.c                           |  1 -
 drivers/staging/lustre/lustre/mdc/mdc_request.c    |  2 +-
 fs/afs/write.c                                     |  4 +-
 fs/btrfs/extent_io.c                               |  4 +-
 fs/buffer.c                                        |  4 +-
 fs/cachefiles/rdwr.c                               | 10 +--
 fs/ceph/addr.c                                     |  4 +-
 fs/dax.c                                           |  4 +-
 fs/ext4/file.c                                     |  2 +-
 fs/ext4/inode.c                                    |  6 +-
 fs/f2fs/checkpoint.c                               |  2 +-
 fs/f2fs/data.c                                     |  2 +-
 fs/f2fs/file.c                                     |  2 +-
 fs/f2fs/node.c                                     |  8 +-
 fs/fscache/page.c                                  |  2 +-
 fs/fuse/dev.c                                      |  2 +-
 fs/gfs2/aops.c                                     |  2 +-
 fs/hugetlbfs/inode.c                               |  2 +-
 fs/nilfs2/btree.c                                  |  2 +-
 fs/nilfs2/page.c                                   |  8 +-
 fs/nilfs2/segment.c                                |  4 +-
 include/linux/gfp.h                                |  9 +--
 include/linux/pagemap.h                            | 10 +--
 include/linux/pagevec.h                            |  6 +-
 include/linux/radix-tree.h                         |  7 +-
 include/linux/skbuff.h                             |  2 +-
 include/linux/slab.h                               |  3 -
 include/linux/swap.h                               | 15 +++-
 include/trace/events/kmem.h                        | 11 +--
 include/trace/events/mmflags.h                     |  1 -
 kernel/power/snapshot.c                            |  4 +-
 lib/idr.c                                          |  2 +-
 lib/radix-tree.c                                   | 30 +++----
 mm/filemap.c                                       | 15 ++--
 mm/mlock.c                                         |  4 +-
 mm/page-writeback.c                                |  2 +-
 mm/page_alloc.c                                    | 89 +++++++++++---------
 mm/percpu-vm.c                                     |  2 +-
 mm/rmap.c                                          |  2 +-
 mm/shmem.c                                         |  8 +-
 mm/swap.c                                          | 15 ++--
 mm/swap_state.c                                    |  2 +-
 mm/truncate.c                                      | 94 +++++++++++++++-------
 mm/vmscan.c                                        |  6 +-
 mm/workingset.c                                    | 10 +--
 net/core/skbuff.c                                  |  4 +-
 tools/perf/builtin-kmem.c                          |  1 -
 tools/testing/radix-tree/multiorder.c              |  2 +-
 70 files changed, 262 insertions(+), 232 deletions(-)

-- 
2.14.0

* [PATCH 1/8] mm, page_alloc: Enable/disable IRQs once when freeing a list of pages
  2017-10-18  7:59 ` Mel Gorman
@ 2017-10-18  7:59   ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-18  7:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner, Mel Gorman

Freeing a list of pages currently enables/disables IRQs for each page
freed. This patch splits freeing a list of pages into two operations --
preparing the pages for freeing and the actual freeing. This is a
tradeoff: we take two passes over the list in exchange for a single
IRQ enable/disable instead of one per page.
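
As a rough illustration of the shape of the change, the following is a
standalone userspace sketch, not kernel code; the list type, the
prepare/commit helpers and the "IRQ" stubs are stand-ins for the real
free_hot_cold_page_prepare()/free_hot_cold_page_commit() split in the
patch below. (The patch removes pages that fail preparation from the
list; skipping them here keeps the sketch simple.)

#include <stdbool.h>
#include <stddef.h>

struct fake_page {
	struct fake_page *next;
	bool ready;
};

/* Per-page checks that are safe to run with IRQs still enabled */
static bool prepare_one(struct fake_page *page)
{
	(void)page;
	return true;		/* stand-in for free_pcp_prepare() etc. */
}

/* The work that previously ran inside a per-page IRQ-disabled section */
static void commit_one(struct fake_page *page)
{
	page->ready = false;	/* stand-in for the actual free */
}

static void fake_irq_save(void)    { /* local_irq_save() stand-in */ }
static void fake_irq_restore(void) { /* local_irq_restore() stand-in */ }

void free_page_list(struct fake_page *head)
{
	struct fake_page *page;

	/* Pass 1: prepare every page, no IRQ toggling */
	for (page = head; page; page = page->next)
		page->ready = prepare_one(page);

	/* Pass 2: one IRQ disable/enable covers all the commits */
	fake_irq_save();
	for (page = head; page; page = page->next) {
		if (page->ready)
			commit_one(page);
	}
	fake_irq_restore();
}

int main(void)
{
	struct fake_page b = { NULL, false }, a = { &b, false };

	free_page_list(&a);
	return 0;
}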

sparsetruncate (tiny)
                              4.14.0-rc4             4.14.0-rc4
                           janbatch-v1r1            oneirq-v1r1
Min          Time      149.00 (   0.00%)      141.00 (   5.37%)
1st-qrtle    Time      150.00 (   0.00%)      142.00 (   5.33%)
2nd-qrtle    Time      151.00 (   0.00%)      142.00 (   5.96%)
3rd-qrtle    Time      151.00 (   0.00%)      143.00 (   5.30%)
Max-90%      Time      153.00 (   0.00%)      144.00 (   5.88%)
Max-95%      Time      155.00 (   0.00%)      147.00 (   5.16%)
Max-99%      Time      201.00 (   0.00%)      195.00 (   2.99%)
Max          Time      236.00 (   0.00%)      230.00 (   2.54%)
Amean        Time      152.65 (   0.00%)      144.37 (   5.43%)
Stddev       Time        9.78 (   0.00%)       10.44 (  -6.72%)
Coeff        Time        6.41 (   0.00%)        7.23 ( -12.84%)
Best99%Amean Time      152.07 (   0.00%)      143.72 (   5.50%)
Best95%Amean Time      150.75 (   0.00%)      142.37 (   5.56%)
Best90%Amean Time      150.59 (   0.00%)      142.19 (   5.58%)
Best75%Amean Time      150.36 (   0.00%)      141.92 (   5.61%)
Best50%Amean Time      150.04 (   0.00%)      141.69 (   5.56%)
Best25%Amean Time      149.85 (   0.00%)      141.38 (   5.65%)

With a tiny number of files, each file truncated has resident page cache,
and the time to truncate improves by roughly 5-6% with some minor jitter.

                                      4.14.0-rc4             4.14.0-rc4
                                   janbatch-v1r1            oneirq-v1r1
Hmean     SeqCreate ops         65.27 (   0.00%)       81.86 (  25.43%)
Hmean     SeqCreate read        39.48 (   0.00%)       47.44 (  20.16%)
Hmean     SeqCreate del      24963.95 (   0.00%)    26319.99 (   5.43%)
Hmean     RandCreate ops        65.47 (   0.00%)       82.01 (  25.26%)
Hmean     RandCreate read       42.04 (   0.00%)       51.75 (  23.09%)
Hmean     RandCreate del     23377.66 (   0.00%)    23764.79 (   1.66%)

As expected, there is a small gain for the delete operation.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 58 +++++++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 44 insertions(+), 14 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e4d3c5c57b..167e163cf733 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2590,24 +2590,26 @@ void mark_free_pages(struct zone *zone)
 }
 #endif /* CONFIG_PM */
 
-/*
- * Free a 0-order page
- * cold == true ? free a cold page : free a hot page
- */
-void free_hot_cold_page(struct page *page, bool cold)
+static bool free_hot_cold_page_prepare(struct page *page, unsigned long pfn)
 {
-	struct zone *zone = page_zone(page);
-	struct per_cpu_pages *pcp;
-	unsigned long flags;
-	unsigned long pfn = page_to_pfn(page);
 	int migratetype;
 
 	if (!free_pcp_prepare(page))
-		return;
+		return false;
 
 	migratetype = get_pfnblock_migratetype(page, pfn);
 	set_pcppage_migratetype(page, migratetype);
-	local_irq_save(flags);
+	return true;
+}
+
+static void free_hot_cold_page_commit(struct page *page, unsigned long pfn,
+				bool cold)
+{
+	struct zone *zone = page_zone(page);
+	struct per_cpu_pages *pcp;
+	int migratetype;
+
+	migratetype = get_pcppage_migratetype(page);
 	__count_vm_event(PGFREE);
 
 	/*
@@ -2620,7 +2622,7 @@ void free_hot_cold_page(struct page *page, bool cold)
 	if (migratetype >= MIGRATE_PCPTYPES) {
 		if (unlikely(is_migrate_isolate(migratetype))) {
 			free_one_page(zone, page, pfn, 0, migratetype);
-			goto out;
+			return;
 		}
 		migratetype = MIGRATE_MOVABLE;
 	}
@@ -2636,8 +2638,22 @@ void free_hot_cold_page(struct page *page, bool cold)
 		free_pcppages_bulk(zone, batch, pcp);
 		pcp->count -= batch;
 	}
+}
 
-out:
+/*
+ * Free a 0-order page
+ * cold == true ? free a cold page : free a hot page
+ */
+void free_hot_cold_page(struct page *page, bool cold)
+{
+	unsigned long flags;
+	unsigned long pfn = page_to_pfn(page);
+
+	if (!free_hot_cold_page_prepare(page, pfn))
+		return;
+
+	local_irq_save(flags);
+	free_hot_cold_page_commit(page, pfn, cold);
 	local_irq_restore(flags);
 }
 
@@ -2647,11 +2663,25 @@ void free_hot_cold_page(struct page *page, bool cold)
 void free_hot_cold_page_list(struct list_head *list, bool cold)
 {
 	struct page *page, *next;
+	unsigned long flags, pfn;
+
+	/* Prepare pages for freeing */
+	list_for_each_entry_safe(page, next, list, lru) {
+		pfn = page_to_pfn(page);
+		if (!free_hot_cold_page_prepare(page, pfn))
+			list_del(&page->lru);
+		page->private = pfn;
+	}
 
+	local_irq_save(flags);
 	list_for_each_entry_safe(page, next, list, lru) {
+		unsigned long pfn = page->private;
+
+		page->private = 0;
 		trace_mm_page_free_batched(page, cold);
-		free_hot_cold_page(page, cold);
+		free_hot_cold_page_commit(page, pfn, cold);
 	}
+	local_irq_restore(flags);
 }
 
 /*
-- 
2.14.0

* [PATCH 2/8] mm, truncate: Do not check mapping for every page being truncated
  2017-10-18  7:59 ` Mel Gorman
@ 2017-10-18  7:59   ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-18  7:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner, Mel Gorman

During truncation, the mapping has already been checked for shmem and dax,
so it's known that workingset_update_node is required. This patch avoids
the checks on the mapping for each page being truncated. In all other
cases, a lookup helper is used to determine if workingset_update_node()
needs to be called. The one danger is that the API is slightly harder to
use, as calling workingset_update_node directly without checking for dax
or shmem mappings could lead to surprises. However, the API rarely needs
to be used and hopefully the comment is enough to give people the hint.
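
The selection logic lives in a statement-expression macro (a GNU C
extension that the kernel already relies on). The following userspace
sketch shows the pattern with stand-in names; it is illustrative only
and not the kernel's workingset_lookup_update() itself.

#include <stdio.h>

struct fake_mapping { int is_special; };

typedef void (*update_node_t)(int node);

static void update_node(int node)
{
	printf("updating node %d\n", node);
}

/* Evaluate the mapping once and hand back either the callback or NULL */
#define lookup_update(mapping)					\
({								\
	update_node_t __helper = update_node;			\
	if ((mapping)->is_special)				\
		__helper = NULL;				\
	__helper;						\
})

int main(void)
{
	struct fake_mapping regular = { .is_special = 0 };
	struct fake_mapping special = { .is_special = 1 };
	update_node_t fn;

	fn = lookup_update(&regular);
	if (fn)			/* callback present: regular page cache */
		fn(1);

	fn = lookup_update(&special);
	if (!fn)		/* NULL: dax/shmem-like mapping, skip */
		printf("no shadow-entry tracking for special mapping\n");
	return 0;
}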

sparsetruncate (tiny)
                              4.14.0-rc4             4.14.0-rc4
                             oneirq-v1r1        pickhelper-v1r1
Min          Time      141.00 (   0.00%)      140.00 (   0.71%)
1st-qrtle    Time      142.00 (   0.00%)      141.00 (   0.70%)
2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
Max-95%      Time      147.00 (   0.00%)      145.00 (   1.36%)
Max-99%      Time      195.00 (   0.00%)      191.00 (   2.05%)
Max          Time      230.00 (   0.00%)      205.00 (  10.87%)
Amean        Time      144.37 (   0.00%)      143.82 (   0.38%)
Stddev       Time       10.44 (   0.00%)        9.00 (  13.74%)
Coeff        Time        7.23 (   0.00%)        6.26 (  13.41%)
Best99%Amean Time      143.72 (   0.00%)      143.34 (   0.26%)
Best95%Amean Time      142.37 (   0.00%)      142.00 (   0.26%)
Best90%Amean Time      142.19 (   0.00%)      141.85 (   0.24%)
Best75%Amean Time      141.92 (   0.00%)      141.58 (   0.24%)
Best50%Amean Time      141.69 (   0.00%)      141.31 (   0.27%)
Best25%Amean Time      141.38 (   0.00%)      140.97 (   0.29%)

As you'd expect, the gain is marginal but it can be detected. The
differences in bonnie are all within the noise, which is not surprising
given the impact on the microbenchmark.

radix_tree_update_node_t is a callback for some radix tree operations
that optionally passes in a private field. The only user of the callback
is workingset_update_node and, as it no longer requires a mapping, the
private field is removed.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 fs/dax.c                              |  2 +-
 include/linux/radix-tree.h            |  7 +++----
 include/linux/swap.h                  | 13 ++++++++++++-
 lib/idr.c                             |  2 +-
 lib/radix-tree.c                      | 30 +++++++++++++-----------------
 mm/filemap.c                          |  7 ++++---
 mm/shmem.c                            |  2 +-
 mm/truncate.c                         |  2 +-
 mm/workingset.c                       | 10 ++--------
 tools/testing/radix-tree/multiorder.c |  2 +-
 10 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/fs/dax.c b/fs/dax.c
index f001d8c72a06..3318ae9046e6 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -565,7 +565,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
 		ret = __radix_tree_lookup(page_tree, index, &node, &slot);
 		WARN_ON_ONCE(ret != entry);
 		__radix_tree_replace(page_tree, node, slot,
-				     new_entry, NULL, NULL);
+				     new_entry, NULL);
 		entry = new_entry;
 	}
 
diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 567ebb5eaab0..0ca448c1cb42 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -301,18 +301,17 @@ void *__radix_tree_lookup(const struct radix_tree_root *, unsigned long index,
 void *radix_tree_lookup(const struct radix_tree_root *, unsigned long);
 void __rcu **radix_tree_lookup_slot(const struct radix_tree_root *,
 					unsigned long index);
-typedef void (*radix_tree_update_node_t)(struct radix_tree_node *, void *);
+typedef void (*radix_tree_update_node_t)(struct radix_tree_node *);
 void __radix_tree_replace(struct radix_tree_root *, struct radix_tree_node *,
 			  void __rcu **slot, void *entry,
-			  radix_tree_update_node_t update_node, void *private);
+			  radix_tree_update_node_t update_node);
 void radix_tree_iter_replace(struct radix_tree_root *,
 		const struct radix_tree_iter *, void __rcu **slot, void *entry);
 void radix_tree_replace_slot(struct radix_tree_root *,
 			     void __rcu **slot, void *entry);
 void __radix_tree_delete_node(struct radix_tree_root *,
 			      struct radix_tree_node *,
-			      radix_tree_update_node_t update_node,
-			      void *private);
+			      radix_tree_update_node_t update_node);
 void radix_tree_iter_delete(struct radix_tree_root *,
 			struct radix_tree_iter *iter, void __rcu **slot);
 void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8a807292037f..257c48a525c8 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -292,7 +292,18 @@ struct vma_swap_readahead {
 void *workingset_eviction(struct address_space *mapping, struct page *page);
 bool workingset_refault(void *shadow);
 void workingset_activation(struct page *page);
-void workingset_update_node(struct radix_tree_node *node, void *private);
+
+/* Do not use directly, use workingset_lookup_update */
+void workingset_update_node(struct radix_tree_node *node);
+
+/* Returns workingset_update_node() if the mapping has shadow entries. */
+#define workingset_lookup_update(mapping)				\
+({									\
+	radix_tree_update_node_t __helper = workingset_update_node;	\
+	if (dax_mapping(mapping) || shmem_mapping(mapping))		\
+		__helper = NULL;					\
+	__helper;							\
+})
 
 /* linux/mm/page_alloc.c */
 extern unsigned long totalram_pages;
diff --git a/lib/idr.c b/lib/idr.c
index edd9b2be1651..2593ce513a18 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -171,7 +171,7 @@ void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id)
 	if (!slot || radix_tree_tag_get(&idr->idr_rt, id, IDR_FREE))
 		return ERR_PTR(-ENOENT);
 
-	__radix_tree_replace(&idr->idr_rt, node, slot, ptr, NULL, NULL);
+	__radix_tree_replace(&idr->idr_rt, node, slot, ptr, NULL);
 
 	return entry;
 }
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 8b1feca1230a..c8d55565fafa 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -677,8 +677,7 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
  *	@root		radix tree root
  */
 static inline bool radix_tree_shrink(struct radix_tree_root *root,
-				     radix_tree_update_node_t update_node,
-				     void *private)
+				     radix_tree_update_node_t update_node)
 {
 	bool shrunk = false;
 
@@ -739,7 +738,7 @@ static inline bool radix_tree_shrink(struct radix_tree_root *root,
 		if (!radix_tree_is_internal_node(child)) {
 			node->slots[0] = (void __rcu *)RADIX_TREE_RETRY;
 			if (update_node)
-				update_node(node, private);
+				update_node(node);
 		}
 
 		WARN_ON_ONCE(!list_empty(&node->private_list));
@@ -752,7 +751,7 @@ static inline bool radix_tree_shrink(struct radix_tree_root *root,
 
 static bool delete_node(struct radix_tree_root *root,
 			struct radix_tree_node *node,
-			radix_tree_update_node_t update_node, void *private)
+			radix_tree_update_node_t update_node)
 {
 	bool deleted = false;
 
@@ -762,8 +761,8 @@ static bool delete_node(struct radix_tree_root *root,
 		if (node->count) {
 			if (node_to_entry(node) ==
 					rcu_dereference_raw(root->rnode))
-				deleted |= radix_tree_shrink(root, update_node,
-								private);
+				deleted |= radix_tree_shrink(root,
+								update_node);
 			return deleted;
 		}
 
@@ -1173,7 +1172,6 @@ static int calculate_count(struct radix_tree_root *root,
  * @slot:		pointer to slot in @node
  * @item:		new item to store in the slot.
  * @update_node:	callback for changing leaf nodes
- * @private:		private data to pass to @update_node
  *
  * For use with __radix_tree_lookup().  Caller must hold tree write locked
  * across slot lookup and replacement.
@@ -1181,7 +1179,7 @@ static int calculate_count(struct radix_tree_root *root,
 void __radix_tree_replace(struct radix_tree_root *root,
 			  struct radix_tree_node *node,
 			  void __rcu **slot, void *item,
-			  radix_tree_update_node_t update_node, void *private)
+			  radix_tree_update_node_t update_node)
 {
 	void *old = rcu_dereference_raw(*slot);
 	int exceptional = !!radix_tree_exceptional_entry(item) -
@@ -1201,9 +1199,9 @@ void __radix_tree_replace(struct radix_tree_root *root,
 		return;
 
 	if (update_node)
-		update_node(node, private);
+		update_node(node);
 
-	delete_node(root, node, update_node, private);
+	delete_node(root, node, update_node);
 }
 
 /**
@@ -1225,7 +1223,7 @@ void __radix_tree_replace(struct radix_tree_root *root,
 void radix_tree_replace_slot(struct radix_tree_root *root,
 			     void __rcu **slot, void *item)
 {
-	__radix_tree_replace(root, NULL, slot, item, NULL, NULL);
+	__radix_tree_replace(root, NULL, slot, item, NULL);
 }
 EXPORT_SYMBOL(radix_tree_replace_slot);
 
@@ -1242,7 +1240,7 @@ void radix_tree_iter_replace(struct radix_tree_root *root,
 				const struct radix_tree_iter *iter,
 				void __rcu **slot, void *item)
 {
-	__radix_tree_replace(root, iter->node, slot, item, NULL, NULL);
+	__radix_tree_replace(root, iter->node, slot, item, NULL);
 }
 
 #ifdef CONFIG_RADIX_TREE_MULTIORDER
@@ -1972,7 +1970,6 @@ EXPORT_SYMBOL(radix_tree_gang_lookup_tag_slot);
  *	@root:		radix tree root
  *	@node:		node containing @index
  *	@update_node:	callback for changing leaf nodes
- *	@private:	private data to pass to @update_node
  *
  *	After clearing the slot at @index in @node from radix tree
  *	rooted at @root, call this function to attempt freeing the
@@ -1980,10 +1977,9 @@ EXPORT_SYMBOL(radix_tree_gang_lookup_tag_slot);
  */
 void __radix_tree_delete_node(struct radix_tree_root *root,
 			      struct radix_tree_node *node,
-			      radix_tree_update_node_t update_node,
-			      void *private)
+			      radix_tree_update_node_t update_node)
 {
-	delete_node(root, node, update_node, private);
+	delete_node(root, node, update_node);
 }
 
 static bool __radix_tree_delete(struct radix_tree_root *root,
@@ -2001,7 +1997,7 @@ static bool __radix_tree_delete(struct radix_tree_root *root,
 			node_tag_clear(root, node, tag, offset);
 
 	replace_slot(slot, NULL, node, -1, exceptional);
-	return node && delete_node(root, node, NULL, NULL);
+	return node && delete_node(root, node, NULL);
 }
 
 /**
diff --git a/mm/filemap.c b/mm/filemap.c
index dba68e1d9869..e59580feefd9 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -35,6 +35,7 @@
 #include <linux/hugetlb.h>
 #include <linux/memcontrol.h>
 #include <linux/cleancache.h>
+#include <linux/shmem_fs.h>
 #include <linux/rmap.h>
 #include "internal.h"
 
@@ -134,7 +135,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
 			*shadowp = p;
 	}
 	__radix_tree_replace(&mapping->page_tree, node, slot, page,
-			     workingset_update_node, mapping);
+			     workingset_lookup_update(mapping));
 	mapping->nrpages++;
 	return 0;
 }
@@ -162,7 +163,7 @@ static void page_cache_tree_delete(struct address_space *mapping,
 
 		radix_tree_clear_tags(&mapping->page_tree, node, slot);
 		__radix_tree_replace(&mapping->page_tree, node, slot, shadow,
-				     workingset_update_node, mapping);
+				workingset_lookup_update(mapping));
 	}
 
 	page->mapping = NULL;
@@ -360,7 +361,7 @@ page_cache_tree_delete_batch(struct address_space *mapping, int count,
 		}
 		radix_tree_clear_tags(&mapping->page_tree, iter.node, slot);
 		__radix_tree_replace(&mapping->page_tree, iter.node, slot, NULL,
-				     workingset_update_node, mapping);
+				workingset_lookup_update(mapping));
 		total_pages++;
 	}
 	mapping->nrpages -= total_pages;
diff --git a/mm/shmem.c b/mm/shmem.c
index 07a1d22807be..a72f68aee6a4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -338,7 +338,7 @@ static int shmem_radix_tree_replace(struct address_space *mapping,
 	if (item != expected)
 		return -ENOENT;
 	__radix_tree_replace(&mapping->page_tree, node, pslot,
-			     replacement, NULL, NULL);
+			     replacement, NULL);
 	return 0;
 }
 
diff --git a/mm/truncate.c b/mm/truncate.c
index 3dfa2d5e642e..d578d542a6ee 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -42,7 +42,7 @@ static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
 	if (*slot != entry)
 		goto unlock;
 	__radix_tree_replace(&mapping->page_tree, node, slot, NULL,
-			     workingset_update_node, mapping);
+			     workingset_update_node);
 	mapping->nrexceptional--;
 unlock:
 	spin_unlock_irq(&mapping->tree_lock);
diff --git a/mm/workingset.c b/mm/workingset.c
index 7119cd745ace..0f7b4fb130e3 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -339,14 +339,8 @@ void workingset_activation(struct page *page)
 
 static struct list_lru shadow_nodes;
 
-void workingset_update_node(struct radix_tree_node *node, void *private)
+void workingset_update_node(struct radix_tree_node *node)
 {
-	struct address_space *mapping = private;
-
-	/* Only regular page cache has shadow entries */
-	if (dax_mapping(mapping) || shmem_mapping(mapping))
-		return;
-
 	/*
 	 * Track non-empty nodes that contain only shadow entries;
 	 * unlink those that contain pages or are being freed.
@@ -474,7 +468,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
 		goto out_invalid;
 	inc_lruvec_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
 	__radix_tree_delete_node(&mapping->page_tree, node,
-				 workingset_update_node, mapping);
+				 workingset_lookup_update(mapping));
 
 out_invalid:
 	spin_unlock(&mapping->tree_lock);
diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c
index 06c71178d07d..59245b3d587c 100644
--- a/tools/testing/radix-tree/multiorder.c
+++ b/tools/testing/radix-tree/multiorder.c
@@ -618,7 +618,7 @@ static void multiorder_account(void)
 	__radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12);
 	__radix_tree_lookup(&tree, 1 << 5, &node, &slot);
 	assert(node->count == node->exceptional * 2);
-	__radix_tree_replace(&tree, node, slot, NULL, NULL, NULL);
+	__radix_tree_replace(&tree, node, slot, NULL, NULL);
 	assert(node->exceptional == 0);
 
 	item_kill_tree(&tree);
-- 
2.14.0

* [PATCH 3/8] mm, truncate: Remove all exceptional entries from pagevec under one lock
  2017-10-18  7:59 ` Mel Gorman
@ 2017-10-18  7:59   ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-18  7:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner, Mel Gorman

During truncate, each entry in a pagevec is checked to see if it is an
exceptional entry and, if so, the shadow entry is cleaned up.  This is
potentially expensive because each exceptional entry for a mapping locks
and unlocks the tree lock separately.  This patch batches the operation
so that the exceptional entries removed from a pagevec acquire the
mapping tree lock only once, as sketched below.  The corner case where
this is more expensive is when there is only a single exceptional entry,
but that is unlikely due to temporal locality and how it affects LRU
ordering.  Note that for truncations of small files created recently,
this patch should show no gain because it only batches the handling of
exceptional entries.
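
To make the locking change concrete, a simplified sketch of the
before/after flow (illustrative only; truncate_exceptional_pvec_entries()
in the diff below is the real implementation, which additionally handles
DAX and shmem mappings and compacts the pagevec):

	/*
	 * Before: each exceptional entry took and released
	 * mapping->tree_lock inside clear_shadow_entry().
	 */
	for (i = 0; i < pagevec_count(&pvec); i++)
		if (radix_tree_exceptional_entry(pvec.pages[i]))
			clear_shadow_entry(mapping, indices[i], pvec.pages[i]);

	/*
	 * After: take the lock once and use the unlocked helper
	 * __clear_shadow_entry() for every entry in the pagevec.
	 */
	spin_lock_irq(&mapping->tree_lock);
	for (i = 0; i < pagevec_count(&pvec); i++)
		if (radix_tree_exceptional_entry(pvec.pages[i]))
			__clear_shadow_entry(mapping, indices[i], pvec.pages[i]);
	spin_unlock_irq(&mapping->tree_lock);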

sparsetruncate (large)
                              4.14.0-rc4             4.14.0-rc4
                         pickhelper-v1r1       batchshadow-v1r1
Min          Time       38.00 (   0.00%)       27.00 (  28.95%)
1st-qrtle    Time       40.00 (   0.00%)       28.00 (  30.00%)
2nd-qrtle    Time       44.00 (   0.00%)       41.00 (   6.82%)
3rd-qrtle    Time      146.00 (   0.00%)      147.00 (  -0.68%)
Max-90%      Time      153.00 (   0.00%)      153.00 (   0.00%)
Max-95%      Time      155.00 (   0.00%)      156.00 (  -0.65%)
Max-99%      Time      181.00 (   0.00%)      171.00 (   5.52%)
Amean        Time       93.04 (   0.00%)       88.43 (   4.96%)
Best99%Amean Time       92.08 (   0.00%)       86.13 (   6.46%)
Best95%Amean Time       89.19 (   0.00%)       83.13 (   6.80%)
Best90%Amean Time       85.60 (   0.00%)       79.15 (   7.53%)
Best75%Amean Time       72.95 (   0.00%)       65.09 (  10.78%)
Best50%Amean Time       39.86 (   0.00%)       28.20 (  29.25%)
Best25%Amean Time       39.44 (   0.00%)       27.70 (  29.77%)

bonnie
                                      4.14.0-rc4             4.14.0-rc4
                                 pickhelper-v1r1       batchshadow-v1r1
Hmean     SeqCreate ops         71.92 (   0.00%)       76.78 (   6.76%)
Hmean     SeqCreate read        42.42 (   0.00%)       45.01 (   6.10%)
Hmean     SeqCreate del      26519.88 (   0.00%)    27191.87 (   2.53%)
Hmean     RandCreate ops        71.92 (   0.00%)       76.95 (   7.00%)
Hmean     RandCreate read       44.44 (   0.00%)       49.23 (  10.78%)
Hmean     RandCreate del     24948.62 (   0.00%)    24764.97 (  -0.74%)

Truncation of a large number of files shows a substantial gain, with the
best 99% of truncations completing 6.46% faster on average. bonnie shows
a modest gain of 2.53% on sequential deletes.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
 mm/truncate.c | 86 ++++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 61 insertions(+), 25 deletions(-)

diff --git a/mm/truncate.c b/mm/truncate.c
index d578d542a6ee..d1913888edf4 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -25,44 +25,77 @@
 #include <linux/rmap.h>
 #include "internal.h"
 
-static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
-			       void *entry)
+/*
+ * Regular page slots are stabilized by the page lock even without the tree
+ * itself locked.  These unlocked entries need verification under the tree
+ * lock.
+ */
+static inline void __clear_shadow_entry(struct address_space *mapping,
+				pgoff_t index, void *entry)
 {
 	struct radix_tree_node *node;
 	void **slot;
 
-	spin_lock_irq(&mapping->tree_lock);
-	/*
-	 * Regular page slots are stabilized by the page lock even
-	 * without the tree itself locked.  These unlocked entries
-	 * need verification under the tree lock.
-	 */
 	if (!__radix_tree_lookup(&mapping->page_tree, index, &node, &slot))
-		goto unlock;
+		return;
 	if (*slot != entry)
-		goto unlock;
+		return;
 	__radix_tree_replace(&mapping->page_tree, node, slot, NULL,
 			     workingset_update_node);
 	mapping->nrexceptional--;
-unlock:
+}
+
+static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
+			       void *entry)
+{
+	spin_lock_irq(&mapping->tree_lock);
+	__clear_shadow_entry(mapping, index, entry);
 	spin_unlock_irq(&mapping->tree_lock);
 }
 
 /*
- * Unconditionally remove exceptional entry. Usually called from truncate path.
+ * Unconditionally remove exceptional entries. Usually called from truncate
+ * path. Note that the pagevec may be altered by this function by removing
+ * exceptional entries similar to what pagevec_remove_exceptionals does.
  */
-static void truncate_exceptional_entry(struct address_space *mapping,
-				       pgoff_t index, void *entry)
+static void truncate_exceptional_pvec_entries(struct address_space *mapping,
+				struct pagevec *pvec, pgoff_t *indices, int ei)
 {
+	int i, j;
+	bool dax;
+
+	/* Return immediately if caller indicates there are no entries */
+	if (ei == PAGEVEC_SIZE)
+		return;
+
 	/* Handled by shmem itself */
 	if (shmem_mapping(mapping))
 		return;
 
-	if (dax_mapping(mapping)) {
-		dax_delete_mapping_entry(mapping, index);
-		return;
+	dax = dax_mapping(mapping);
+	if (!dax)
+		spin_lock_irq(&mapping->tree_lock);
+
+	for (i = ei, j = ei; i < pagevec_count(pvec); i++) {
+		struct page *page = pvec->pages[i];
+		pgoff_t index = indices[i];
+
+		if (!radix_tree_exceptional_entry(page)) {
+			pvec->pages[j++] = page;
+			continue;
+		}
+
+		if (unlikely(dax)) {
+			dax_delete_mapping_entry(mapping, index);
+			continue;
+		}
+
+		__clear_shadow_entry(mapping, index, page);
 	}
-	clear_shadow_entry(mapping, index, entry);
+
+	if (!dax)
+		spin_unlock_irq(&mapping->tree_lock);
+	pvec->nr = j;
 }
 
 /*
@@ -301,6 +334,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		 */
 		struct page *pages[PAGEVEC_SIZE];
 		int batch_count = 0;
+		int ei = PAGEVEC_SIZE;
 
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
@@ -311,8 +345,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
 				break;
 
 			if (radix_tree_exceptional_entry(page)) {
-				truncate_exceptional_entry(mapping, index,
-							   page);
+				if (ei == PAGEVEC_SIZE)
+					ei = i;
 				continue;
 			}
 
@@ -334,12 +368,11 @@ void truncate_inode_pages_range(struct address_space *mapping,
 		delete_from_page_cache_batch(mapping, batch_count, pages);
 		for (i = 0; i < batch_count; i++)
 			unlock_page(pages[i]);
-		pagevec_remove_exceptionals(&pvec);
+		truncate_exceptional_pvec_entries(mapping, &pvec, indices, ei);
 		pagevec_release(&pvec);
 		cond_resched();
 		index++;
 	}
-
 	if (partial_start) {
 		struct page *page = find_lock_page(mapping, start - 1);
 		if (page) {
@@ -381,6 +414,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
 
 	index = start;
 	for ( ; ; ) {
+		int ei = PAGEVEC_SIZE;
+
 		cond_resched();
 		if (!pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE), indices)) {
@@ -397,6 +432,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			pagevec_release(&pvec);
 			break;
 		}
+
 		for (i = 0; i < pagevec_count(&pvec); i++) {
 			struct page *page = pvec.pages[i];
 
@@ -409,8 +445,8 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			}
 
 			if (radix_tree_exceptional_entry(page)) {
-				truncate_exceptional_entry(mapping, index,
-							   page);
+				if (ei == PAGEVEC_SIZE)
+					ei = i;
 				continue;
 			}
 
@@ -420,7 +456,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 			truncate_inode_page(mapping, page);
 			unlock_page(page);
 		}
-		pagevec_remove_exceptionals(&pvec);
+		truncate_exceptional_pvec_entries(mapping, &pvec, indices, ei);
 		pagevec_release(&pvec);
 		index++;
 	}
-- 
2.14.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 4/8] mm: Only drain per-cpu pagevecs once per pagevec usage
  2017-10-18  7:59 ` Mel Gorman
@ 2017-10-18  7:59   ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-18  7:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner, Mel Gorman

When a pagevec is initialised on the stack, it is generally used multiple
times over a range of pages, looking up entries and then releasing them.
On each pagevec_release, the per-cpu deferred LRU pagevecs are drained
on the grounds that the pages being released may be on those queues and
may be cache hot.  In many cases only the first drain is necessary, as
it is unlikely that the range of pages being walked is racing against
LRU addition.  Even if there is such a race, the impact is marginal,
whereas constantly redraining the LRU pagevecs has a real cost.

This patch ensures that a pagevec is drained only once in a given
lifecycle, without increasing the cache footprint of the pagevec
structure; a minimal sketch of such a lifecycle follows below.  Only
sparsetruncate tiny is shown here as large files have many exceptional
entries and call pagevec_release less frequently.
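
A minimal sketch of the lifecycle the changelog describes (illustrative
only; the real change is in the diff below, and pagevec_reinit()
deliberately leaves the new drained flag set so that later releases in
the same lifecycle skip the drain):

	struct pagevec pvec;
	pgoff_t index = 0;

	pagevec_init(&pvec, 0);
	while (pagevec_lookup(&pvec, mapping, &index)) {
		/* ... work on pvec.pages[] ... */

		/*
		 * Previously this drained the per-cpu LRU pagevecs on
		 * every iteration; now only the first call drains.
		 */
		pagevec_release(&pvec);
	}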

sparsetruncate (tiny)
                              4.14.0-rc4             4.14.0-rc4
                        batchshadow-v1r1          onedrain-v1r1
Min          Time      141.00 (   0.00%)      141.00 (   0.00%)
1st-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
Max-95%      Time      146.00 (   0.00%)      145.00 (   0.68%)
Max-99%      Time      198.00 (   0.00%)      194.00 (   2.02%)
Max          Time      254.00 (   0.00%)      208.00 (  18.11%)
Amean        Time      145.12 (   0.00%)      144.30 (   0.56%)
Stddev       Time       12.74 (   0.00%)        9.62 (  24.49%)
Coeff        Time        8.78 (   0.00%)        6.67 (  24.06%)
Best99%Amean Time      144.29 (   0.00%)      143.82 (   0.32%)
Best95%Amean Time      142.68 (   0.00%)      142.31 (   0.26%)
Best90%Amean Time      142.52 (   0.00%)      142.19 (   0.24%)
Best75%Amean Time      142.26 (   0.00%)      141.98 (   0.20%)
Best50%Amean Time      141.90 (   0.00%)      141.71 (   0.13%)
Best25%Amean Time      141.80 (   0.00%)      141.43 (   0.26%)

The impact on bonnie is marginal and within the noise because a
significant percentage of the files being truncated have already been
reclaimed and consist of shadow entries, which reduces the hotness of
the pagevec_release path.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/pagevec.h | 4 +++-
 mm/swap.c               | 5 ++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 4dcd5506f1ed..4231979be982 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -16,7 +16,8 @@ struct address_space;
 
 struct pagevec {
 	unsigned long nr;
-	unsigned long cold;
+	bool cold;
+	bool drained;
 	struct page *pages[PAGEVEC_SIZE];
 };
 
@@ -45,6 +46,7 @@ static inline void pagevec_init(struct pagevec *pvec, int cold)
 {
 	pvec->nr = 0;
 	pvec->cold = cold;
+	pvec->drained = false;
 }
 
 static inline void pagevec_reinit(struct pagevec *pvec)
diff --git a/mm/swap.c b/mm/swap.c
index a77d68f2c1b6..31bd9d8a5db7 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -833,7 +833,10 @@ EXPORT_SYMBOL(release_pages);
  */
 void __pagevec_release(struct pagevec *pvec)
 {
-	lru_add_drain();
+	if (!pvec->drained) {
+		lru_add_drain();
+		pvec->drained = true;
+	}
 	release_pages(pvec->pages, pagevec_count(pvec), pvec->cold);
 	pagevec_reinit(pvec);
 }
-- 
2.14.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 5/8] mm, pagevec: Remove cold parameter for pagevecs
  2017-10-18  7:59 ` Mel Gorman
@ 2017-10-18  7:59   ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-18  7:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner, Mel Gorman

Every pagevec_init user passes cold == 0, claiming the pages being
released are hot even in cases where it is unlikely the pages are hot.
As no one cares about the hotness of pages being released to the
allocator, just ditch the parameter.  A before/after example of the
call-site conversion is shown below.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.
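
The call-site conversion is mechanical, for example:

	pagevec_init(&pvec, 0);	/* before: cold was always 0 */
	pagevec_init(&pvec);	/* after */

The few release_pages() callers that consumed pvec->cold simply pass 0
now, as the mm/swap.c hunks below show.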

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
 fs/afs/write.c                      | 4 ++--
 fs/btrfs/extent_io.c                | 4 ++--
 fs/buffer.c                         | 4 ++--
 fs/cachefiles/rdwr.c                | 4 ++--
 fs/ceph/addr.c                      | 4 ++--
 fs/dax.c                            | 2 +-
 fs/ext4/file.c                      | 2 +-
 fs/ext4/inode.c                     | 6 +++---
 fs/f2fs/checkpoint.c                | 2 +-
 fs/f2fs/data.c                      | 2 +-
 fs/f2fs/file.c                      | 2 +-
 fs/f2fs/node.c                      | 8 ++++----
 fs/fscache/page.c                   | 2 +-
 fs/gfs2/aops.c                      | 2 +-
 fs/hugetlbfs/inode.c                | 2 +-
 fs/nilfs2/btree.c                   | 2 +-
 fs/nilfs2/page.c                    | 8 ++++----
 fs/nilfs2/segment.c                 | 4 ++--
 include/linux/pagevec.h             | 4 +---
 mm/filemap.c                        | 2 +-
 mm/mlock.c                          | 4 ++--
 mm/page-writeback.c                 | 2 +-
 mm/shmem.c                          | 6 +++---
 mm/swap.c                           | 4 ++--
 mm/truncate.c                       | 6 +++---
 26 files changed, 46 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index e2410eb5d96e..b0a39578ccd1 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1866,7 +1866,7 @@ static void i915_address_space_init(struct i915_address_space *vm,
 	INIT_LIST_HEAD(&vm->unbound_list);
 
 	list_add_tail(&vm->global_link, &dev_priv->vm_list);
-	pagevec_init(&vm->free_pages, false);
+	pagevec_init(&vm->free_pages);
 }
 
 static void i915_address_space_fini(struct i915_address_space *vm)
diff --git a/fs/afs/write.c b/fs/afs/write.c
index 106e43db1115..8766326c59fe 100644
--- a/fs/afs/write.c
+++ b/fs/afs/write.c
@@ -308,7 +308,7 @@ static void afs_kill_pages(struct afs_vnode *vnode, bool error,
 	_enter("{%x:%u},%lx-%lx",
 	       vnode->fid.vid, vnode->fid.vnode, first, last);
 
-	pagevec_init(&pv, 0);
+	pagevec_init(&pv);
 
 	do {
 		_debug("kill %lx-%lx", first, last);
@@ -609,7 +609,7 @@ void afs_pages_written_back(struct afs_vnode *vnode, struct afs_call *call)
 
 	ASSERT(wb != NULL);
 
-	pagevec_init(&pv, 0);
+	pagevec_init(&pv);
 
 	do {
 		_debug("done %lx-%lx", first, last);
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 970190cd347e..381acbda895c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -3801,7 +3801,7 @@ int btree_write_cache_pages(struct address_space *mapping,
 	int scanned = 0;
 	int tag;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	if (wbc->range_cyclic) {
 		index = mapping->writeback_index; /* Start from prev offset */
 		end = -1;
@@ -3945,7 +3945,7 @@ static int extent_write_cache_pages(struct address_space *mapping,
 	if (!igrab(inode))
 		return 0;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	if (wbc->range_cyclic) {
 		index = mapping->writeback_index; /* Start from prev offset */
 		end = -1;
diff --git a/fs/buffer.c b/fs/buffer.c
index 170df856bdb9..de2ae6b15dd5 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1632,7 +1632,7 @@ void clean_bdev_aliases(struct block_device *bdev, sector_t block, sector_t len)
 	struct buffer_head *head;
 
 	end = (block + len - 1) >> (PAGE_SHIFT - bd_inode->i_blkbits);
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	while (pagevec_lookup_range(&pvec, bd_mapping, &index, end)) {
 		count = pagevec_count(&pvec);
 		for (i = 0; i < count; i++) {
@@ -3545,7 +3545,7 @@ page_cache_seek_hole_data(struct inode *inode, loff_t offset, loff_t length,
 	if (length <= 0)
 		return -ENOENT;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 
 	do {
 		unsigned nr_pages, i;
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 18d7aa61ef0f..23097cca2674 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -710,7 +710,7 @@ int cachefiles_read_or_alloc_pages(struct fscache_retrieval *op,
 	/* calculate the shift required to use bmap */
 	shift = PAGE_SHIFT - inode->i_sb->s_blocksize_bits;
 
-	pagevec_init(&pagevec, 0);
+	pagevec_init(&pagevec);
 
 	op->op.flags &= FSCACHE_OP_KEEP_FLAGS;
 	op->op.flags |= FSCACHE_OP_ASYNC;
@@ -844,7 +844,7 @@ int cachefiles_allocate_pages(struct fscache_retrieval *op,
 
 	ret = cachefiles_has_space(cache, 0, *nr_pages);
 	if (ret == 0) {
-		pagevec_init(&pagevec, 0);
+		pagevec_init(&pagevec);
 
 		list_for_each_entry(page, pages, lru) {
 			if (pagevec_add(&pagevec, page) == 0)
diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
index b3e3edc09d80..03afcea820e2 100644
--- a/fs/ceph/addr.c
+++ b/fs/ceph/addr.c
@@ -679,7 +679,7 @@ static void ceph_release_pages(struct page **pages, int num)
 	struct pagevec pvec;
 	int i;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	for (i = 0; i < num; i++) {
 		if (pagevec_add(&pvec, pages[i]) == 0)
 			pagevec_release(&pvec);
@@ -810,7 +810,7 @@ static int ceph_writepages_start(struct address_space *mapping,
 	if (fsc->mount_options->wsize < wsize)
 		wsize = fsc->mount_options->wsize;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 
 	start_index = wbc->range_cyclic ? mapping->writeback_index : 0;
 	index = start_index;
diff --git a/fs/dax.c b/fs/dax.c
index 3318ae9046e6..a14367639134 100644
--- a/fs/dax.c
+++ b/fs/dax.c
@@ -789,7 +789,7 @@ int dax_writeback_mapping_range(struct address_space *mapping,
 
 	tag_pages_for_writeback(mapping, start_index, end_index);
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	while (!done) {
 		pvec.nr = find_get_entries_tag(mapping, start_index,
 				PAGECACHE_TAG_TOWRITE, PAGEVEC_SIZE,
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index b1da660ac3bc..155ba5895efb 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -475,7 +475,7 @@ static int ext4_find_unwritten_pgoff(struct inode *inode,
 	index = startoff >> PAGE_SHIFT;
 	end = (endoff - 1) >> PAGE_SHIFT;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	do {
 		int i;
 		unsigned long nr_pages;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 31db875bc7a1..b15d227df6ae 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1718,7 +1718,7 @@ static void mpage_release_unused_pages(struct mpage_da_data *mpd,
 		ext4_es_remove_extent(inode, start, last - start + 1);
 	}
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	while (index <= end) {
 		nr_pages = pagevec_lookup_range(&pvec, mapping, &index, end);
 		if (nr_pages == 0)
@@ -2344,7 +2344,7 @@ static int mpage_map_and_submit_buffers(struct mpage_da_data *mpd)
 	lblk = start << bpp_bits;
 	pblock = mpd->map.m_pblk;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	while (start <= end) {
 		nr_pages = pagevec_lookup_range(&pvec, inode->i_mapping,
 						&start, end);
@@ -2615,7 +2615,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 	else
 		tag = PAGECACHE_TAG_DIRTY;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	mpd->map.m_len = 0;
 	mpd->next_page = index;
 	while (index <= end) {
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 04fe1df052b2..6bd32481f9aa 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -313,7 +313,7 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type,
 	};
 	struct blk_plug plug;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 
 	blk_start_plug(&plug);
 
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 36b535207c88..83503e0ab287 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -1635,7 +1635,7 @@ static int f2fs_write_cache_pages(struct address_space *mapping,
 	int range_whole = 0;
 	int tag;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 
 	if (get_dirty_pages(mapping->host) <=
 				SM_I(F2FS_M_SB(mapping))->min_hot_blocks)
diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
index 517e112c8a9a..fbf675c766d4 100644
--- a/fs/f2fs/file.c
+++ b/fs/f2fs/file.c
@@ -320,7 +320,7 @@ static pgoff_t __get_first_dirty_index(struct address_space *mapping,
 		return 0;
 
 	/* find first dirty page index */
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	nr_pages = pagevec_lookup_tag(&pvec, mapping, &pgofs,
 					PAGECACHE_TAG_DIRTY, 1);
 	pgofs = nr_pages ? pvec.pages[0]->index : ULONG_MAX;
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index fca87835a1da..fb512154a708 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1281,7 +1281,7 @@ static struct page *last_fsync_dnode(struct f2fs_sb_info *sbi, nid_t ino)
 	struct pagevec pvec;
 	struct page *last_page = NULL;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	index = 0;
 	end = ULONG_MAX;
 
@@ -1439,7 +1439,7 @@ int fsync_node_pages(struct f2fs_sb_info *sbi, struct inode *inode,
 			return PTR_ERR_OR_ZERO(last_page);
 	}
 retry:
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	index = 0;
 	end = ULONG_MAX;
 
@@ -1554,7 +1554,7 @@ int sync_node_pages(struct f2fs_sb_info *sbi, struct writeback_control *wbc,
 	int nwritten = 0;
 	int ret = 0;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 
 next_step:
 	index = 0;
@@ -1659,7 +1659,7 @@ int wait_on_node_pages_writeback(struct f2fs_sb_info *sbi, nid_t ino)
 	struct pagevec pvec;
 	int ret2, ret = 0;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 
 	while (index <= end) {
 		int i, nr_pages;
diff --git a/fs/fscache/page.c b/fs/fscache/page.c
index 0ad3fd3ad0b4..961029e04027 100644
--- a/fs/fscache/page.c
+++ b/fs/fscache/page.c
@@ -1175,7 +1175,7 @@ void __fscache_uncache_all_inode_pages(struct fscache_cookie *cookie,
 		return;
 	}
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	next = 0;
 	do {
 		if (!pagevec_lookup(&pvec, mapping, &next))
diff --git a/fs/gfs2/aops.c b/fs/gfs2/aops.c
index 68ed06962537..84310209827b 100644
--- a/fs/gfs2/aops.c
+++ b/fs/gfs2/aops.c
@@ -387,7 +387,7 @@ static int gfs2_write_cache_jdata(struct address_space *mapping,
 	int range_whole = 0;
 	int tag;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	if (wbc->range_cyclic) {
 		writeback_index = mapping->writeback_index; /* prev offset */
 		index = writeback_index;
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 59073e9f01a4..2c87306799c4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -407,7 +407,7 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 
 	memset(&pseudo_vma, 0, sizeof(struct vm_area_struct));
 	pseudo_vma.vm_flags = (VM_HUGETLB | VM_MAYSHARE | VM_SHARED);
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	next = start;
 	while (next < end) {
 		/*
diff --git a/fs/nilfs2/btree.c b/fs/nilfs2/btree.c
index 06ffa135dfa6..5e90c5bd91d9 100644
--- a/fs/nilfs2/btree.c
+++ b/fs/nilfs2/btree.c
@@ -2156,7 +2156,7 @@ static void nilfs_btree_lookup_dirty_buffers(struct nilfs_bmap *btree,
 	     level++)
 		INIT_LIST_HEAD(&lists[level]);
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 
 	while (pagevec_lookup_tag(&pvec, btcache, &index, PAGECACHE_TAG_DIRTY,
 				  PAGEVEC_SIZE)) {
diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c
index 8616c46d33da..60ee9d2b901d 100644
--- a/fs/nilfs2/page.c
+++ b/fs/nilfs2/page.c
@@ -255,7 +255,7 @@ int nilfs_copy_dirty_pages(struct address_space *dmap,
 	pgoff_t index = 0;
 	int err = 0;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 repeat:
 	if (!pagevec_lookup_tag(&pvec, smap, &index, PAGECACHE_TAG_DIRTY,
 				PAGEVEC_SIZE))
@@ -310,7 +310,7 @@ void nilfs_copy_back_pages(struct address_space *dmap,
 	pgoff_t index = 0;
 	int err;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 repeat:
 	n = pagevec_lookup(&pvec, smap, &index);
 	if (!n)
@@ -374,7 +374,7 @@ void nilfs_clear_dirty_pages(struct address_space *mapping, bool silent)
 	unsigned int i;
 	pgoff_t index = 0;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 
 	while (pagevec_lookup_tag(&pvec, mapping, &index, PAGECACHE_TAG_DIRTY,
 				  PAGEVEC_SIZE)) {
@@ -519,7 +519,7 @@ unsigned long nilfs_find_uncommitted_extent(struct inode *inode,
 	index = start_blk >> (PAGE_SHIFT - inode->i_blkbits);
 	nblocks_in_page = 1U << (PAGE_SHIFT - inode->i_blkbits);
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 
 repeat:
 	pvec.nr = find_get_pages_contig(inode->i_mapping, index, PAGEVEC_SIZE,
diff --git a/fs/nilfs2/segment.c b/fs/nilfs2/segment.c
index 70ded52dc1dd..0d56c4ea7216 100644
--- a/fs/nilfs2/segment.c
+++ b/fs/nilfs2/segment.c
@@ -708,7 +708,7 @@ static size_t nilfs_lookup_dirty_data_buffers(struct inode *inode,
 		index = start >> PAGE_SHIFT;
 		last = end >> PAGE_SHIFT;
 	}
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
  repeat:
 	if (unlikely(index > last) ||
 	    !pagevec_lookup_tag(&pvec, mapping, &index, PAGECACHE_TAG_DIRTY,
@@ -757,7 +757,7 @@ static void nilfs_lookup_dirty_node_buffers(struct inode *inode,
 	unsigned int i;
 	pgoff_t index = 0;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 
 	while (pagevec_lookup_tag(&pvec, mapping, &index, PAGECACHE_TAG_DIRTY,
 				  PAGEVEC_SIZE)) {
diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 4231979be982..11fdbe539aa6 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -16,7 +16,6 @@ struct address_space;
 
 struct pagevec {
 	unsigned long nr;
-	bool cold;
 	bool drained;
 	struct page *pages[PAGEVEC_SIZE];
 };
@@ -42,10 +41,9 @@ unsigned pagevec_lookup_tag(struct pagevec *pvec,
 		struct address_space *mapping, pgoff_t *index, int tag,
 		unsigned nr_pages);
 
-static inline void pagevec_init(struct pagevec *pvec, int cold)
+static inline void pagevec_init(struct pagevec *pvec)
 {
 	pvec->nr = 0;
-	pvec->cold = cold;
 	pvec->drained = false;
 }
 
diff --git a/mm/filemap.c b/mm/filemap.c
index e59580feefd9..aaaa324a283b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -520,7 +520,7 @@ static void __filemap_fdatawait_range(struct address_space *mapping,
 	if (end_byte < start_byte)
 		return;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	while ((index <= end) &&
 			(nr_pages = pagevec_lookup_tag(&pvec, mapping, &index,
 			PAGECACHE_TAG_WRITEBACK,
diff --git a/mm/mlock.c b/mm/mlock.c
index dfc6f1912176..936d39ad4c04 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -288,7 +288,7 @@ static void __munlock_pagevec(struct pagevec *pvec, struct zone *zone)
 	struct pagevec pvec_putback;
 	int pgrescued = 0;
 
-	pagevec_init(&pvec_putback, 0);
+	pagevec_init(&pvec_putback);
 
 	/* Phase 1: page isolation */
 	spin_lock_irq(zone_lru_lock(zone));
@@ -447,7 +447,7 @@ void munlock_vma_pages_range(struct vm_area_struct *vma,
 		struct pagevec pvec;
 		struct zone *zone;
 
-		pagevec_init(&pvec, 0);
+		pagevec_init(&pvec);
 		/*
 		 * Although FOLL_DUMP is intended for get_dump_page(),
 		 * it just so happens that its special treatment of the
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index c3bed3f5cd24..21923ad7317b 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2167,7 +2167,7 @@ int write_cache_pages(struct address_space *mapping,
 	int range_whole = 0;
 	int tag;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	if (wbc->range_cyclic) {
 		writeback_index = mapping->writeback_index; /* prev offset */
 		index = writeback_index;
diff --git a/mm/shmem.c b/mm/shmem.c
index a72f68aee6a4..7ea8b276ba8b 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -747,7 +747,7 @@ void shmem_unlock_mapping(struct address_space *mapping)
 	pgoff_t indices[PAGEVEC_SIZE];
 	pgoff_t index = 0;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	/*
 	 * Minor point, but we might as well stop if someone else SHM_LOCKs it.
 	 */
@@ -790,7 +790,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
 	if (lend == -1)
 		end = -1;	/* unsigned, so actually very big */
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	index = start;
 	while (index < end) {
 		pvec.nr = find_get_entries(mapping, index,
@@ -2528,7 +2528,7 @@ static pgoff_t shmem_seek_hole_data(struct address_space *mapping,
 	bool done = false;
 	int i;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	pvec.nr = 1;		/* start small: we may be there already */
 	while (!done) {
 		pvec.nr = find_get_entries(mapping, index,
diff --git a/mm/swap.c b/mm/swap.c
index 31bd9d8a5db7..73682e1dc0a2 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -210,7 +210,7 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 	}
 	if (pgdat)
 		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
-	release_pages(pvec->pages, pvec->nr, pvec->cold);
+	release_pages(pvec->pages, pvec->nr, 0);
 	pagevec_reinit(pvec);
 }
 
@@ -837,7 +837,7 @@ void __pagevec_release(struct pagevec *pvec)
 		lru_add_drain();
 		pvec->drained = true;
 	}
-	release_pages(pvec->pages, pagevec_count(pvec), pvec->cold);
+	release_pages(pvec->pages, pagevec_count(pvec), 0);
 	pagevec_reinit(pvec);
 }
 EXPORT_SYMBOL(__pagevec_release);
diff --git a/mm/truncate.c b/mm/truncate.c
index d1913888edf4..dcce4ed63430 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -322,7 +322,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
 	else
 		end = (lend + 1) >> PAGE_SHIFT;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	index = start;
 	while (index < end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE),
@@ -554,7 +554,7 @@ unsigned long invalidate_mapping_pages(struct address_space *mapping,
 	unsigned long count = 0;
 	int i;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
 			indices)) {
@@ -684,7 +684,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
 		goto out;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	index = start;
 	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
-- 
2.14.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

 	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
 			indices)) {
@@ -684,7 +684,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 	if (mapping->nrpages == 0 && mapping->nrexceptional == 0)
 		goto out;
 
-	pagevec_init(&pvec, 0);
+	pagevec_init(&pvec);
 	index = start;
 	while (index <= end && pagevec_lookup_entries(&pvec, mapping, index,
 			min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1,
-- 
2.14.0
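
For context, a minimal illustrative sketch (not part of the patch) of how
the simplified pagevec API is typically used after this change; "mapping"
and process_page() are assumed purely for the example:

	struct pagevec pvec;
	pgoff_t index = 0;
	int i;

	pagevec_init(&pvec);	/* no "cold" hint argument any more */
	while (pagevec_lookup_tag(&pvec, mapping, &index,
				  PAGECACHE_TAG_DIRTY, PAGEVEC_SIZE)) {
		for (i = 0; i < pagevec_count(&pvec); i++)
			process_page(pvec.pages[i]);	/* hypothetical helper */
		pagevec_release(&pvec);
	}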

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 6/8] mm: Remove cold parameter for release_pages
  2017-10-18  7:59 ` Mel Gorman
@ 2017-10-18  7:59   ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-18  7:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner, Mel Gorman

All callers of release_pages claim the pages being released are cache hot.
As no one cares about the hotness of pages being released to the allocator,
just ditch the parameter.

No performance impact is expected as the overhead is marginal. The parameter
is removed simply because there is no point in copying a useless parameter
everywhere.
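
As an illustration only (a hypothetical caller shape, not a hunk from this
patch), a site that drops an array of pinned user pages now reads:

	/* Hypothetical sketch: drop a pinned user-page array. */
	static void put_pinned_pages(struct page **pages, int nr)
	{
		/* Before this patch: release_pages(pages, nr, false); */
		release_pages(pages, nr);
	}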

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 6 ++----
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c   | 6 +++---
 drivers/gpu/drm/i915/i915_gem_userptr.c | 4 ++--
 drivers/gpu/drm/radeon/radeon_ttm.c     | 2 +-
 fs/fuse/dev.c                           | 2 +-
 include/linux/pagemap.h                 | 2 +-
 include/linux/swap.h                    | 2 +-
 mm/swap.c                               | 8 ++++----
 mm/swap_state.c                         | 2 +-
 11 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 60d8bedb694d..cd664832f9e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -553,8 +553,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 				 * invalidated it. Free it and try again
 				 */
 				release_pages(e->user_pages,
-					      e->robj->tbo.ttm->num_pages,
-					      false);
+					      e->robj->tbo.ttm->num_pages);
 				kvfree(e->user_pages);
 				e->user_pages = NULL;
 			}
@@ -691,8 +690,7 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
 				continue;
 
 			release_pages(e->user_pages,
-				      e->robj->tbo.ttm->num_pages,
-				      false);
+				      e->robj->tbo.ttm->num_pages);
 			kvfree(e->user_pages);
 		}
 	}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 7171968f261e..8cd32598276e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -347,7 +347,7 @@ int amdgpu_gem_userptr_ioctl(struct drm_device *dev, void *data,
 	return 0;
 
 free_pages:
-	release_pages(bo->tbo.ttm->pages, bo->tbo.ttm->num_pages, false);
+	release_pages(bo->tbo.ttm->pages, bo->tbo.ttm->num_pages);
 
 unlock_mmap_sem:
 	up_read(&current->mm->mmap_sem);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index bc746131987f..d792959fac43 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -659,7 +659,7 @@ int amdgpu_ttm_tt_get_user_pages(struct ttm_tt *ttm, struct page **pages)
 	return 0;
 
 release_pages:
-	release_pages(pages, pinned, 0);
+	release_pages(pages, pinned);
 	return r;
 }
 
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gem.c b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
index 57881167ccd2..bcc8c2d7c7c9 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gem.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gem.c
@@ -779,7 +779,7 @@ static struct page **etnaviv_gem_userptr_do_get_pages(
 	up_read(&mm->mmap_sem);
 
 	if (ret < 0) {
-		release_pages(pvec, pinned, 0);
+		release_pages(pvec, pinned);
 		kvfree(pvec);
 		return ERR_PTR(ret);
 	}
@@ -852,7 +852,7 @@ static int etnaviv_gem_userptr_get_pages(struct etnaviv_gem_object *etnaviv_obj)
 		}
 	}
 
-	release_pages(pvec, pinned, 0);
+	release_pages(pvec, pinned);
 	kvfree(pvec);
 
 	work = kmalloc(sizeof(*work), GFP_KERNEL);
@@ -886,7 +886,7 @@ static void etnaviv_gem_userptr_release(struct etnaviv_gem_object *etnaviv_obj)
 	if (etnaviv_obj->pages) {
 		int npages = etnaviv_obj->base.size >> PAGE_SHIFT;
 
-		release_pages(etnaviv_obj->pages, npages, 0);
+		release_pages(etnaviv_obj->pages, npages);
 		kvfree(etnaviv_obj->pages);
 	}
 	put_task_struct(etnaviv_obj->userptr.task);
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 709efe2357ea..aa22361bd5a1 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -554,7 +554,7 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
 	}
 	mutex_unlock(&obj->mm.lock);
 
-	release_pages(pvec, pinned, 0);
+	release_pages(pvec, pinned);
 	kvfree(pvec);
 
 	i915_gem_object_put(obj);
@@ -668,7 +668,7 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
 		__i915_gem_userptr_set_active(obj, true);
 
 	if (IS_ERR(pages))
-		release_pages(pvec, pinned, 0);
+		release_pages(pvec, pinned);
 	kvfree(pvec);
 
 	return pages;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index bf69bf9086bf..1fdfc7a46072 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -597,7 +597,7 @@ static int radeon_ttm_tt_pin_userptr(struct ttm_tt *ttm)
 	kfree(ttm->sg);
 
 release_pages:
-	release_pages(ttm->pages, pinned, 0);
+	release_pages(ttm->pages, pinned);
 	return r;
 }
 
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 13c65dd2d37d..6ea5bd9fc023 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1636,7 +1636,7 @@ static int fuse_notify_store(struct fuse_conn *fc, unsigned int size,
 
 static void fuse_retrieve_end(struct fuse_conn *fc, struct fuse_req *req)
 {
-	release_pages(req->pages, req->num_pages, false);
+	release_pages(req->pages, req->num_pages);
 }
 
 static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 333349585776..514f6d9d8083 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -115,7 +115,7 @@ static inline void mapping_set_gfp_mask(struct address_space *m, gfp_t mask)
 	m->gfp_mask = mask;
 }
 
-void release_pages(struct page **pages, int nr, bool cold);
+void release_pages(struct page **pages, int nr);
 
 /*
  * speculatively take a reference to a page.
diff --git a/include/linux/swap.h b/include/linux/swap.h
index 257c48a525c8..31a89eb48e71 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -492,7 +492,7 @@ extern void exit_swap_address_space(unsigned int type);
 #define free_page_and_swap_cache(page) \
 	put_page(page)
 #define free_pages_and_swap_cache(pages, nr) \
-	release_pages((pages), (nr), false);
+	release_pages((pages), (nr));
 
 static inline void show_swap_cache_info(void)
 {
diff --git a/mm/swap.c b/mm/swap.c
index 73682e1dc0a2..0b78ea3cbdea 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -210,7 +210,7 @@ static void pagevec_lru_move_fn(struct pagevec *pvec,
 	}
 	if (pgdat)
 		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
-	release_pages(pvec->pages, pvec->nr, 0);
+	release_pages(pvec->pages, pvec->nr);
 	pagevec_reinit(pvec);
 }
 
@@ -740,7 +740,7 @@ void lru_add_drain_all(void)
  * Decrement the reference count on all the pages in @pages.  If it
  * fell to zero, remove the page from the LRU and free it.
  */
-void release_pages(struct page **pages, int nr, bool cold)
+void release_pages(struct page **pages, int nr)
 {
 	int i;
 	LIST_HEAD(pages_to_free);
@@ -817,7 +817,7 @@ void release_pages(struct page **pages, int nr, bool cold)
 		spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
 
 	mem_cgroup_uncharge_list(&pages_to_free);
-	free_hot_cold_page_list(&pages_to_free, cold);
+	free_hot_cold_page_list(&pages_to_free, 0);
 }
 EXPORT_SYMBOL(release_pages);
 
@@ -837,7 +837,7 @@ void __pagevec_release(struct pagevec *pvec)
 		lru_add_drain();
 		pvec->drained = true;
 	}
-	release_pages(pvec->pages, pagevec_count(pvec), 0);
+	release_pages(pvec->pages, pagevec_count(pvec));
 	pagevec_reinit(pvec);
 }
 EXPORT_SYMBOL(__pagevec_release);
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 05b6803f0cce..f7048f1c8e3e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -318,7 +318,7 @@ void free_pages_and_swap_cache(struct page **pages, int nr)
 	lru_add_drain();
 	for (i = 0; i < nr; i++)
 		free_swap_cache(pagep[i]);
-	release_pages(pagep, nr, false);
+	release_pages(pagep, nr);
 }
 
 /*
-- 
2.14.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 7/8] mm: Remove cold parameter from free_hot_cold_page*
  2017-10-18  7:59 ` Mel Gorman
@ 2017-10-18  7:59   ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-18  7:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner, Mel Gorman

Most callers of free_hot_cold_page claim the pages being released are
cache hot. The exception is the page reclaim paths where it is likely that
enough pages will be freed in the near future that the per-cpu lists are
going to be recycled and the cache hotness information is lost. As no one
really cares about the hotness of pages being released to the allocator,
just ditch the parameter.

The APIs are renamed to indicate that it's no longer about hot/cold pages.
This should also be less confusing as there are subtle differences between
them: __free_pages drops a reference and frees a page when the refcount
reaches zero, whereas free_hot_cold_page handled pages whose refcount was
already zero, which is non-obvious from the name. free_unref_page makes
this explicit.
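
A minimal sketch of that difference, using hypothetical order-0 callers:

	/* __free_pages() drops a reference; the page is freed only when
	 * the refcount reaches zero.
	 */
	__free_pages(page, 0);

	/* free_unref_page() (formerly free_hot_cold_page) expects the
	 * refcount to have already dropped to zero, e.g.:
	 */
	if (put_page_testzero(page))
		free_unref_page(page);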

No performance impact is expected as the overhead is marginal. The parameter
is removed simply because there is no point in copying a useless parameter
everywhere.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 arch/powerpc/mm/mmu_context_book3s64.c |  2 +-
 arch/powerpc/mm/pgtable_64.c           |  2 +-
 arch/sparc/mm/init_64.c                |  2 +-
 arch/tile/mm/homecache.c               |  2 +-
 include/linux/gfp.h                    |  4 ++--
 include/trace/events/kmem.h            | 11 ++++-------
 mm/page_alloc.c                        | 29 ++++++++++++-----------------
 mm/rmap.c                              |  2 +-
 mm/swap.c                              |  4 ++--
 mm/vmscan.c                            |  6 +++---
 10 files changed, 28 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
index 05e15386d4cb..a7e998158f37 100644
--- a/arch/powerpc/mm/mmu_context_book3s64.c
+++ b/arch/powerpc/mm/mmu_context_book3s64.c
@@ -200,7 +200,7 @@ static void destroy_pagetable_page(struct mm_struct *mm)
 	/* We allow PTE_FRAG_NR fragments from a PTE page */
 	if (page_ref_sub_and_test(page, PTE_FRAG_NR - count)) {
 		pgtable_page_dtor(page);
-		free_hot_cold_page(page, 0);
+		free_unref_page(page);
 	}
 }
 
diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index ac0717a90ca6..1ec3aee43624 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -404,7 +404,7 @@ void pte_fragment_free(unsigned long *table, int kernel)
 	if (put_page_testzero(page)) {
 		if (!kernel)
 			pgtable_page_dtor(page);
-		free_hot_cold_page(page, 0);
+		free_unref_page(page);
 	}
 }
 
diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index b2ba410b26f4..42f535b6935a 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2942,7 +2942,7 @@ pgtable_t pte_alloc_one(struct mm_struct *mm,
 	if (!page)
 		return NULL;
 	if (!pgtable_page_ctor(page)) {
-		free_hot_cold_page(page, 0);
+		free_unref_page(page);
 		return NULL;
 	}
 	return (pte_t *) page_address(page);
diff --git a/arch/tile/mm/homecache.c b/arch/tile/mm/homecache.c
index b51cc28acd0a..4432f31e8479 100644
--- a/arch/tile/mm/homecache.c
+++ b/arch/tile/mm/homecache.c
@@ -409,7 +409,7 @@ void __homecache_free_pages(struct page *page, unsigned int order)
 	if (put_page_testzero(page)) {
 		homecache_change_page_home(page, order, PAGE_HOME_HASH);
 		if (order == 0) {
-			free_hot_cold_page(page, false);
+			free_unref_page(page);
 		} else {
 			init_page_count(page);
 			__free_pages(page, order);
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f780718b7391..cc0cdbaa1b24 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -538,8 +538,8 @@ void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 
 extern void __free_pages(struct page *page, unsigned int order);
 extern void free_pages(unsigned long addr, unsigned int order);
-extern void free_hot_cold_page(struct page *page, bool cold);
-extern void free_hot_cold_page_list(struct list_head *list, bool cold);
+extern void free_unref_page(struct page *page);
+extern void free_unref_page_list(struct list_head *list);
 
 struct page_frag_cache;
 extern void __page_frag_cache_drain(struct page *page, unsigned int count);
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 6b2e154fd23a..8e87ee321c33 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -171,24 +171,21 @@ TRACE_EVENT(mm_page_free,
 
 TRACE_EVENT(mm_page_free_batched,
 
-	TP_PROTO(struct page *page, int cold),
+	TP_PROTO(struct page *page),
 
-	TP_ARGS(page, cold),
+	TP_ARGS(page),
 
 	TP_STRUCT__entry(
 		__field(	unsigned long,	pfn		)
-		__field(	int,		cold		)
 	),
 
 	TP_fast_assign(
 		__entry->pfn		= page_to_pfn(page);
-		__entry->cold		= cold;
 	),
 
-	TP_printk("page=%p pfn=%lu order=0 cold=%d",
+	TP_printk("page=%p pfn=%lu order=0",
 			pfn_to_page(__entry->pfn),
-			__entry->pfn,
-			__entry->cold)
+			__entry->pfn)
 );
 
 TRACE_EVENT(mm_page_alloc,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 167e163cf733..13582efc57a0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2590,7 +2590,7 @@ void mark_free_pages(struct zone *zone)
 }
 #endif /* CONFIG_PM */
 
-static bool free_hot_cold_page_prepare(struct page *page, unsigned long pfn)
+static bool free_unref_page_prepare(struct page *page, unsigned long pfn)
 {
 	int migratetype;
 
@@ -2602,8 +2602,7 @@ static bool free_hot_cold_page_prepare(struct page *page, unsigned long pfn)
 	return true;
 }
 
-static void free_hot_cold_page_commit(struct page *page, unsigned long pfn,
-				bool cold)
+static void free_unref_page_commit(struct page *page, unsigned long pfn)
 {
 	struct zone *zone = page_zone(page);
 	struct per_cpu_pages *pcp;
@@ -2628,10 +2627,7 @@ static void free_hot_cold_page_commit(struct page *page, unsigned long pfn,
 	}
 
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
-	if (!cold)
-		list_add(&page->lru, &pcp->lists[migratetype]);
-	else
-		list_add_tail(&page->lru, &pcp->lists[migratetype]);
+	list_add_tail(&page->lru, &pcp->lists[migratetype]);
 	pcp->count++;
 	if (pcp->count >= pcp->high) {
 		unsigned long batch = READ_ONCE(pcp->batch);
@@ -2642,25 +2638,24 @@ static void free_hot_cold_page_commit(struct page *page, unsigned long pfn,
 
 /*
  * Free a 0-order page
- * cold == true ? free a cold page : free a hot page
  */
-void free_hot_cold_page(struct page *page, bool cold)
+void free_unref_page(struct page *page)
 {
 	unsigned long flags;
 	unsigned long pfn = page_to_pfn(page);
 
-	if (!free_hot_cold_page_prepare(page, pfn))
+	if (!free_unref_page_prepare(page, pfn))
 		return;
 
 	local_irq_save(flags);
-	free_hot_cold_page_commit(page, pfn, cold);
+	free_unref_page_commit(page, pfn);
 	local_irq_restore(flags);
 }
 
 /*
  * Free a list of 0-order pages
  */
-void free_hot_cold_page_list(struct list_head *list, bool cold)
+void free_unref_page_list(struct list_head *list)
 {
 	struct page *page, *next;
 	unsigned long flags, pfn;
@@ -2668,7 +2663,7 @@ void free_hot_cold_page_list(struct list_head *list, bool cold)
 	/* Prepare pages for freeing */
 	list_for_each_entry_safe(page, next, list, lru) {
 		pfn = page_to_pfn(page);
-		if (!free_hot_cold_page_prepare(page, pfn))
+		if (!free_unref_page_prepare(page, pfn))
 			list_del(&page->lru);
 		page->private = pfn;
 	}
@@ -2678,8 +2673,8 @@ void free_hot_cold_page_list(struct list_head *list, bool cold)
 		unsigned long pfn = page->private;
 
 		page->private = 0;
-		trace_mm_page_free_batched(page, cold);
-		free_hot_cold_page_commit(page, pfn, cold);
+		trace_mm_page_free_batched(page);
+		free_unref_page_commit(page, pfn);
 	}
 	local_irq_restore(flags);
 }
@@ -4292,7 +4287,7 @@ void __free_pages(struct page *page, unsigned int order)
 {
 	if (put_page_testzero(page)) {
 		if (order == 0)
-			free_hot_cold_page(page, false);
+			free_unref_page(page);
 		else
 			__free_pages_ok(page, order);
 	}
@@ -4350,7 +4345,7 @@ void __page_frag_cache_drain(struct page *page, unsigned int count)
 		unsigned int order = compound_order(page);
 
 		if (order == 0)
-			free_hot_cold_page(page, false);
+			free_unref_page(page);
 		else
 			__free_pages_ok(page, order);
 	}
diff --git a/mm/rmap.c b/mm/rmap.c
index b874c4761e84..8f3889553209 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1318,7 +1318,7 @@ void page_remove_rmap(struct page *page, bool compound)
 	 * It would be tidy to reset the PageAnon mapping here,
 	 * but that might overwrite a racing page_add_anon_rmap
 	 * which increments mapcount after us but sets mapping
-	 * before us: so leave the reset to free_hot_cold_page,
+	 * before us: so leave the reset to free_unref_page,
 	 * and remember that it's only reliable while mapped.
 	 * Leaving it set also helps swapoff to reinstate ptes
 	 * faster for those pages still in swapcache.
diff --git a/mm/swap.c b/mm/swap.c
index 0b78ea3cbdea..beaf11ff67d5 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -76,7 +76,7 @@ static void __page_cache_release(struct page *page)
 static void __put_single_page(struct page *page)
 {
 	__page_cache_release(page);
-	free_hot_cold_page(page, false);
+	free_unref_page(page);
 }
 
 static void __put_compound_page(struct page *page)
@@ -817,7 +817,7 @@ void release_pages(struct page **pages, int nr)
 		spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
 
 	mem_cgroup_uncharge_list(&pages_to_free);
-	free_hot_cold_page_list(&pages_to_free, 0);
+	free_unref_page_list(&pages_to_free);
 }
 EXPORT_SYMBOL(release_pages);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 13d711dd8776..e574e6266ab8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1348,7 +1348,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 	mem_cgroup_uncharge_list(&free_pages);
 	try_to_unmap_flush();
-	free_hot_cold_page_list(&free_pages, true);
+	free_unref_page_list(&free_pages);
 
 	list_splice(&ret_pages, page_list);
 	count_vm_events(PGACTIVATE, pgactivate);
@@ -1823,7 +1823,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	spin_unlock_irq(&pgdat->lru_lock);
 
 	mem_cgroup_uncharge_list(&page_list);
-	free_hot_cold_page_list(&page_list, true);
+	free_unref_page_list(&page_list);
 
 	/*
 	 * If reclaim is isolating dirty pages under writeback, it implies
@@ -2062,7 +2062,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	spin_unlock_irq(&pgdat->lru_lock);
 
 	mem_cgroup_uncharge_list(&l_hold);
-	free_hot_cold_page_list(&l_hold, true);
+	free_unref_page_list(&l_hold);
 	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
 			nr_deactivate, nr_rotated, sc->priority, file);
 }
-- 
2.14.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 8/8] mm: Remove __GFP_COLD
  2017-10-18  7:59 ` Mel Gorman
@ 2017-10-18  7:59   ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-18  7:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner, Mel Gorman

As the page free path makes no distinction between cache hot and cold
pages, there is no useful ordering of pages in the free list that
allocation requests can take advantage of. Judging from the users of
__GFP_COLD, it is likely that a number of them are the result of copying
other call sites instead of actually measuring the impact. Remove the
__GFP_COLD flag, which simplifies a number of paths in the page
allocator.

This is potentially controversial, but bear in mind that the size of the
per-cpu pagelists relative to modern cache sizes means that the whole
per-cpu list can often fit in the L3 cache. Hence, there is only a
potential benefit for microbenchmarks that alloc/free pages in a tight
loop. The case is even weaker for THP, which has little or no chance of
getting a cache-hot page as the per-cpu list is bypassed and the zeroing
of multiple pages will thrash the cache anyway.
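
As a rough illustration (assumed values, not measurements): with a per-cpu
high watermark of 186 pages (6 * the default batch of 31) and 4K pages, a
full per-cpu list is 186 * 4096 bytes, roughly 744K, which sits comfortably
inside a typical multi-megabyte L3 cache.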

The truncate microbenchmarks are not shown as this patch affects the
allocation path and not the free path. A page fault microbenchmark was
tested but it showed no significant difference, which is not surprising
given that the __GFP_COLD branches are a minuscule percentage of the
fault path.
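
For illustration, the shape of a typical conversion in this patch, e.g. a
former page_cache_alloc_cold() user:

	/* Before: page = page_cache_alloc_cold(inode->i_mapping);
	 * After:  the _cold variant is gone and callers use the plain
	 *         helper, which no longer ORs __GFP_COLD into the mask.
	 */
	page = page_cache_alloc(inode->i_mapping);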

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c         |  2 +-
 drivers/net/ethernet/amd/xgbe/xgbe-desc.c            |  2 +-
 drivers/net/ethernet/aquantia/atlantic/aq_ring.c     |  3 +--
 .../net/ethernet/cavium/liquidio/octeon_network.h    |  2 +-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c           |  5 ++---
 drivers/net/ethernet/netronome/nfp/nfp_net_common.c  |  4 ++--
 drivers/net/ethernet/qlogic/qlge/qlge_main.c         |  3 +--
 drivers/net/ethernet/sfc/falcon/rx.c                 |  2 +-
 drivers/net/ethernet/sfc/rx.c                        |  2 +-
 drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c      |  2 +-
 drivers/net/ethernet/ti/netcp_core.c                 |  2 +-
 drivers/net/virtio_net.c                             |  1 -
 drivers/staging/lustre/lustre/mdc/mdc_request.c      |  2 +-
 fs/cachefiles/rdwr.c                                 |  6 ++----
 include/linux/gfp.h                                  |  5 -----
 include/linux/pagemap.h                              |  8 +-------
 include/linux/skbuff.h                               |  2 +-
 include/linux/slab.h                                 |  3 ---
 include/trace/events/mmflags.h                       |  1 -
 kernel/power/snapshot.c                              |  4 ++--
 mm/filemap.c                                         |  6 +++---
 mm/page_alloc.c                                      | 20 ++++++--------------
 mm/percpu-vm.c                                       |  2 +-
 net/core/skbuff.c                                    |  4 ++--
 tools/perf/builtin-kmem.c                            |  1 -
 25 files changed, 32 insertions(+), 62 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index f7dc22f65d9f..51f2800ac053 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -517,7 +517,7 @@ static int ena_refill_rx_bufs(struct ena_ring *rx_ring, u32 num)
 
 
 		rc = ena_alloc_rx_page(rx_ring, rx_info,
-				       __GFP_COLD | GFP_ATOMIC | __GFP_COMP);
+				       GFP_ATOMIC | __GFP_COMP);
 		if (unlikely(rc < 0)) {
 			netif_warn(rx_ring->adapter, rx_err, rx_ring->netdev,
 				   "failed to alloc buffer for rx queue %d\n",
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-desc.c b/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
index 45d92304068e..cc1e4f820e64 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-desc.c
@@ -295,7 +295,7 @@ static int xgbe_alloc_pages(struct xgbe_prv_data *pdata,
 	order = alloc_order;
 
 	/* Try to obtain pages, decreasing order if necessary */
-	gfp = GFP_ATOMIC | __GFP_COLD | __GFP_COMP | __GFP_NOWARN;
+	gfp = GFP_ATOMIC | __GFP_COMP | __GFP_NOWARN;
 	while (order >= 0) {
 		pages = alloc_pages_node(node, gfp, order);
 		if (pages)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 0654e0c76bc2..519ca6534b85 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -304,8 +304,7 @@ int aq_ring_rx_fill(struct aq_ring_s *self)
 		buff->flags = 0U;
 		buff->len = AQ_CFG_RX_FRAME_MAX;
 
-		buff->page = alloc_pages(GFP_ATOMIC | __GFP_COLD |
-					 __GFP_COMP, pages_order);
+		buff->page = alloc_pages(GFP_ATOMIC | __GFP_COMP, pages_order);
 		if (!buff->page) {
 			err = -ENOMEM;
 			goto err_exit;
diff --git a/drivers/net/ethernet/cavium/liquidio/octeon_network.h b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
index 9e36319cead6..57853eead4b5 100644
--- a/drivers/net/ethernet/cavium/liquidio/octeon_network.h
+++ b/drivers/net/ethernet/cavium/liquidio/octeon_network.h
@@ -195,7 +195,7 @@ static inline void
 	struct sk_buff *skb;
 	struct octeon_skb_page_info *skb_pg_info;
 
-	page = alloc_page(GFP_ATOMIC | __GFP_COLD);
+	page = alloc_page(GFP_ATOMIC);
 	if (unlikely(!page))
 		return NULL;
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index b97a55c827eb..ffead38cf5da 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -193,7 +193,7 @@ static int mlx4_en_fill_rx_buffers(struct mlx4_en_priv *priv)
 
 			if (mlx4_en_prepare_rx_desc(priv, ring,
 						    ring->actual_size,
-						    GFP_KERNEL | __GFP_COLD)) {
+						    GFP_KERNEL)) {
 				if (ring->actual_size < MLX4_EN_MIN_RX_SIZE) {
 					en_err(priv, "Failed to allocate enough rx buffers\n");
 					return -ENOMEM;
@@ -552,8 +552,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
 	do {
 		if (mlx4_en_prepare_rx_desc(priv, ring,
 					    ring->prod & ring->size_mask,
-					    GFP_ATOMIC | __GFP_COLD |
-					    __GFP_MEMALLOC))
+					    GFP_ATOMIC | __GFP_MEMALLOC))
 			break;
 		ring->prod++;
 	} while (likely(--missing));
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 1c0187f0af51..6364c9a7a372 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1183,7 +1183,7 @@ static void *nfp_net_rx_alloc_one(struct nfp_net_dp *dp, dma_addr_t *dma_addr)
 	if (!dp->xdp_prog)
 		frag = netdev_alloc_frag(dp->fl_bufsz);
 	else
-		frag = page_address(alloc_page(GFP_KERNEL | __GFP_COLD));
+		frag = page_address(alloc_page(GFP_KERNEL));
 	if (!frag) {
 		nn_dp_warn(dp, "Failed to alloc receive page frag\n");
 		return NULL;
@@ -1206,7 +1206,7 @@ static void *nfp_net_napi_alloc_one(struct nfp_net_dp *dp, dma_addr_t *dma_addr)
 	if (!dp->xdp_prog)
 		frag = napi_alloc_frag(dp->fl_bufsz);
 	else
-		frag = page_address(alloc_page(GFP_ATOMIC | __GFP_COLD));
+		frag = page_address(alloc_page(GFP_ATOMIC));
 	if (!frag) {
 		nn_dp_warn(dp, "Failed to alloc receive page frag\n");
 		return NULL;
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 9feec7009443..abfe2a5b28d5 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -1092,8 +1092,7 @@ static int ql_get_next_chunk(struct ql_adapter *qdev, struct rx_ring *rx_ring,
 {
 	if (!rx_ring->pg_chunk.page) {
 		u64 map;
-		rx_ring->pg_chunk.page = alloc_pages(__GFP_COLD | __GFP_COMP |
-						GFP_ATOMIC,
+		rx_ring->pg_chunk.page = alloc_pages(__GFP_COMP | GFP_ATOMIC,
 						qdev->lbq_buf_order);
 		if (unlikely(!rx_ring->pg_chunk.page)) {
 			netif_err(qdev, drv, qdev->ndev,
diff --git a/drivers/net/ethernet/sfc/falcon/rx.c b/drivers/net/ethernet/sfc/falcon/rx.c
index 6a8406dc0c2b..91097aea6c41 100644
--- a/drivers/net/ethernet/sfc/falcon/rx.c
+++ b/drivers/net/ethernet/sfc/falcon/rx.c
@@ -163,7 +163,7 @@ static int ef4_init_rx_buffers(struct ef4_rx_queue *rx_queue, bool atomic)
 	do {
 		page = ef4_reuse_page(rx_queue);
 		if (page == NULL) {
-			page = alloc_pages(__GFP_COLD | __GFP_COMP |
+			page = alloc_pages(__GFP_COMP |
 					   (atomic ? GFP_ATOMIC : GFP_KERNEL),
 					   efx->rx_buffer_order);
 			if (unlikely(page == NULL))
diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 42443f434569..0004c50d3c83 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -163,7 +163,7 @@ static int efx_init_rx_buffers(struct efx_rx_queue *rx_queue, bool atomic)
 	do {
 		page = efx_reuse_page(rx_queue);
 		if (page == NULL) {
-			page = alloc_pages(__GFP_COLD | __GFP_COMP |
+			page = alloc_pages(__GFP_COMP |
 					   (atomic ? GFP_ATOMIC : GFP_KERNEL),
 					   efx->rx_buffer_order);
 			if (unlikely(page == NULL))
diff --git a/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c b/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c
index e9672b1f9968..031cf9c3435a 100644
--- a/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c
+++ b/drivers/net/ethernet/synopsys/dwc-xlgmac-desc.c
@@ -335,7 +335,7 @@ static int xlgmac_alloc_pages(struct xlgmac_pdata *pdata,
 	dma_addr_t pages_dma;
 
 	/* Try to obtain pages, decreasing order if necessary */
-	gfp |= __GFP_COLD | __GFP_COMP | __GFP_NOWARN;
+	gfp |= __GFP_COMP | __GFP_NOWARN;
 	while (order >= 0) {
 		pages = alloc_pages(gfp, order);
 		if (pages)
diff --git a/drivers/net/ethernet/ti/netcp_core.c b/drivers/net/ethernet/ti/netcp_core.c
index 437d36289786..50d2b76771b5 100644
--- a/drivers/net/ethernet/ti/netcp_core.c
+++ b/drivers/net/ethernet/ti/netcp_core.c
@@ -906,7 +906,7 @@ static int netcp_allocate_rx_buf(struct netcp_intf *netcp, int fdq)
 		sw_data[0] = (u32)bufptr;
 	} else {
 		/* Allocate a secondary receive queue entry */
-		page = alloc_page(GFP_ATOMIC | GFP_DMA | __GFP_COLD);
+		page = alloc_page(GFP_ATOMIC | GFP_DMA);
 		if (unlikely(!page)) {
 			dev_warn_ratelimited(netcp->ndev_dev, "Secondary page alloc failed\n");
 			goto fail;
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 511f8339fa96..5eec09d63fc0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -988,7 +988,6 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 	int err;
 	bool oom;
 
-	gfp |= __GFP_COLD;
 	do {
 		if (vi->mergeable_rx_bufs)
 			err = add_recvbuf_mergeable(vi, rq, gfp);
diff --git a/drivers/staging/lustre/lustre/mdc/mdc_request.c b/drivers/staging/lustre/lustre/mdc/mdc_request.c
index 6ef8ddec4ab6..32716bc75d3b 100644
--- a/drivers/staging/lustre/lustre/mdc/mdc_request.c
+++ b/drivers/staging/lustre/lustre/mdc/mdc_request.c
@@ -1151,7 +1151,7 @@ static int mdc_read_page_remote(void *data, struct page *page0)
 	}
 
 	for (npages = 1; npages < max_pages; npages++) {
-		page = page_cache_alloc_cold(inode->i_mapping);
+		page = page_cache_alloc(inode->i_mapping);
 		if (!page)
 			break;
 		page_pool[npages] = page;
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c
index 23097cca2674..883bc7bb12c5 100644
--- a/fs/cachefiles/rdwr.c
+++ b/fs/cachefiles/rdwr.c
@@ -256,8 +256,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object,
 			goto backing_page_already_present;
 
 		if (!newpage) {
-			newpage = __page_cache_alloc(cachefiles_gfp |
-						     __GFP_COLD);
+			newpage = __page_cache_alloc(cachefiles_gfp);
 			if (!newpage)
 				goto nomem_monitor;
 		}
@@ -493,8 +492,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object,
 				goto backing_page_already_present;
 
 			if (!newpage) {
-				newpage = __page_cache_alloc(cachefiles_gfp |
-							     __GFP_COLD);
+				newpage = __page_cache_alloc(cachefiles_gfp);
 				if (!newpage)
 					goto nomem;
 			}
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index cc0cdbaa1b24..3047cdc796b2 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -23,7 +23,6 @@ struct vm_area_struct;
 #define ___GFP_HIGH		0x20u
 #define ___GFP_IO		0x40u
 #define ___GFP_FS		0x80u
-#define ___GFP_COLD		0x100u
 #define ___GFP_NOWARN		0x200u
 #define ___GFP_RETRY_MAYFAIL	0x400u
 #define ___GFP_NOFAIL		0x800u
@@ -192,9 +191,6 @@ struct vm_area_struct;
 /*
  * Action modifiers
  *
- * __GFP_COLD indicates that the caller does not expect to be used in the near
- *   future. Where possible, a cache-cold page will be returned.
- *
  * __GFP_NOWARN suppresses allocation failure reports.
  *
  * __GFP_COMP address compound page metadata.
@@ -207,7 +203,6 @@ struct vm_area_struct;
  *   distinguishing in the source between false positives and allocations that
  *   cannot be supported (e.g. page tables).
  */
-#define __GFP_COLD	((__force gfp_t)___GFP_COLD)
 #define __GFP_NOWARN	((__force gfp_t)___GFP_NOWARN)
 #define __GFP_COMP	((__force gfp_t)___GFP_COMP)
 #define __GFP_ZERO	((__force gfp_t)___GFP_ZERO)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 514f6d9d8083..0d78856d3135 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -231,15 +231,9 @@ static inline struct page *page_cache_alloc(struct address_space *x)
 	return __page_cache_alloc(mapping_gfp_mask(x));
 }
 
-static inline struct page *page_cache_alloc_cold(struct address_space *x)
-{
-	return __page_cache_alloc(mapping_gfp_mask(x)|__GFP_COLD);
-}
-
 static inline gfp_t readahead_gfp_mask(struct address_space *x)
 {
-	return mapping_gfp_mask(x) |
-				  __GFP_COLD | __GFP_NORETRY | __GFP_NOWARN;
+	return mapping_gfp_mask(x) | __GFP_NORETRY | __GFP_NOWARN;
 }
 
 typedef int filler_t(void *, struct page *);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 72299ef00061..5fedbb398b10 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2675,7 +2675,7 @@ static inline struct page *__dev_alloc_pages(gfp_t gfp_mask,
 	 * 4.  __GFP_MEMALLOC is ignored if __GFP_NOMEMALLOC is set due to
 	 *     code in gfp_to_alloc_flags that should be enforcing this.
 	 */
-	gfp_mask |= __GFP_COLD | __GFP_COMP | __GFP_MEMALLOC;
+	gfp_mask |= __GFP_COMP | __GFP_MEMALLOC;
 
 	return alloc_pages_node(NUMA_NO_NODE, gfp_mask, order);
 }
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 41473df6dfb0..d2529c6b37a9 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -458,9 +458,6 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
  * Also it is possible to set different flags by OR'ing
  * in one or more of the following additional @flags:
  *
- * %__GFP_COLD - Request cache-cold pages instead of
- *   trying to return cache-warm pages.
- *
  * %__GFP_HIGH - This allocation has high priority and may use emergency pools.
  *
  * %__GFP_NOFAIL - Indicate that this allocation is in no way allowed to fail
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index fec6291a6703..ab24013b0e33 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -31,7 +31,6 @@
 	{(unsigned long)__GFP_ATOMIC,		"__GFP_ATOMIC"},	\
 	{(unsigned long)__GFP_IO,		"__GFP_IO"},		\
 	{(unsigned long)__GFP_FS,		"__GFP_FS"},		\
-	{(unsigned long)__GFP_COLD,		"__GFP_COLD"},		\
 	{(unsigned long)__GFP_NOWARN,		"__GFP_NOWARN"},	\
 	{(unsigned long)__GFP_RETRY_MAYFAIL,	"__GFP_RETRY_MAYFAIL"},	\
 	{(unsigned long)__GFP_NOFAIL,		"__GFP_NOFAIL"},	\
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 0972a8e09d08..8a77a49f8f43 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1882,7 +1882,7 @@ static int enough_free_mem(unsigned int nr_pages, unsigned int nr_highmem)
  */
 static inline int get_highmem_buffer(int safe_needed)
 {
-	buffer = get_image_page(GFP_ATOMIC | __GFP_COLD, safe_needed);
+	buffer = get_image_page(GFP_ATOMIC, safe_needed);
 	return buffer ? 0 : -ENOMEM;
 }
 
@@ -1943,7 +1943,7 @@ static int swsusp_alloc(struct memory_bitmap *copy_bm,
 		while (nr_pages-- > 0) {
 			struct page *page;
 
-			page = alloc_image_page(GFP_ATOMIC | __GFP_COLD);
+			page = alloc_image_page(GFP_ATOMIC);
 			if (!page)
 				goto err_out;
 			memory_bm_set_bit(copy_bm, page_to_pfn(page));
diff --git a/mm/filemap.c b/mm/filemap.c
index aaaa324a283b..61175ac0b396 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2260,7 +2260,7 @@ static ssize_t generic_file_buffered_read(struct kiocb *iocb,
 		 * Ok, it wasn't cached, so we need to create a new
 		 * page..
 		 */
-		page = page_cache_alloc_cold(mapping);
+		page = page_cache_alloc(mapping);
 		if (!page) {
 			error = -ENOMEM;
 			goto out;
@@ -2372,7 +2372,7 @@ static int page_cache_read(struct file *file, pgoff_t offset, gfp_t gfp_mask)
 	int ret;
 
 	do {
-		page = __page_cache_alloc(gfp_mask|__GFP_COLD);
+		page = __page_cache_alloc(gfp_mask);
 		if (!page)
 			return -ENOMEM;
 
@@ -2776,7 +2776,7 @@ static struct page *do_read_cache_page(struct address_space *mapping,
 repeat:
 	page = find_get_page(mapping, index);
 	if (!page) {
-		page = __page_cache_alloc(gfp | __GFP_COLD);
+		page = __page_cache_alloc(gfp);
 		if (!page)
 			return ERR_PTR(-ENOMEM);
 		err = add_to_page_cache_lru(page, mapping, index, gfp);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 13582efc57a0..ab68af66096f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2315,7 +2315,7 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
  */
 static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			unsigned long count, struct list_head *list,
-			int migratetype, bool cold)
+			int migratetype)
 {
 	int i, alloced = 0;
 
@@ -2337,10 +2337,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 		 * merge IO requests if the physical pages are ordered
 		 * properly.
 		 */
-		if (likely(!cold))
-			list_add(&page->lru, list);
-		else
-			list_add_tail(&page->lru, list);
+		list_add(&page->lru, list);
 		list = &page->lru;
 		alloced++;
 		if (is_migrate_cma(get_pcppage_migratetype(page)))
@@ -2783,7 +2780,7 @@ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z)
 
 /* Remove page from the per-cpu list, caller must protect the list */
 static struct page *__rmqueue_pcplist(struct zone *zone, int migratetype,
-			bool cold, struct per_cpu_pages *pcp,
+			struct per_cpu_pages *pcp,
 			struct list_head *list)
 {
 	struct page *page;
@@ -2792,16 +2789,12 @@ static struct page *__rmqueue_pcplist(struct zone *zone, int migratetype,
 		if (list_empty(list)) {
 			pcp->count += rmqueue_bulk(zone, 0,
 					pcp->batch, list,
-					migratetype, cold);
+					migratetype);
 			if (unlikely(list_empty(list)))
 				return NULL;
 		}
 
-		if (cold)
-			page = list_last_entry(list, struct page, lru);
-		else
-			page = list_first_entry(list, struct page, lru);
-
+		page = list_first_entry(list, struct page, lru);
 		list_del(&page->lru);
 		pcp->count--;
 	} while (check_new_pcp(page));
@@ -2816,14 +2809,13 @@ static struct page *rmqueue_pcplist(struct zone *preferred_zone,
 {
 	struct per_cpu_pages *pcp;
 	struct list_head *list;
-	bool cold = ((gfp_flags & __GFP_COLD) != 0);
 	struct page *page;
 	unsigned long flags;
 
 	local_irq_save(flags);
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
 	list = &pcp->lists[migratetype];
-	page = __rmqueue_pcplist(zone,  migratetype, cold, pcp, list);
+	page = __rmqueue_pcplist(zone,  migratetype, pcp, list);
 	if (page) {
 		__count_zid_vm_events(PGALLOC, page_zonenum(page), 1 << order);
 		zone_statistics(preferred_zone, zone);
diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 15dab691ea70..9158e5a81391 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -81,7 +81,7 @@ static void pcpu_free_pages(struct pcpu_chunk *chunk,
 static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
 			    struct page **pages, int page_start, int page_end)
 {
-	const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_COLD;
+	const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM;
 	unsigned int cpu, tcpu;
 	int i;
 
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 16982de649b9..8951d90818d5 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -357,7 +357,7 @@ static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
  */
 void *netdev_alloc_frag(unsigned int fragsz)
 {
-	return __netdev_alloc_frag(fragsz, GFP_ATOMIC | __GFP_COLD);
+	return __netdev_alloc_frag(fragsz, GFP_ATOMIC);
 }
 EXPORT_SYMBOL(netdev_alloc_frag);
 
@@ -370,7 +370,7 @@ static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 
 void *napi_alloc_frag(unsigned int fragsz)
 {
-	return __napi_alloc_frag(fragsz, GFP_ATOMIC | __GFP_COLD);
+	return __napi_alloc_frag(fragsz, GFP_ATOMIC);
 }
 EXPORT_SYMBOL(napi_alloc_frag);
 
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 24ee68ecdd42..34c7dc86c62b 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -640,7 +640,6 @@ static const struct {
 	{ "__GFP_ATOMIC",		"_A" },
 	{ "__GFP_IO",			"I" },
 	{ "__GFP_FS",			"F" },
-	{ "__GFP_COLD",			"CO" },
 	{ "__GFP_NOWARN",		"NWR" },
 	{ "__GFP_RETRY_MAYFAIL",	"R" },
 	{ "__GFP_NOFAIL",		"NF" },
-- 
2.14.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/8] mm, page_alloc: Enable/disable IRQs once when freeing a list of pages
  2017-10-18  7:59   ` Mel Gorman
@ 2017-10-18  9:02     ` Vlastimil Babka
  -1 siblings, 0 replies; 47+ messages in thread
From: Vlastimil Babka @ 2017-10-18  9:02 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner

On 10/18/2017 09:59 AM, Mel Gorman wrote:
> Freeing a list of pages currently enables/disables IRQs for each page freed.
> This patch splits freeing a list of pages into two operations -- preparing
> the pages for freeing and the actual freeing. This is a tradeoff - we're
> taking two passes of the list to free in exchange for avoiding multiple
> enable/disable of IRQs.

There's also some overhead of storing pfn in page->private, but all that
seems negligible compared to irq disable/enable...

> sparsetruncate (tiny)
>                               4.14.0-rc4             4.14.0-rc4
>                            janbatch-v1r1            oneirq-v1r1
> Min          Time      149.00 (   0.00%)      141.00 (   5.37%)
> 1st-qrtle    Time      150.00 (   0.00%)      142.00 (   5.33%)
> 2nd-qrtle    Time      151.00 (   0.00%)      142.00 (   5.96%)
> 3rd-qrtle    Time      151.00 (   0.00%)      143.00 (   5.30%)
> Max-90%      Time      153.00 (   0.00%)      144.00 (   5.88%)
> Max-95%      Time      155.00 (   0.00%)      147.00 (   5.16%)
> Max-99%      Time      201.00 (   0.00%)      195.00 (   2.99%)
> Max          Time      236.00 (   0.00%)      230.00 (   2.54%)
> Amean        Time      152.65 (   0.00%)      144.37 (   5.43%)
> Stddev       Time        9.78 (   0.00%)       10.44 (  -6.72%)
> Coeff        Time        6.41 (   0.00%)        7.23 ( -12.84%)
> Best99%Amean Time      152.07 (   0.00%)      143.72 (   5.50%)
> Best95%Amean Time      150.75 (   0.00%)      142.37 (   5.56%)
> Best90%Amean Time      150.59 (   0.00%)      142.19 (   5.58%)
> Best75%Amean Time      150.36 (   0.00%)      141.92 (   5.61%)
> Best50%Amean Time      150.04 (   0.00%)      141.69 (   5.56%)
> Best25%Amean Time      149.85 (   0.00%)      141.38 (   5.65%)
> 
> With a tiny number of files, each file truncated has resident page cache
> and it shows that the time to truncate is reduced by roughly 5-6%, with
> some minor jitter.
> 
>                                       4.14.0-rc4             4.14.0-rc4
>                                    janbatch-v1r1            oneirq-v1r1
> Hmean     SeqCreate ops         65.27 (   0.00%)       81.86 (  25.43%)
> Hmean     SeqCreate read        39.48 (   0.00%)       47.44 (  20.16%)
> Hmean     SeqCreate del      24963.95 (   0.00%)    26319.99 (   5.43%)
> Hmean     RandCreate ops        65.47 (   0.00%)       82.01 (  25.26%)
> Hmean     RandCreate read       42.04 (   0.00%)       51.75 (  23.09%)
> Hmean     RandCreate del     23377.66 (   0.00%)    23764.79 (   1.66%)
> 
> As expected, there is a small gain for the delete operation.

Looks good.

> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

A nit below.

> @@ -2647,11 +2663,25 @@ void free_hot_cold_page(struct page *page, bool cold)
>  void free_hot_cold_page_list(struct list_head *list, bool cold)
>  {
>  	struct page *page, *next;
> +	unsigned long flags, pfn;
> +
> +	/* Prepare pages for freeing */
> +	list_for_each_entry_safe(page, next, list, lru) {
> +		pfn = page_to_pfn(page);
> +		if (!free_hot_cold_page_prepare(page, pfn))
> +			list_del(&page->lru);
> +		page->private = pfn;

We have the (set_)page_private() helpers, so it would be better to use
them (that makes it a bit easier to find all the places where
page->private is used, e.g. to avoid a clash)?
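
(For reference, a sketch of these helpers as they appear in
include/linux/mm.h around this kernel version; check the tree for the
exact definitions:)

#define page_private(page)		((page)->private)
#define set_page_private(page, v)	((page)->private = (v))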

> +	}
>  
> +	local_irq_save(flags);
>  	list_for_each_entry_safe(page, next, list, lru) {
> +		unsigned long pfn = page->private;
> +
> +		page->private = 0;

Same here.

>  		trace_mm_page_free_batched(page, cold);
> -		free_hot_cold_page(page, cold);
> +		free_hot_cold_page_commit(page, pfn, cold);
>  	}
> +	local_irq_restore(flags);
>  }
>  
>  /*
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 1/8] mm, page_alloc: Enable/disable IRQs once when freeing a list of pages
  2017-10-18  9:02     ` Vlastimil Babka
@ 2017-10-18 10:15       ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-18 10:15 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Linux-MM, Linux-FSDevel, LKML, Jan Kara,
	Andi Kleen, Dave Hansen, Dave Chinner

On Wed, Oct 18, 2017 at 11:02:18AM +0200, Vlastimil Babka wrote:
> On 10/18/2017 09:59 AM, Mel Gorman wrote:
> > Freeing a list of pages currently enables/disables IRQs for each page freed.
> > This patch splits freeing a list of pages into two operations -- preparing
> > the pages for freeing and the actual freeing. This is a tradeoff - we're
> > taking two passes of the list to free in exchange for avoiding multiple
> > enable/disable of IRQs.
> 
> There's also some overhead of storing pfn in page->private, but all that
> seems negligible compared to irq disable/enable...
> 

Exactly, and it's cheaper than doing a second page-to-pfn lookup.
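
Condensed, the body of free_hot_cold_page_list(list, cold) then ends up
looking like this; a sketch only, the real version also emits the
trace_mm_page_free_batched() tracepoint for each page:

	struct page *page, *next;
	unsigned long flags;

	/* Pass 1: per-page checks, stash the pfn; no IRQ toggling yet */
	list_for_each_entry_safe(page, next, list, lru) {
		unsigned long pfn = page_to_pfn(page);

		if (!free_hot_cold_page_prepare(page, pfn))
			list_del(&page->lru);
		set_page_private(page, pfn);
	}

	/* Pass 2: a single IRQ disable/enable covers the whole batch */
	local_irq_save(flags);
	list_for_each_entry_safe(page, next, list, lru) {
		unsigned long pfn = page_private(page);

		set_page_private(page, 0);
		free_hot_cold_page_commit(page, pfn, cold);
	}
	local_irq_restore(flags);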

> <SNIP>
> Looks good.
> 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 

Thanks.

> A nit below.
> 
> > @@ -2647,11 +2663,25 @@ void free_hot_cold_page(struct page *page, bool cold)
> >  void free_hot_cold_page_list(struct list_head *list, bool cold)
> >  {
> >  	struct page *page, *next;
> > +	unsigned long flags, pfn;
> > +
> > +	/* Prepare pages for freeing */
> > +	list_for_each_entry_safe(page, next, list, lru) {
> > +		pfn = page_to_pfn(page);
> > +		if (!free_hot_cold_page_prepare(page, pfn))
> > +			list_del(&page->lru);
> > +		page->private = pfn;
> 
> We have the (set_)page_private() helpers, so it would be better to use
> them (that makes it a bit easier to find all the places where
> page->private is used, e.g. to avoid a clash)?
> 

Agreed, and it's trivial to do so:

---8<---
mm, page_alloc: Enable/disable IRQs once when freeing a list of pages -fix

Use page_private and set_page_private helpers.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 167e163cf733..092973014c1e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2670,14 +2670,14 @@ void free_hot_cold_page_list(struct list_head *list, bool cold)
 		pfn = page_to_pfn(page);
 		if (!free_hot_cold_page_prepare(page, pfn))
 			list_del(&page->lru);
-		page->private = pfn;
+		set_page_private(page, pfn);
 	}
 
 	local_irq_save(flags);
 	list_for_each_entry_safe(page, next, list, lru) {
-		unsigned long pfn = page->private;
+		unsigned long pfn = page_private(page);
 
-		page->private = 0;
+		set_page_private(page, 0);
 		trace_mm_page_free_batched(page, cold);
 		free_hot_cold_page_commit(page, pfn, cold);
 	}

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 2/8] mm, truncate: Do not check mapping for every page being truncated
  2017-10-18  7:59   ` Mel Gorman
@ 2017-10-19  8:11     ` Jan Kara
  -1 siblings, 0 replies; 47+ messages in thread
From: Jan Kara @ 2017-10-19  8:11 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-FSDevel, LKML, Jan Kara,
	Andi Kleen, Dave Hansen, Dave Chinner

On Wed 18-10-17 08:59:46, Mel Gorman wrote:
> During truncation, the mapping has already been checked for shmem and dax
> so it's known that workingset_update_node is required. This patch avoids
> the checks on mapping for each page being truncated. In all other cases,
> a lookup helper is used to determine if workingset_update_node() needs
> to be called. The one danger is that the API is slightly harder to use as
> calling workingset_update_node directly without checking for dax or shmem
> mappings could lead to surprises. However, the API rarely needs to be used
> and hopefully the comment is enough to give people the hint.
> 
> sparsetruncate (tiny)
>                               4.14.0-rc4             4.14.0-rc4
>                              oneirq-v1r1        pickhelper-v1r1
> Min          Time      141.00 (   0.00%)      140.00 (   0.71%)
> 1st-qrtle    Time      142.00 (   0.00%)      141.00 (   0.70%)
> 2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
> 3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
> Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
> Max-95%      Time      147.00 (   0.00%)      145.00 (   1.36%)
> Max-99%      Time      195.00 (   0.00%)      191.00 (   2.05%)
> Max          Time      230.00 (   0.00%)      205.00 (  10.87%)
> Amean        Time      144.37 (   0.00%)      143.82 (   0.38%)
> Stddev       Time       10.44 (   0.00%)        9.00 (  13.74%)
> Coeff        Time        7.23 (   0.00%)        6.26 (  13.41%)
> Best99%Amean Time      143.72 (   0.00%)      143.34 (   0.26%)
> Best95%Amean Time      142.37 (   0.00%)      142.00 (   0.26%)
> Best90%Amean Time      142.19 (   0.00%)      141.85 (   0.24%)
> Best75%Amean Time      141.92 (   0.00%)      141.58 (   0.24%)
> Best50%Amean Time      141.69 (   0.00%)      141.31 (   0.27%)
> Best25%Amean Time      141.38 (   0.00%)      140.97 (   0.29%)
> 
> As you'd expect, the gain is marginal but it can be detected. The differences
> in bonnie are all within the noise, which is not surprising given how small
> the impact is even on the microbenchmark.
> 
> radix_tree_update_node_t is a callback used by some radix tree operations,
> and it can optionally be passed a private field. The only user of the
> callback is workingset_update_node and, as it no longer requires the
> mapping, the private field is removed.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

The patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
>  fs/dax.c                              |  2 +-
>  include/linux/radix-tree.h            |  7 +++----
>  include/linux/swap.h                  | 13 ++++++++++++-
>  lib/idr.c                             |  2 +-
>  lib/radix-tree.c                      | 30 +++++++++++++-----------------
>  mm/filemap.c                          |  7 ++++---
>  mm/shmem.c                            |  2 +-
>  mm/truncate.c                         |  2 +-
>  mm/workingset.c                       | 10 ++--------
>  tools/testing/radix-tree/multiorder.c |  2 +-
>  10 files changed, 39 insertions(+), 38 deletions(-)
> 
> diff --git a/fs/dax.c b/fs/dax.c
> index f001d8c72a06..3318ae9046e6 100644
> --- a/fs/dax.c
> +++ b/fs/dax.c
> @@ -565,7 +565,7 @@ static void *dax_insert_mapping_entry(struct address_space *mapping,
>  		ret = __radix_tree_lookup(page_tree, index, &node, &slot);
>  		WARN_ON_ONCE(ret != entry);
>  		__radix_tree_replace(page_tree, node, slot,
> -				     new_entry, NULL, NULL);
> +				     new_entry, NULL);
>  		entry = new_entry;
>  	}
>  
> diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
> index 567ebb5eaab0..0ca448c1cb42 100644
> --- a/include/linux/radix-tree.h
> +++ b/include/linux/radix-tree.h
> @@ -301,18 +301,17 @@ void *__radix_tree_lookup(const struct radix_tree_root *, unsigned long index,
>  void *radix_tree_lookup(const struct radix_tree_root *, unsigned long);
>  void __rcu **radix_tree_lookup_slot(const struct radix_tree_root *,
>  					unsigned long index);
> -typedef void (*radix_tree_update_node_t)(struct radix_tree_node *, void *);
> +typedef void (*radix_tree_update_node_t)(struct radix_tree_node *);
>  void __radix_tree_replace(struct radix_tree_root *, struct radix_tree_node *,
>  			  void __rcu **slot, void *entry,
> -			  radix_tree_update_node_t update_node, void *private);
> +			  radix_tree_update_node_t update_node);
>  void radix_tree_iter_replace(struct radix_tree_root *,
>  		const struct radix_tree_iter *, void __rcu **slot, void *entry);
>  void radix_tree_replace_slot(struct radix_tree_root *,
>  			     void __rcu **slot, void *entry);
>  void __radix_tree_delete_node(struct radix_tree_root *,
>  			      struct radix_tree_node *,
> -			      radix_tree_update_node_t update_node,
> -			      void *private);
> +			      radix_tree_update_node_t update_node);
>  void radix_tree_iter_delete(struct radix_tree_root *,
>  			struct radix_tree_iter *iter, void __rcu **slot);
>  void *radix_tree_delete_item(struct radix_tree_root *, unsigned long, void *);
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index 8a807292037f..257c48a525c8 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -292,7 +292,18 @@ struct vma_swap_readahead {
>  void *workingset_eviction(struct address_space *mapping, struct page *page);
>  bool workingset_refault(void *shadow);
>  void workingset_activation(struct page *page);
> -void workingset_update_node(struct radix_tree_node *node, void *private);
> +
> +/* Do not use directly, use workingset_lookup_update */
> +void workingset_update_node(struct radix_tree_node *node);
> +
> +/* Returns workingset_update_node() if the mapping has shadow entries. */
> +#define workingset_lookup_update(mapping)				\
> +({									\
> +	radix_tree_update_node_t __helper = workingset_update_node;	\
> +	if (dax_mapping(mapping) || shmem_mapping(mapping))		\
> +		__helper = NULL;					\
> +	__helper;							\
> +})
>  
>  /* linux/mm/page_alloc.c */
>  extern unsigned long totalram_pages;
> diff --git a/lib/idr.c b/lib/idr.c
> index edd9b2be1651..2593ce513a18 100644
> --- a/lib/idr.c
> +++ b/lib/idr.c
> @@ -171,7 +171,7 @@ void *idr_replace_ext(struct idr *idr, void *ptr, unsigned long id)
>  	if (!slot || radix_tree_tag_get(&idr->idr_rt, id, IDR_FREE))
>  		return ERR_PTR(-ENOENT);
>  
> -	__radix_tree_replace(&idr->idr_rt, node, slot, ptr, NULL, NULL);
> +	__radix_tree_replace(&idr->idr_rt, node, slot, ptr, NULL);
>  
>  	return entry;
>  }
> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> index 8b1feca1230a..c8d55565fafa 100644
> --- a/lib/radix-tree.c
> +++ b/lib/radix-tree.c
> @@ -677,8 +677,7 @@ static int radix_tree_extend(struct radix_tree_root *root, gfp_t gfp,
>   *	@root		radix tree root
>   */
>  static inline bool radix_tree_shrink(struct radix_tree_root *root,
> -				     radix_tree_update_node_t update_node,
> -				     void *private)
> +				     radix_tree_update_node_t update_node)
>  {
>  	bool shrunk = false;
>  
> @@ -739,7 +738,7 @@ static inline bool radix_tree_shrink(struct radix_tree_root *root,
>  		if (!radix_tree_is_internal_node(child)) {
>  			node->slots[0] = (void __rcu *)RADIX_TREE_RETRY;
>  			if (update_node)
> -				update_node(node, private);
> +				update_node(node);
>  		}
>  
>  		WARN_ON_ONCE(!list_empty(&node->private_list));
> @@ -752,7 +751,7 @@ static inline bool radix_tree_shrink(struct radix_tree_root *root,
>  
>  static bool delete_node(struct radix_tree_root *root,
>  			struct radix_tree_node *node,
> -			radix_tree_update_node_t update_node, void *private)
> +			radix_tree_update_node_t update_node)
>  {
>  	bool deleted = false;
>  
> @@ -762,8 +761,8 @@ static bool delete_node(struct radix_tree_root *root,
>  		if (node->count) {
>  			if (node_to_entry(node) ==
>  					rcu_dereference_raw(root->rnode))
> -				deleted |= radix_tree_shrink(root, update_node,
> -								private);
> +				deleted |= radix_tree_shrink(root,
> +								update_node);
>  			return deleted;
>  		}
>  
> @@ -1173,7 +1172,6 @@ static int calculate_count(struct radix_tree_root *root,
>   * @slot:		pointer to slot in @node
>   * @item:		new item to store in the slot.
>   * @update_node:	callback for changing leaf nodes
> - * @private:		private data to pass to @update_node
>   *
>   * For use with __radix_tree_lookup().  Caller must hold tree write locked
>   * across slot lookup and replacement.
> @@ -1181,7 +1179,7 @@ static int calculate_count(struct radix_tree_root *root,
>  void __radix_tree_replace(struct radix_tree_root *root,
>  			  struct radix_tree_node *node,
>  			  void __rcu **slot, void *item,
> -			  radix_tree_update_node_t update_node, void *private)
> +			  radix_tree_update_node_t update_node)
>  {
>  	void *old = rcu_dereference_raw(*slot);
>  	int exceptional = !!radix_tree_exceptional_entry(item) -
> @@ -1201,9 +1199,9 @@ void __radix_tree_replace(struct radix_tree_root *root,
>  		return;
>  
>  	if (update_node)
> -		update_node(node, private);
> +		update_node(node);
>  
> -	delete_node(root, node, update_node, private);
> +	delete_node(root, node, update_node);
>  }
>  
>  /**
> @@ -1225,7 +1223,7 @@ void __radix_tree_replace(struct radix_tree_root *root,
>  void radix_tree_replace_slot(struct radix_tree_root *root,
>  			     void __rcu **slot, void *item)
>  {
> -	__radix_tree_replace(root, NULL, slot, item, NULL, NULL);
> +	__radix_tree_replace(root, NULL, slot, item, NULL);
>  }
>  EXPORT_SYMBOL(radix_tree_replace_slot);
>  
> @@ -1242,7 +1240,7 @@ void radix_tree_iter_replace(struct radix_tree_root *root,
>  				const struct radix_tree_iter *iter,
>  				void __rcu **slot, void *item)
>  {
> -	__radix_tree_replace(root, iter->node, slot, item, NULL, NULL);
> +	__radix_tree_replace(root, iter->node, slot, item, NULL);
>  }
>  
>  #ifdef CONFIG_RADIX_TREE_MULTIORDER
> @@ -1972,7 +1970,6 @@ EXPORT_SYMBOL(radix_tree_gang_lookup_tag_slot);
>   *	@root:		radix tree root
>   *	@node:		node containing @index
>   *	@update_node:	callback for changing leaf nodes
> - *	@private:	private data to pass to @update_node
>   *
>   *	After clearing the slot at @index in @node from radix tree
>   *	rooted at @root, call this function to attempt freeing the
> @@ -1980,10 +1977,9 @@ EXPORT_SYMBOL(radix_tree_gang_lookup_tag_slot);
>   */
>  void __radix_tree_delete_node(struct radix_tree_root *root,
>  			      struct radix_tree_node *node,
> -			      radix_tree_update_node_t update_node,
> -			      void *private)
> +			      radix_tree_update_node_t update_node)
>  {
> -	delete_node(root, node, update_node, private);
> +	delete_node(root, node, update_node);
>  }
>  
>  static bool __radix_tree_delete(struct radix_tree_root *root,
> @@ -2001,7 +1997,7 @@ static bool __radix_tree_delete(struct radix_tree_root *root,
>  			node_tag_clear(root, node, tag, offset);
>  
>  	replace_slot(slot, NULL, node, -1, exceptional);
> -	return node && delete_node(root, node, NULL, NULL);
> +	return node && delete_node(root, node, NULL);
>  }
>  
>  /**
> diff --git a/mm/filemap.c b/mm/filemap.c
> index dba68e1d9869..e59580feefd9 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -35,6 +35,7 @@
>  #include <linux/hugetlb.h>
>  #include <linux/memcontrol.h>
>  #include <linux/cleancache.h>
> +#include <linux/shmem_fs.h>
>  #include <linux/rmap.h>
>  #include "internal.h"
>  
> @@ -134,7 +135,7 @@ static int page_cache_tree_insert(struct address_space *mapping,
>  			*shadowp = p;
>  	}
>  	__radix_tree_replace(&mapping->page_tree, node, slot, page,
> -			     workingset_update_node, mapping);
> +			     workingset_lookup_update(mapping));
>  	mapping->nrpages++;
>  	return 0;
>  }
> @@ -162,7 +163,7 @@ static void page_cache_tree_delete(struct address_space *mapping,
>  
>  		radix_tree_clear_tags(&mapping->page_tree, node, slot);
>  		__radix_tree_replace(&mapping->page_tree, node, slot, shadow,
> -				     workingset_update_node, mapping);
> +				workingset_lookup_update(mapping));
>  	}
>  
>  	page->mapping = NULL;
> @@ -360,7 +361,7 @@ page_cache_tree_delete_batch(struct address_space *mapping, int count,
>  		}
>  		radix_tree_clear_tags(&mapping->page_tree, iter.node, slot);
>  		__radix_tree_replace(&mapping->page_tree, iter.node, slot, NULL,
> -				     workingset_update_node, mapping);
> +				workingset_lookup_update(mapping));
>  		total_pages++;
>  	}
>  	mapping->nrpages -= total_pages;
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 07a1d22807be..a72f68aee6a4 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -338,7 +338,7 @@ static int shmem_radix_tree_replace(struct address_space *mapping,
>  	if (item != expected)
>  		return -ENOENT;
>  	__radix_tree_replace(&mapping->page_tree, node, pslot,
> -			     replacement, NULL, NULL);
> +			     replacement, NULL);
>  	return 0;
>  }
>  
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 3dfa2d5e642e..d578d542a6ee 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -42,7 +42,7 @@ static void clear_shadow_entry(struct address_space *mapping, pgoff_t index,
>  	if (*slot != entry)
>  		goto unlock;
>  	__radix_tree_replace(&mapping->page_tree, node, slot, NULL,
> -			     workingset_update_node, mapping);
> +			     workingset_update_node);
>  	mapping->nrexceptional--;
>  unlock:
>  	spin_unlock_irq(&mapping->tree_lock);
> diff --git a/mm/workingset.c b/mm/workingset.c
> index 7119cd745ace..0f7b4fb130e3 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -339,14 +339,8 @@ void workingset_activation(struct page *page)
>  
>  static struct list_lru shadow_nodes;
>  
> -void workingset_update_node(struct radix_tree_node *node, void *private)
> +void workingset_update_node(struct radix_tree_node *node)
>  {
> -	struct address_space *mapping = private;
> -
> -	/* Only regular page cache has shadow entries */
> -	if (dax_mapping(mapping) || shmem_mapping(mapping))
> -		return;
> -
>  	/*
>  	 * Track non-empty nodes that contain only shadow entries;
>  	 * unlink those that contain pages or are being freed.
> @@ -474,7 +468,7 @@ static enum lru_status shadow_lru_isolate(struct list_head *item,
>  		goto out_invalid;
>  	inc_lruvec_page_state(virt_to_page(node), WORKINGSET_NODERECLAIM);
>  	__radix_tree_delete_node(&mapping->page_tree, node,
> -				 workingset_update_node, mapping);
> +				 workingset_lookup_update(mapping));
>  
>  out_invalid:
>  	spin_unlock(&mapping->tree_lock);
> diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c
> index 06c71178d07d..59245b3d587c 100644
> --- a/tools/testing/radix-tree/multiorder.c
> +++ b/tools/testing/radix-tree/multiorder.c
> @@ -618,7 +618,7 @@ static void multiorder_account(void)
>  	__radix_tree_insert(&tree, 1 << 5, 5, (void *)0x12);
>  	__radix_tree_lookup(&tree, 1 << 5, &node, &slot);
>  	assert(node->count == node->exceptional * 2);
> -	__radix_tree_replace(&tree, node, slot, NULL, NULL, NULL);
> +	__radix_tree_replace(&tree, node, slot, NULL, NULL);
>  	assert(node->exceptional == 0);
>  
>  	item_kill_tree(&tree);
> -- 
> 2.14.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/8] mm: Only drain per-cpu pagevecs once per pagevec usage
  2017-10-18  7:59   ` Mel Gorman
@ 2017-10-19  9:12     ` Vlastimil Babka
  -1 siblings, 0 replies; 47+ messages in thread
From: Vlastimil Babka @ 2017-10-19  9:12 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner

On 10/18/2017 09:59 AM, Mel Gorman wrote:
> When a pagevec is initialised on the stack, it is generally used multiple
> times over a range of pages, looking up entries and then releasing them.
> On each pagevec_release, the per-cpu deferred LRU pagevecs are drained
> on the grounds the page being released may be on those queues and the
> pages may be cache hot. In many cases only the first drain is necessary
> as it's unlikely that the range of pages being walked is racing against
> LRU addition.  Even if there is such a race, the impact is marginal, whereas
> constantly redraining the lru pagevecs has a real cost.

Right, the drain is only to a local cpu, not all of them, so that kind
of "racing" shouldn't be even possible.

> This patch ensures that a pagevec is only drained once in a given lifecycle
> without increasing the cache footprint of the pagevec structure. Only

Well, strictly speaking it does prevent decreasing the cache footprint
by removing the 'cold' field later :)

> sparsetruncate tiny is shown here as large files have many exceptional
> entries and call pagevec_release less frequently.
> 
> sparsetruncate (tiny)
>                               4.14.0-rc4             4.14.0-rc4
>                         batchshadow-v1r1          onedrain-v1r1
> Min          Time      141.00 (   0.00%)      141.00 (   0.00%)
> 1st-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
> 2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
> 3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
> Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
> Max-95%      Time      146.00 (   0.00%)      145.00 (   0.68%)
> Max-99%      Time      198.00 (   0.00%)      194.00 (   2.02%)
> Max          Time      254.00 (   0.00%)      208.00 (  18.11%)
> Amean        Time      145.12 (   0.00%)      144.30 (   0.56%)
> Stddev       Time       12.74 (   0.00%)        9.62 (  24.49%)
> Coeff        Time        8.78 (   0.00%)        6.67 (  24.06%)
> Best99%Amean Time      144.29 (   0.00%)      143.82 (   0.32%)
> Best95%Amean Time      142.68 (   0.00%)      142.31 (   0.26%)
> Best90%Amean Time      142.52 (   0.00%)      142.19 (   0.24%)
> Best75%Amean Time      142.26 (   0.00%)      141.98 (   0.20%)
> Best50%Amean Time      141.90 (   0.00%)      141.71 (   0.13%)
> Best25%Amean Time      141.80 (   0.00%)      141.43 (   0.26%)
> 
> The impact on bonnie is marginal and within the noise because a significant
> percentage of the file being truncated has been reclaimed and consists of
> shadow entries which reduce the hotness of the pagevec_release path.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  include/linux/pagevec.h | 4 +++-
>  mm/swap.c               | 5 ++++-
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
> index 4dcd5506f1ed..4231979be982 100644
> --- a/include/linux/pagevec.h
> +++ b/include/linux/pagevec.h
> @@ -16,7 +16,8 @@ struct address_space;
>  
>  struct pagevec {
>  	unsigned long nr;
> -	unsigned long cold;
> +	bool cold;
> +	bool drained;

'drained' sounds a bit misleading to me; I would expect it to refer to
*this* pagevec. What about e.g. "lru_drained"?

Thanks

>  	struct page *pages[PAGEVEC_SIZE];
>  };
>  
> @@ -45,6 +46,7 @@ static inline void pagevec_init(struct pagevec *pvec, int cold)
>  {
>  	pvec->nr = 0;
>  	pvec->cold = cold;
> +	pvec->drained = false;
>  }
>  
>  static inline void pagevec_reinit(struct pagevec *pvec)
> diff --git a/mm/swap.c b/mm/swap.c
> index a77d68f2c1b6..31bd9d8a5db7 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -833,7 +833,10 @@ EXPORT_SYMBOL(release_pages);
>   */
>  void __pagevec_release(struct pagevec *pvec)
>  {
> -	lru_add_drain();
> +	if (!pvec->drained) {
> +		lru_add_drain();
> +		pvec->drained = true;
> +	}
>  	release_pages(pvec->pages, pagevec_count(pvec), pvec->cold);
>  	pagevec_reinit(pvec);
>  }
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 5/8] mm, pagevec: Remove cold parameter for pagevecs
  2017-10-18  7:59   ` Mel Gorman
@ 2017-10-19  9:24     ` Vlastimil Babka
  -1 siblings, 0 replies; 47+ messages in thread
From: Vlastimil Babka @ 2017-10-19  9:24 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner

On 10/18/2017 09:59 AM, Mel Gorman wrote:
> Every pagevec_init user claims the pages being released are hot even in
> cases where it is unlikely the pages are hot. As no one cares about the
> hotness of pages being released to the allocator, just ditch the parameter.
> 
> No performance impact is expected as the overhead is marginal. The parameter
> is removed simply because it is a bit stupid to have a useless parameter
> copied everywhere.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 6/8] mm: Remove cold parameter for release_pages
  2017-10-18  7:59   ` Mel Gorman
@ 2017-10-19  9:26     ` Vlastimil Babka
  -1 siblings, 0 replies; 47+ messages in thread
From: Vlastimil Babka @ 2017-10-19  9:26 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner

On 10/18/2017 09:59 AM, Mel Gorman wrote:
> All callers of release_pages claim the pages being released are cache hot.
> As no one cares about the hotness of pages being released to the allocator,
> just ditch the parameter.
> 
> No performance impact is expected as the overhead is marginal. The parameter
> is removed simply because it is a bit stupid to have a useless parameter
> copied everywhere.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/8] mm: Only drain per-cpu pagevecs once per pagevec usage
  2017-10-19  9:12     ` Vlastimil Babka
@ 2017-10-19  9:33       ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-19  9:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Linux-MM, Linux-FSDevel, LKML, Jan Kara,
	Andi Kleen, Dave Hansen, Dave Chinner

On Thu, Oct 19, 2017 at 11:12:52AM +0200, Vlastimil Babka wrote:
> On 10/18/2017 09:59 AM, Mel Gorman wrote:
> > When a pagevec is initialised on the stack, it is generally used multiple
> > times over a range of pages, looking up entries and then releasing them.
> > On each pagevec_release, the per-cpu deferred LRU pagevecs are drained
> > on the grounds the page being released may be on those queues and the
> > pages may be cache hot. In many cases only the first drain is necessary
> > as it's unlikely that the range of pages being walked is racing against
> > LRU addition.  Even if there is such a race, the impact is marginal, whereas
> > constantly redraining the lru pagevecs has a real cost.
> 
> Right, the drain is only to a local cpu, not all of them, so that kind
> of "racing" shouldn't be even possible.
> 

Potentially the user of the pagevec can be preempted and another process
can modify the per-cpu pagevecs in parallel. Note also that users of a
pagevec in a loop may call cond_resched, so preemption is not even
necessary if the pagevec is being used for a large enough number of
operations. The risk is still marginal.
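
To make that concrete, the lifecycle in question is roughly the following
(a sketch only; the lookup signature is approximate for this kernel and
the mapping/index handling is elided):

	struct pagevec pvec;
	pgoff_t index = 0;

	pagevec_init(&pvec, 0);
	while (pagevec_lookup(&pvec, mapping, &index, PAGEVEC_SIZE)) {
		/* ... process pvec.pages[0..pagevec_count(&pvec) - 1] ... */

		/* with this patch, drains the percpu LRU pagevecs once */
		pagevec_release(&pvec);
	}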

> > This patch ensures that a pagevec is only drained once in a given lifecycle
> > without increasing the cache footprint of the pagevec structure. Only
> 
> Well, strictly speaking it does prevent decreasing the cache footprint
> by removing the 'cold' field later :)
> 

Debatable. Even freeing a cold page if it was handled properly still has
a cache footprint impact because the struct page fields are accessed.
Maybe that's not what you meant and you are referring to the size of the
structure itself. If so, note that I changed the types of the two fields so
they should fit in the same space as a single unsigned long in many cases.
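
On a typical 64-bit build the packing works out as follows (illustrative
layout only, not a quote from the patch):

	struct pagevec {
		unsigned long nr;	/* one word */
		bool cold;		/* one byte */
		bool drained;		/* one byte; with padding, the two
					 * bools occupy the single word that
					 * the old 'cold' field used */
		struct page *pages[PAGEVEC_SIZE];
	};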

As an aside, I did at one point have a patch that removed the drained
field as well and increased the number of pagevec entries, but it didn't
work out in terms of overall performance, so I dropped it.

> > sparsetruncate tiny is shown here as large files have many exceptional
> > entries and call pagevec_release less frequently.
> > 
> > sparsetruncate (tiny)
> >                               4.14.0-rc4             4.14.0-rc4
> >                         batchshadow-v1r1          onedrain-v1r1
> > Min          Time      141.00 (   0.00%)      141.00 (   0.00%)
> > 1st-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
> > 2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
> > 3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
> > Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
> > Max-95%      Time      146.00 (   0.00%)      145.00 (   0.68%)
> > Max-99%      Time      198.00 (   0.00%)      194.00 (   2.02%)
> > Max          Time      254.00 (   0.00%)      208.00 (  18.11%)
> > Amean        Time      145.12 (   0.00%)      144.30 (   0.56%)
> > Stddev       Time       12.74 (   0.00%)        9.62 (  24.49%)
> > Coeff        Time        8.78 (   0.00%)        6.67 (  24.06%)
> > Best99%Amean Time      144.29 (   0.00%)      143.82 (   0.32%)
> > Best95%Amean Time      142.68 (   0.00%)      142.31 (   0.26%)
> > Best90%Amean Time      142.52 (   0.00%)      142.19 (   0.24%)
> > Best75%Amean Time      142.26 (   0.00%)      141.98 (   0.20%)
> > Best50%Amean Time      141.90 (   0.00%)      141.71 (   0.13%)
> > Best25%Amean Time      141.80 (   0.00%)      141.43 (   0.26%)
> > 
> > The impact on bonnie is marginal and within the noise because a significant
> > percentage of the file being truncated has been reclaimed and consists of
> > shadow entries which reduce the hotness of the pagevec_release path.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> > ---
> >  include/linux/pagevec.h | 4 +++-
> >  mm/swap.c               | 5 ++++-
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
> > index 4dcd5506f1ed..4231979be982 100644
> > --- a/include/linux/pagevec.h
> > +++ b/include/linux/pagevec.h
> > @@ -16,7 +16,8 @@ struct address_space;
> >  
> >  struct pagevec {
> >  	unsigned long nr;
> > -	unsigned long cold;
> > +	bool cold;
> > +	bool drained;
> 
> 'drained' sounds a bit misleading to me, I would expect it to refer to
> *this* pagevec. What about e.g. "lru_drained"?
> 

It's not draining the LRU as such. How about the following patch on top
of the series? If another full series repost is necessary, I'll fold it
in.

---8<---
mm, pagevec: Rename pagevec drained field

According to Vlastimil Babka, the drained field in pagevec is potentially
misleading because it might be interpreted as draining this pagevec instead
of the percpu lru pagevecs. Rename the field for clarity.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/pagevec.h | 4 ++--
 mm/swap.c               | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 8dcde51e80ff..ba5dc27ef6bb 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -16,7 +16,7 @@ struct address_space;
 
 struct pagevec {
 	unsigned long nr;
-	bool drained;
+	bool percpu_pvec_drained;
 	struct page *pages[PAGEVEC_SIZE];
 };
 
@@ -52,7 +52,7 @@ static inline unsigned pagevec_lookup_tag(struct pagevec *pvec,
 static inline void pagevec_init(struct pagevec *pvec)
 {
 	pvec->nr = 0;
-	pvec->drained = false;
+	pvec->percpu_pvec_drained = false;
 }
 
 static inline void pagevec_reinit(struct pagevec *pvec)
diff --git a/mm/swap.c b/mm/swap.c
index b480279c760c..38e1b6374a97 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -833,9 +833,9 @@ EXPORT_SYMBOL(release_pages);
  */
 void __pagevec_release(struct pagevec *pvec)
 {
-	if (!pvec->drained) {
+	if (!pvec->percpu_pvec_drained) {
 		lru_add_drain();
-		pvec->drained = true;
+		pvec->percpu_pvec_drained = true;
 	}
 	release_pages(pvec->pages, pagevec_count(pvec));
 	pagevec_reinit(pvec);

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 7/8] mm, Remove cold parameter from free_hot_cold_page*
  2017-10-18  7:59   ` Mel Gorman
@ 2017-10-19 13:12     ` Vlastimil Babka
  -1 siblings, 0 replies; 47+ messages in thread
From: Vlastimil Babka @ 2017-10-19 13:12 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner

On 10/18/2017 09:59 AM, Mel Gorman wrote:
> Most callers of free_hot_cold_page claim the pages being released are
> cache hot. The exception is the page reclaim paths where it is likely that
> enough pages will be freed in the near future that the per-cpu lists are
> going to be recycled and the cache hotness information is lost.

Maybe it would make sense for reclaim to skip pcplists? (out of scope of
this series, of course).
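
Something along these lines, purely hypothetical since the buddy-side
helpers are static to mm/page_alloc.c and would need plumbing:

	/* sketch: reclaim-side free that bypasses the pcp lists */
	struct zone *zone = page_zone(page);
	unsigned long pfn = page_to_pfn(page);

	spin_lock(&zone->lock);
	__free_one_page(page, pfn, zone, 0,
			get_pfnblock_migratetype(page, pfn));
	spin_unlock(&zone->lock);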

> As no one
> really cares about the hotness of pages being released to the allocator,
> just ditch the parameter.
> 
> The APIs are renamed to indicate that it's no longer about hot/cold pages. It
> should also be less confusing as there are subtle differences between them.
> __free_pages drops a reference and frees a page when the refcount reaches
> zero. free_hot_cold_page handled pages whose refcount was already zero,
> which is non-obvious from the name. free_unref_page should be more obvious.
> 
> No performance impact is expected as the overhead is marginal. The parameter
> is removed simply because it is a bit stupid to have a useless parameter
> copied everywhere.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

A comment below, though.

...
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 167e163cf733..13582efc57a0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2590,7 +2590,7 @@ void mark_free_pages(struct zone *zone)
>  }
>  #endif /* CONFIG_PM */
>  
> -static bool free_hot_cold_page_prepare(struct page *page, unsigned long pfn)
> +static bool free_unref_page_prepare(struct page *page, unsigned long pfn)
>  {
>  	int migratetype;
>  
> @@ -2602,8 +2602,7 @@ static bool free_hot_cold_page_prepare(struct page *page, unsigned long pfn)
>  	return true;
>  }
>  
> -static void free_hot_cold_page_commit(struct page *page, unsigned long pfn,
> -				bool cold)
> +static void free_unref_page_commit(struct page *page, unsigned long pfn)
>  {
>  	struct zone *zone = page_zone(page);
>  	struct per_cpu_pages *pcp;
> @@ -2628,10 +2627,7 @@ static void free_hot_cold_page_commit(struct page *page, unsigned long pfn,
>  	}
>  
>  	pcp = &this_cpu_ptr(zone->pageset)->pcp;
> -	if (!cold)
> -		list_add(&page->lru, &pcp->lists[migratetype]);
> -	else
> -		list_add_tail(&page->lru, &pcp->lists[migratetype]);
> +	list_add_tail(&page->lru, &pcp->lists[migratetype]);

Did you intentionally use the cold version here? Patch 8/8 uses the hot
version in __rmqueue_pcplist() and that makes more sense to me. It
should be either negligible or better, not worse.
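
(For context: the allocation side pops from the head, so list_add() gives
LIFO, cache-hot reuse while list_add_tail() gives FIFO. Roughly what
__rmqueue_pcplist() does after patch 8/8:)

	page = list_first_entry(list, struct page, lru);
	list_del(&page->lru);
	pcp->count--;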

>  	pcp->count++;
>  	if (pcp->count >= pcp->high) {
>  		unsigned long batch = READ_ONCE(pcp->batch);

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 8/8] mm: Remove __GFP_COLD
  2017-10-18  7:59   ` Mel Gorman
@ 2017-10-19 13:18     ` Vlastimil Babka
  -1 siblings, 0 replies; 47+ messages in thread
From: Vlastimil Babka @ 2017-10-19 13:18 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner

On 10/18/2017 09:59 AM, Mel Gorman wrote:
> As the page free path makes no distinction between cache hot and cold
> pages, there is no real useful ordering of pages in the free list that
> allocation requests can take advantage of. Judging from the users of
> __GFP_COLD, it is likely that a number of them are the result of copying
> other sites instead of actually measuring the impact. Remove the
> __GFP_COLD parameter which simplifies a number of paths in the page
> allocator.
> 
> This is potentially controversial but bear in mind that the size of the
> per-cpu pagelists versus modern cache sizes means that the whole per-cpu
> list can often fit in the L3 cache. Hence, there is only a potential benefit
> for microbenchmarks that alloc/free pages in a tight loop. It's even worse
> when THP is taken into account, since THP has little or no chance of getting a
> cache-hot page as the per-cpu list is bypassed and the zeroing of multiple
> pages will thrash the cache anyway.
> 
> The truncate microbenchmarks are not shown as this patch affects the
> allocation path and not the free path. A page fault microbenchmark was
> tested but it showed no significant difference, which is not surprising given
> that the __GFP_COLD branches are a minuscule percentage of the fault path.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 4/8] mm: Only drain per-cpu pagevecs once per pagevec usage
  2017-10-19  9:33       ` Mel Gorman
@ 2017-10-19 13:21         ` Vlastimil Babka
  -1 siblings, 0 replies; 47+ messages in thread
From: Vlastimil Babka @ 2017-10-19 13:21 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-FSDevel, LKML, Jan Kara,
	Andi Kleen, Dave Hansen, Dave Chinner

On 10/19/2017 11:33 AM, Mel Gorman wrote:
> On Thu, Oct 19, 2017 at 11:12:52AM +0200, Vlastimil Babka wrote:
>> On 10/18/2017 09:59 AM, Mel Gorman wrote:
>>> When a pagevec is initialised on the stack, it is generally used multiple
>>> times over a range of pages, looking up entries and then releasing them.
>>> On each pagevec_release, the per-cpu deferred LRU pagevecs are drained
>>> on the grounds the page being released may be on those queues and the
>>> pages may be cache hot. In many cases only the first drain is necessary
>>> as it's unlikely that the range of pages being walked is racing against
>>> LRU addition.  Even if there is such a race, the impact is marginal, whereas
>>> constantly redraining the lru pagevecs has a real cost.
>>
>> Right, the drain is only to a local cpu, not all of them, so that kind
>> of "racing" shouldn't be even possible.
>>
> 
> Potentially the user of the pagevec can be preempted and another process
> can modify the per-cpu pagevecs in parallel. Note also that users of a
> pagevec in a loop may call cond_resched, so preemption is not even
> necessary if the pagevec is being used for a large enough number of
> operations. The risk is still marginal.
> 
>>> This patch ensures that a pagevec is only drained once in a given lifecycle
>>> without increasing the cache footprint of the pagevec structure. Only
>>
>> Well, strictly speaking it does prevent decreasing the cache footprint
>> by removing the 'cold' field later :)
>>
> 
> Debatable. Even freeing a cold page, if it were handled properly, would still
> have a cache footprint impact because the struct page fields are accessed.
> Maybe that's not what you meant and you are referring to the size of the
> structure itself.

Yes, I meant the size, sorry.

> If so, note that I change the type of the two fields so
> they should fit in the same size as an unsigned long in many cases.

Yeah.

> It's not draining the LRU as such. How about the following patch on top
> of the series? If another full series repost is necessary, I'll fold it
> in.

Looks good, thanks! The series is already in mmots so I expect Andrew
will fold this into the original patch.

> ---8<---
> mm, pagevec: Rename pagevec drained field
> 
> According to Vlastimil Babka, the drained field in pagevec is potentially
> misleading because it might be interpreted as draining this pagevec instead
> of the percpu lru pagevecs. Rename the field for clarity.
> 
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> ---
>  include/linux/pagevec.h | 4 ++--
>  mm/swap.c               | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
> index 8dcde51e80ff..ba5dc27ef6bb 100644
> --- a/include/linux/pagevec.h
> +++ b/include/linux/pagevec.h
> @@ -16,7 +16,7 @@ struct address_space;
>  
>  struct pagevec {
>  	unsigned long nr;
> -	bool drained;
> +	bool percpu_pvec_drained;
>  	struct page *pages[PAGEVEC_SIZE];
>  };
>  
> @@ -52,7 +52,7 @@ static inline unsigned pagevec_lookup_tag(struct pagevec *pvec,
>  static inline void pagevec_init(struct pagevec *pvec)
>  {
>  	pvec->nr = 0;
> -	pvec->drained = false;
> +	pvec->percpu_pvec_drained = false;
>  }
>  
>  static inline void pagevec_reinit(struct pagevec *pvec)
> diff --git a/mm/swap.c b/mm/swap.c
> index b480279c760c..38e1b6374a97 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -833,9 +833,9 @@ EXPORT_SYMBOL(release_pages);
>   */
>  void __pagevec_release(struct pagevec *pvec)
>  {
> -	if (!pvec->drained) {
> +	if (!pvec->percpu_pvec_drained) {
>  		lru_add_drain();
> -		pvec->drained = true;
> +		pvec->percpu_pvec_drained = true;
>  	}
>  	release_pages(pvec->pages, pagevec_count(pvec));
>  	pagevec_reinit(pvec);
> 
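
As a usage illustration, the pattern this flag optimises looks roughly like
the following sketch. It is illustrative only: walk_and_release() is a
hypothetical function, and the pagevec_lookup() signature varies between
kernel versions, so treat this as a sketch rather than code buildable
against any particular tree:

	#include <linux/pagevec.h>
	#include <linux/pagemap.h>

	static void walk_and_release(struct address_space *mapping,
				     pgoff_t start, pgoff_t end)
	{
		struct pagevec pvec;
		pgoff_t index = start;

		pagevec_init(&pvec);
		while (index <= end && pagevec_lookup(&pvec, mapping, &index)) {
			/* ... examine pvec.pages[0..pagevec_count(&pvec) - 1] ... */

			/*
			 * Only the first call drains the per-cpu LRU
			 * pagevecs. pagevec_reinit() deliberately leaves
			 * percpu_pvec_drained set, so later iterations
			 * skip the drain.
			 */
			pagevec_release(&pvec);
		}
	}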

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 8/8] mm: Remove __GFP_COLD
  2017-10-18  7:59   ` Mel Gorman
@ 2017-10-19 13:42     ` Vlastimil Babka
  -1 siblings, 0 replies; 47+ messages in thread
From: Vlastimil Babka @ 2017-10-19 13:42 UTC (permalink / raw)
  To: Mel Gorman, Andrew Morton
  Cc: Linux-MM, Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner

On 10/18/2017 09:59 AM, Mel Gorman wrote:
> As the page free path makes no distinction between cache hot and cold
> pages, there is no real useful ordering of pages in the free list that
> allocation requests can take advantage of. Judging from the users of
> __GFP_COLD, it is likely that a number of them are the result of copying
> other sites instead of actually measuring the impact. Remove the
> __GFP_COLD parameter, which simplifies a number of paths in the page
> allocator.
> 
> This is potentially controversial but bear in mind that the size of the
> per-cpu pagelists versus modern cache sizes means that the whole per-cpu
> list can often fit in the L3 cache. Hence, there is only a potential benefit
> for microbenchmarks that alloc/free pages in a tight loop. It's even worse
> for THP, which has little or no chance of getting a cache-hot page as the
> per-cpu list is bypassed and the zeroing of multiple pages will thrash the
> cache anyway.
> 
> The truncate microbenchmarks are not shown as this patch affects the
> allocation path and not the free path. A page fault microbenchmark was
> tested but it showed no significant difference, which is not surprising given
> that the __GFP_COLD branches are a minuscule percentage of the fault path.
> 
> Signed-off-by: Mel Gorman <mgorman@techsingularity.net>

I've updated the patch https://marc.info/?l=linux-mm&m=150831216224521&w=2 on top
of this. It's a small non-functional change, so it might even be folded.

----8<----
From b002266c1a826805a50087db851f93e7a87ceb2f Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Tue, 17 Oct 2017 16:03:02 +0200
Subject: [PATCH] mm, page_alloc: simplify list handling in rmqueue_bulk()

The rmqueue_bulk() function fills an empty pcplist with pages from the free
list. It tries to return pages to the caller in increasing pfn order, because
that leads to better performance with some I/O controllers, as explained in
e084b2d95e48 ("page-allocator: preserve PFN ordering when __GFP_COLD is set").

To preserve the order, it's sufficient to add pages to the tail of the list
as they are retrieved. The current code instead adds to the head of the list,
but then updates the list head pointer to the last added page, in each step.
This does result in the same order, but is needlessly confusing and potentially
wasteful, with no apparent benefit. This patch simplifies the code and adjusts
the comment accordingly.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/page_alloc.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e60d8a37e3f..4e1b3c686cc2 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2330,16 +2330,16 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 			continue;
 
 		/*
-		 * Split buddy pages returned by expand() are received here
-		 * in physical page order. The page is added to the callers and
-		 * list and the list head then moves forward. From the callers
-		 * perspective, the linked list is ordered by page number in
-		 * some conditions. This is useful for IO devices that can
-		 * merge IO requests if the physical pages are ordered
-		 * properly.
+		 * Split buddy pages returned by expand() are received here in
+		 * physical page order. The page is added to the tail of the
+		 * caller's list. From the caller's perspective, the linked
+		 * list is ordered by page number under some conditions:
+		 * walking the list forward from the head also walks it in
+		 * physical page order. This is useful for IO devices that
+		 * can merge IO requests if the physical pages are ordered
+		 * properly.
 		 */
-		list_add(&page->lru, list);
-		list = &page->lru;
+		list_add_tail(&page->lru, list);
 		alloced++;
 		if (is_migrate_cma(get_pcppage_migratetype(page)))
 			__mod_zone_page_state(zone, NR_FREE_CMA_PAGES,
-- 
2.14.2
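
To convince yourself the two idioms really produce the same order, the list
manipulation can be reproduced in a self-contained userspace program. This
is a demonstration sketch that re-implements just enough of list.h; it is
not kernel code and every name in it is illustrative:

	#include <stdio.h>
	#include <stddef.h>

	struct list_head { struct list_head *next, *prev; };

	static void init_head(struct list_head *h) { h->next = h->prev = h; }

	static void list_add(struct list_head *new, struct list_head *head)
	{
		new->next = head->next;
		new->prev = head;
		head->next->prev = new;
		head->next = new;
	}

	static void list_add_tail(struct list_head *new, struct list_head *head)
	{
		list_add(new, head->prev);	/* insert before head == append */
	}

	struct page { int pfn; struct list_head lru; };

	static void print_pfns(struct list_head *head)
	{
		struct list_head *pos;

		for (pos = head->next; pos != head; pos = pos->next) {
			struct page *p = (struct page *)((char *)pos -
						offsetof(struct page, lru));
			printf("%d ", p->pfn);
		}
		printf("\n");
	}

	int main(void)
	{
		struct page a[4] = { {0}, {1}, {2}, {3} };
		struct page b[4] = { {0}, {1}, {2}, {3} };
		struct list_head old_way, new_way, *list;
		int i;

		init_head(&old_way);
		init_head(&new_way);

		/* Old idiom: add at a moving cursor, then advance it. */
		list = &old_way;
		for (i = 0; i < 4; i++) {
			list_add(&a[i].lru, list);
			list = &a[i].lru;
		}

		/* New idiom: append at the tail of the fixed head. */
		for (i = 0; i < 4; i++)
			list_add_tail(&b[i].lru, &new_way);

		print_pfns(&old_way);	/* prints: 0 1 2 3 */
		print_pfns(&new_way);	/* prints: 0 1 2 3 */
		return 0;
	}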

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH 8/8] mm: Remove __GFP_COLD
  2017-10-19 13:42     ` Vlastimil Babka
@ 2017-10-19 14:32       ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-19 14:32 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Linux-MM, Linux-FSDevel, LKML, Jan Kara,
	Andi Kleen, Dave Hansen, Dave Chinner

On Thu, Oct 19, 2017 at 03:42:12PM +0200, Vlastimil Babka wrote:
> From b002266c1a826805a50087db851f93e7a87ceb2f Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Tue, 17 Oct 2017 16:03:02 +0200
> Subject: [PATCH] mm, page_alloc: simplify list handling in rmqueue_bulk()
> 
> The rmqueue_bulk() function fills an empty pcplist with pages from the free
> list. It tries to return pages to the caller in increasing pfn order, because
> that leads to better performance with some I/O controllers, as explained in
> e084b2d95e48 ("page-allocator: preserve PFN ordering when __GFP_COLD is set").
> 
> To preserve the order, it's sufficient to add pages to the tail of the list
> as they are retrieved. The current code instead adds to the head of the list,
> but then updates the list head pointer to the last added page, in each step.
> This does result in the same order, but is needlessly confusing and potentially
> wasteful, with no apparent benefit. This patch simplifies the code and adjusts
> the comment accordingly.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

Acked-by: Mel Gorman <mgorman@techsingularity.net>

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 7/8] mm, Remove cold parameter from free_hot_cold_page*
  2017-10-19 13:12     ` Vlastimil Babka
@ 2017-10-19 15:43       ` Mel Gorman
  -1 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-19 15:43 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Linux-MM, Linux-FSDevel, LKML, Jan Kara,
	Andi Kleen, Dave Hansen, Dave Chinner

On Thu, Oct 19, 2017 at 03:12:33PM +0200, Vlastimil Babka wrote:
> On 10/18/2017 09:59 AM, Mel Gorman wrote:
> > Most callers of free_hot_cold_page claim the pages being released are
> > cache hot. The exception is the page reclaim paths where it is likely that
> > enough pages will be freed in the near future that the per-cpu lists are
> > going to be recycled and the cache hotness information is lost.
> 
> Maybe it would make sense for reclaim to skip pcplists? (out of scope of
> this series, of course).
> 

Maybe, but it's a bit risky. The PCP lists are preserved, but the number of
zone->lock acquire/releases increases as every 14 pages reclaimed would then
require an acquire/release instead of one per pcp->high pages reclaimed.
That is a definite cost versus the possibility that the next page allocated
on that CPU will still be cache hot. That may not happen anyway, as scanning
lots of pages for reclaim may have filled the cache with useless
information.
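
For reference, the batching being defended here looks roughly as follows.
This is a minimal sketch assuming the 4.14-era allocator internals;
sketch_free_to_pcp() is a hypothetical name, while free_pcppages_bulk()
is the real helper:

	static void sketch_free_to_pcp(struct zone *zone,
				       struct per_cpu_pages *pcp,
				       struct page *page, int migratetype)
	{
		/* Common case: the page goes to the pcplist with no
		 * zone->lock involvement at all. */
		list_add(&page->lru, &pcp->lists[migratetype]);
		pcp->count++;

		/* Only on overflow is zone->lock taken, inside
		 * free_pcppages_bulk(), and then a whole batch of pages
		 * is returned under a single lock hold. */
		if (pcp->count >= pcp->high) {
			unsigned long batch = READ_ONCE(pcp->batch);

			free_pcppages_bulk(zone, batch, pcp);
			pcp->count -= batch;
		}
	}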

> > As no one
> > really cares about the hotness of pages being released to the allocator,
> > just ditch the parameter.
> > 
> > The APIs are renamed to indicate that it's no longer about hot/cold pages. It
> > should also be less confusing as there are subtle differences between them.
> > __free_pages drops a reference and frees a page when the refcount reaches
> > zero. free_hot_cold_page handled pages whose refcount was already zero
> > which is non-obvious from the name. free_unref_page should be more obvious.
> > 
> > No performance impact is expected as the overhead is marginal. The parameter
> > is removed simply because it is a bit stupid to have a useless parameter
> > copied everywhere.
> > 
> > Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> 
> A comment below, though.
> 
> ...
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 167e163cf733..13582efc57a0 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2590,7 +2590,7 @@ void mark_free_pages(struct zone *zone)
> >  }
> >  #endif /* CONFIG_PM */
> >  
> > -static bool free_hot_cold_page_prepare(struct page *page, unsigned long pfn)
> > +static bool free_unref_page_prepare(struct page *page, unsigned long pfn)
> >  {
> >  	int migratetype;
> >  
> > @@ -2602,8 +2602,7 @@ static bool free_hot_cold_page_prepare(struct page *page, unsigned long pfn)
> >  	return true;
> >  }
> >  
> > -static void free_hot_cold_page_commit(struct page *page, unsigned long pfn,
> > -				bool cold)
> > +static void free_unref_page_commit(struct page *page, unsigned long pfn)
> >  {
> >  	struct zone *zone = page_zone(page);
> >  	struct per_cpu_pages *pcp;
> > @@ -2628,10 +2627,7 @@ static void free_hot_cold_page_commit(struct page *page, unsigned long pfn,
> >  	}
> >  
> >  	pcp = &this_cpu_ptr(zone->pageset)->pcp;
> > -	if (!cold)
> > -		list_add(&page->lru, &pcp->lists[migratetype]);
> > -	else
> > -		list_add_tail(&page->lru, &pcp->lists[migratetype]);
> > +	list_add_tail(&page->lru, &pcp->lists[migratetype]);
> 
> Did you intentionally use the cold version here? Patch 8/8 uses the hot
> version in __rmqueue_pcplist() and that makes more sense to me. It
> should be either negligible or better, not worse.
> 

This was unintentional, thanks. The fix is below.

---8<---
mm, Remove cold parameter from free_hot_cold_page* -fix

As pointed out by Vlastimil Babka, the pages being freed should be added
to the head, not the tail, of the pcpu list.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 13582efc57a0..06461553a115 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2627,7 +2627,7 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn)
 	}
 
 	pcp = &this_cpu_ptr(zone->pageset)->pcp;
-	list_add_tail(&page->lru, &pcp->lists[migratetype]);
+	list_add(&page->lru, &pcp->lists[migratetype]);
 	pcp->count++;
 	if (pcp->count >= pcp->high) {
 		unsigned long batch = READ_ONCE(pcp->batch);

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 4/8] mm: Only drain per-cpu pagevecs once per pagevec usage
  2017-10-12  9:30 [PATCH 0/8] Follow-up for speed up page cache truncation Mel Gorman
@ 2017-10-12  9:30   ` Mel Gorman
  0 siblings, 0 replies; 47+ messages in thread
From: Mel Gorman @ 2017-10-12  9:30 UTC (permalink / raw)
  To: Linux-MM
  Cc: Linux-FSDevel, LKML, Jan Kara, Andi Kleen, Dave Hansen,
	Dave Chinner, Mel Gorman

When a pagevec is initialised on the stack, it is generally used multiple
times over a range of pages, looking up entries and then releasing them.
On each pagevec_release, the per-cpu deferred LRU pagevecs are drained
on the grounds the page being released may be on those queues and the
pages may be cache hot. In many cases only the first drain is necessary
as it's unlikely that the range of pages being walked is racing against
LRU addition.  Even if there is such a race, the impact is marginal, whereas
constantly redraining the LRU pagevecs has a cost.

This patch ensures that a pagevec is only drained once in a given lifecycle
without increasing the cache footprint of the pagevec structure. Only
sparsetruncate tiny is shown here as large files have many exceptional
entries and call pagevec_release less frequently.

sparsetruncate (tiny)
                              4.14.0-rc4             4.14.0-rc4
                        batchshadow-v1r1          onedrain-v1r1
Min          Time      141.00 (   0.00%)      141.00 (   0.00%)
1st-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
Max-95%      Time      146.00 (   0.00%)      145.00 (   0.68%)
Max-99%      Time      198.00 (   0.00%)      194.00 (   2.02%)
Max          Time      254.00 (   0.00%)      208.00 (  18.11%)
Amean        Time      145.12 (   0.00%)      144.30 (   0.56%)
Stddev       Time       12.74 (   0.00%)        9.62 (  24.49%)
Coeff        Time        8.78 (   0.00%)        6.67 (  24.06%)
Best99%Amean Time      144.29 (   0.00%)      143.82 (   0.32%)
Best95%Amean Time      142.68 (   0.00%)      142.31 (   0.26%)
Best90%Amean Time      142.52 (   0.00%)      142.19 (   0.24%)
Best75%Amean Time      142.26 (   0.00%)      141.98 (   0.20%)
Best50%Amean Time      141.90 (   0.00%)      141.71 (   0.13%)
Best25%Amean Time      141.80 (   0.00%)      141.43 (   0.26%)

The impact on bonnie is marginal and within the noise because a significant
percentage of the file being truncated has been reclaimed and consists of
shadow entries which reduce the hotness of the pagevec_release path.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
---
 include/linux/pagevec.h | 4 +++-
 mm/swap.c               | 5 ++++-
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/linux/pagevec.h b/include/linux/pagevec.h
index 4dcd5506f1ed..4231979be982 100644
--- a/include/linux/pagevec.h
+++ b/include/linux/pagevec.h
@@ -16,7 +16,8 @@ struct address_space;
 
 struct pagevec {
 	unsigned long nr;
-	unsigned long cold;
+	bool cold;
+	bool drained;
 	struct page *pages[PAGEVEC_SIZE];
 };
 
@@ -45,6 +46,7 @@ static inline void pagevec_init(struct pagevec *pvec, int cold)
 {
 	pvec->nr = 0;
 	pvec->cold = cold;
+	pvec->drained = false;
 }
 
 static inline void pagevec_reinit(struct pagevec *pvec)
diff --git a/mm/swap.c b/mm/swap.c
index a77d68f2c1b6..31bd9d8a5db7 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -833,7 +833,10 @@ EXPORT_SYMBOL(release_pages);
  */
 void __pagevec_release(struct pagevec *pvec)
 {
-	lru_add_drain();
+	if (!pvec->drained) {
+		lru_add_drain();
+		pvec->drained = true;
+	}
 	release_pages(pvec->pages, pagevec_count(pvec), pvec->cold);
 	pagevec_reinit(pvec);
 }
-- 
2.14.0

^ permalink raw reply related	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2017-10-19 15:43 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-18  7:59 [PATCH 0/8] Follow-up for speed up page cache truncation v2 Mel Gorman
2017-10-18  7:59 ` [PATCH 1/8] mm, page_alloc: Enable/disable IRQs once when freeing a list of pages Mel Gorman
2017-10-18  9:02   ` Vlastimil Babka
2017-10-18 10:15     ` Mel Gorman
2017-10-18  7:59 ` [PATCH 2/8] mm, truncate: Do not check mapping for every page being truncated Mel Gorman
2017-10-19  8:11   ` Jan Kara
2017-10-18  7:59 ` [PATCH 3/8] mm, truncate: Remove all exceptional entries from pagevec under one lock Mel Gorman
2017-10-18  7:59 ` [PATCH 4/8] mm: Only drain per-cpu pagevecs once per pagevec usage Mel Gorman
2017-10-19  9:12   ` Vlastimil Babka
2017-10-19  9:33     ` Mel Gorman
2017-10-19 13:21       ` Vlastimil Babka
2017-10-18  7:59 ` [PATCH 5/8] mm, pagevec: Remove cold parameter for pagevecs Mel Gorman
2017-10-19  9:24   ` Vlastimil Babka
2017-10-18  7:59 ` [PATCH 6/8] mm: Remove cold parameter for release_pages Mel Gorman
2017-10-19  9:26   ` Vlastimil Babka
2017-10-18  7:59 ` [PATCH 7/8] mm, Remove cold parameter from free_hot_cold_page* Mel Gorman
2017-10-19 13:12   ` Vlastimil Babka
2017-10-19 15:43     ` Mel Gorman
2017-10-18  7:59 ` [PATCH 8/8] mm: Remove __GFP_COLD Mel Gorman
2017-10-19 13:18   ` Vlastimil Babka
2017-10-19 13:42   ` Vlastimil Babka
2017-10-19 14:32     ` Mel Gorman
  -- strict thread matches above, loose matches on Subject: below --
2017-10-12  9:30 [PATCH 0/8] Follow-up for speed up page cache truncation Mel Gorman
2017-10-12  9:30 ` [PATCH 4/8] mm: Only drain per-cpu pagevecs once per pagevec usage Mel Gorman
