* [RFC 0/6] mm: add new LRU list for MADV_FREE pages
From: Shaohua Li @ 2017-01-30  5:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman

Hi,

We are trying to use MADV_FREE in jemalloc, and we found several issues.
Until they are solved, jemalloc can't use the MADV_FREE feature (a minimal
usage sketch follows the list below).
- Doesn't support systems without swap enabled. If swap is off, we can't age
  anonymous pages, or at least can't do so efficiently. And since MADV_FREE
  pages are mixed with other anonymous pages, we can't reclaim MADV_FREE
  pages either. In the current implementation, MADV_FREE falls back to
  MADV_DONTNEED when swap is not enabled. But in our environment a lot of
  machines don't enable swap, so this prevents our setup from using
  MADV_FREE.
- Increases memory pressure. Page reclaim is biased toward reclaiming file
  pages rather than anonymous pages. This doesn't make sense for MADV_FREE
  pages, because those pages can be freed easily and refilled with a very
  slight penalty. Even if page reclaim didn't favor file pages, there would
  still be an issue, because MADV_FREE pages and other anonymous pages are
  mixed together: to reclaim a MADV_FREE page, we probably must scan a lot
  of other anonymous pages, which is inefficient. In our tests we usually
  see OOMs with MADV_FREE enabled and none without it.
- RSS accounting. MADV_FREE pages are accounted as normal anon pages and
  reclaimed lazily, so an application's RSS grows. This confuses our
  workloads: we run a monitoring daemon, and if it finds an application's
  RSS becoming abnormal, it kills the application even though the kernel
  could reclaim the memory easily. Currently we don't export separate RSS
  accounting for MADV_FREE pages, which also prevents our setup from using
  MADV_FREE.
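
As a concrete reference, here is a minimal sketch of the MADV_FREE usage
pattern an allocator like jemalloc relies on. This is an illustration only,
not code from jemalloc; the mapping size, error handling and the fallback
define are arbitrary:

	#define _GNU_SOURCE	/* for MAP_ANONYMOUS on older toolchains */
	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#ifndef MADV_FREE
	#define MADV_FREE 8	/* <asm-generic/mman-common.h>, Linux 4.5+ */
	#endif

	int main(void)
	{
		size_t len = 64 << 20;	/* 64MB, arbitrary */
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED) {
			perror("mmap");
			return 1;
		}
		memset(p, 1, len);	/* dirty the pages so they are resident */

		/*
		 * Mark the contents disposable. The range stays mapped;
		 * under memory pressure the kernel may discard the pages,
		 * and writing to the range again simply reuses it.
		 */
		if (madvise(p, len, MADV_FREE))
			perror("madvise(MADV_FREE)");

		p[0] = 2;	/* a write cancels the lazy free for that page */
		munmap(p, len);
		return 0;
	}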

For the first two issues, introducing a new LRU list for MADV_FREE pages
solves them. We can reclaim MADV_FREE pages directly, without writing them
out to swap, which fixes the first issue. And if only MADV_FREE pages are on
the new list, page reclaim can easily reclaim such pages without interference
from file or anonymous pages, so the memory pressure issue disappears.

Minchan actually posted patches adding such an LRU list before, but he didn't
pursue them, so I picked them up; these patches are based on his previous
work. The main difference between my patches and Minchan's is the page
reclaim policy: Minchan's patches introduce a knob to balance reclaiming
MADV_FREE pages against anon/file pages, while these patches always reclaim
MADV_FREE pages first if any exist. I describe the reason in patch 5.

For the third issue, we can add a separate RSS count for MADV_FREE pages. The
count is increased in the madvise syscall and decreased in page reclaim (eg,
at unmap time). One open problem is activate_page(): a MADV_FREE page can be
promoted to an active page there, but there is no mm_struct context at that
place, and iterating the vmas there sounds too silly. The patchset doesn't
fix this issue yet; hopefully somebody can share a hint on how to fix it.
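
As a rough illustration of that separate RSS count, here is a sketch under
the assumption of a new, hypothetical MM_LAZYFREEPAGES slot in the per-mm
rss_stat counters; the name and placement are illustrative, not from this
patchset:

	/* hypothetical slot next to MM_FILEPAGES/MM_ANONPAGES/MM_SWAPENTS */
	enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS,
	       MM_LAZYFREEPAGES, NR_MM_COUNTERS };

	/* called from the madvise(MADV_FREE) path for each marked page */
	static inline void account_lazyfree(struct mm_struct *mm, long nr)
	{
		add_mm_counter(mm, MM_ANONPAGES, -nr);	/* leaves anon RSS */
		add_mm_counter(mm, MM_LAZYFREEPAGES, nr); /* counted separately */
	}

The reverse accounting would happen when the page is reclaimed or unmapped;
the activate_page() promotion path is exactly where this scheme breaks down,
as described above.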

Thanks,
Shaohua

Minchan's previous patches:
http://marc.info/?l=linux-mm&m=144800657002763&w=2

Shaohua Li (6):
  mm: add wrapper for page accounting index
  mm: add lazyfree page flag
  mm: add LRU_LAZYFREE lru list
  mm: move MADV_FREE pages into LRU_LAZYFREE list
  mm: reclaim lazyfree pages
  mm: enable MADV_FREE for swapless system

 drivers/base/node.c                       |  2 +
 drivers/staging/android/lowmemorykiller.c |  3 +-
 fs/proc/meminfo.c                         |  1 +
 fs/proc/task_mmu.c                        |  8 ++-
 include/linux/mm_inline.h                 | 41 +++++++++++++
 include/linux/mmzone.h                    |  9 +++
 include/linux/page-flags.h                |  6 ++
 include/linux/swap.h                      |  2 +-
 include/linux/vm_event_item.h             |  2 +-
 include/trace/events/mmflags.h            |  1 +
 include/trace/events/vmscan.h             | 31 +++++-----
 kernel/power/snapshot.c                   |  1 +
 mm/compaction.c                           | 11 ++--
 mm/huge_memory.c                          |  6 +-
 mm/khugepaged.c                           |  6 +-
 mm/madvise.c                              | 11 +---
 mm/memcontrol.c                           |  4 ++
 mm/memory-failure.c                       |  3 +-
 mm/memory_hotplug.c                       |  3 +-
 mm/mempolicy.c                            |  3 +-
 mm/migrate.c                              | 29 ++++------
 mm/page_alloc.c                           | 10 ++++
 mm/rmap.c                                 |  7 ++-
 mm/swap.c                                 | 51 +++++++++-------
 mm/vmscan.c                               | 96 +++++++++++++++++++++++--------
 mm/vmstat.c                               |  4 ++
 26 files changed, 242 insertions(+), 109 deletions(-)

-- 
2.9.3

* [RFC 1/6] mm: add wrapper for page accounting index
From: Shaohua Li @ 2017-01-30  5:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman

We calculate the page/lru accounting index by checking whether the page/lru
is a file one. This becomes a problem once we introduce a new LRU list, so
add a wrapper for the calculation.
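
The change is mechanical; a condensed before/after of the accounting call,
taken from the diff below (the "before" form relies on NR_ISOLATED_FILE
sitting right after NR_ISOLATED_ANON in the stat item enum):

	/* before: open-coded arithmetic on the stat item enum */
	inc_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page));

	/*
	 * after: the wrapper hides the enum layout, so a third
	 * NR_ISOLATED_* counter can be added without touching callers
	 */
	inc_node_page_state(page, page_isolate_index(page));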

The patch is based on Minchan's previous patch.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 include/linux/mm_inline.h     | 26 ++++++++++++++++++++++++++
 include/trace/events/vmscan.h | 23 ++++++++++++-----------
 mm/compaction.c               |  3 +--
 mm/khugepaged.c               |  6 ++----
 mm/memory-failure.c           |  3 +--
 mm/memory_hotplug.c           |  3 +--
 mm/mempolicy.c                |  3 +--
 mm/migrate.c                  | 27 +++++++++------------------
 mm/vmscan.c                   | 19 ++++++++++---------
 9 files changed, 63 insertions(+), 50 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index e030a68..0dddc2c 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -124,6 +124,32 @@ static __always_inline enum lru_list page_lru(struct page *page)
 	return lru;
 }
 
+/*
+ * lru_isolate_index - which item should a lru be accounted for
+ * @lru: the lru list
+ *
+ * Returns the accounting item index of the lru
+ */
+static inline int lru_isolate_index(enum lru_list lru)
+{
+	if (lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE)
+		return NR_ISOLATED_FILE;
+	return NR_ISOLATED_ANON;
+}
+
+/*
+ * page_isolate_index - which item should a page be accounted for
+ * @page: the page to test
+ *
+ * Returns the accounting item index of the page
+ */
+static inline int page_isolate_index(struct page *page)
+{
+	if (!PageSwapBacked(page))
+		return NR_ISOLATED_FILE;
+	return NR_ISOLATED_ANON;
+}
+
 #define lru_to_page(head) (list_entry((head)->prev, struct page, lru))
 
 #endif
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 27e8a5c..fab386d 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -31,9 +31,10 @@
 	(RECLAIM_WB_ASYNC) \
 	)
 
-#define trace_shrink_flags(file) \
+#define trace_shrink_flags(isolate_index) \
 	( \
-		(file ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
+		(isolate_index == NR_ISOLATED_FILE ? RECLAIM_WB_FILE : \
+			RECLAIM_WB_ANON) | \
 		(RECLAIM_WB_ASYNC) \
 	)
 
@@ -345,11 +346,11 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		unsigned long nr_congested, unsigned long nr_immediate,
 		unsigned long nr_activate, unsigned long nr_ref_keep,
 		unsigned long nr_unmap_fail,
-		int priority, int file),
+		int priority, int isolate_index),
 
 	TP_ARGS(nid, nr_scanned, nr_reclaimed, nr_dirty, nr_writeback,
 		nr_congested, nr_immediate, nr_activate, nr_ref_keep,
-		nr_unmap_fail, priority, file),
+		nr_unmap_fail, priority, isolate_index),
 
 	TP_STRUCT__entry(
 		__field(int, nid)
@@ -378,7 +379,7 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		__entry->nr_ref_keep = nr_ref_keep;
 		__entry->nr_unmap_fail = nr_unmap_fail;
 		__entry->priority = priority;
-		__entry->reclaim_flags = trace_shrink_flags(file);
+		__entry->reclaim_flags = trace_shrink_flags(isolate_index);
 	),
 
 	TP_printk("nid=%d nr_scanned=%ld nr_reclaimed=%ld nr_dirty=%ld nr_writeback=%ld nr_congested=%ld nr_immediate=%ld nr_activate=%ld nr_ref_keep=%ld nr_unmap_fail=%ld priority=%d flags=%s",
@@ -395,9 +396,9 @@ TRACE_EVENT(mm_vmscan_lru_shrink_active,
 
 	TP_PROTO(int nid, unsigned long nr_taken,
 		unsigned long nr_active, unsigned long nr_deactivated,
-		unsigned long nr_referenced, int priority, int file),
+		unsigned long nr_referenced, int priority, int isolate_index),
 
-	TP_ARGS(nid, nr_taken, nr_active, nr_deactivated, nr_referenced, priority, file),
+	TP_ARGS(nid, nr_taken, nr_active, nr_deactivated, nr_referenced, priority, isolate_index),
 
 	TP_STRUCT__entry(
 		__field(int, nid)
@@ -416,7 +417,7 @@ TRACE_EVENT(mm_vmscan_lru_shrink_active,
 		__entry->nr_deactivated = nr_deactivated;
 		__entry->nr_referenced = nr_referenced;
 		__entry->priority = priority;
-		__entry->reclaim_flags = trace_shrink_flags(file);
+		__entry->reclaim_flags = trace_shrink_flags(isolate_index);
 	),
 
 	TP_printk("nid=%d nr_taken=%ld nr_active=%ld nr_deactivated=%ld nr_referenced=%ld priority=%d flags=%s",
@@ -432,9 +433,9 @@ TRACE_EVENT(mm_vmscan_inactive_list_is_low,
 	TP_PROTO(int nid, int reclaim_idx,
 		unsigned long total_inactive, unsigned long inactive,
 		unsigned long total_active, unsigned long active,
-		unsigned long ratio, int file),
+		unsigned long ratio, int isolate_index),
 
-	TP_ARGS(nid, reclaim_idx, total_inactive, inactive, total_active, active, ratio, file),
+	TP_ARGS(nid, reclaim_idx, total_inactive, inactive, total_active, active, ratio, isolate_index),
 
 	TP_STRUCT__entry(
 		__field(int, nid)
@@ -455,7 +456,7 @@ TRACE_EVENT(mm_vmscan_inactive_list_is_low,
 		__entry->total_active = total_active;
 		__entry->active = active;
 		__entry->ratio = ratio;
-		__entry->reclaim_flags = trace_shrink_flags(file) & RECLAIM_WB_LRU;
+		__entry->reclaim_flags = trace_shrink_flags(isolate_index) & RECLAIM_WB_LRU;
 	),
 
 	TP_printk("nid=%d reclaim_idx=%d total_inactive=%ld inactive=%ld total_active=%ld active=%ld ratio=%ld flags=%s",
diff --git a/mm/compaction.c b/mm/compaction.c
index 0aa2757..3918c48 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -857,8 +857,7 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/* Successfully isolated */
 		del_page_from_lru_list(page, lruvec, page_lru(page));
-		inc_node_page_state(page,
-				NR_ISOLATED_ANON + page_is_file_cache(page));
+		inc_node_page_state(page, page_isolate_index(page));
 
 isolate_success:
 		list_add(&page->lru, &cc->migratepages);
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 34bce5c..fd43a0a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -481,8 +481,7 @@ void __khugepaged_exit(struct mm_struct *mm)
 
 static void release_pte_page(struct page *page)
 {
-	/* 0 stands for page_is_file_cache(page) == false */
-	dec_node_page_state(page, NR_ISOLATED_ANON + 0);
+	dec_node_page_state(page, page_isolate_index(page));
 	unlock_page(page);
 	putback_lru_page(page);
 }
@@ -577,8 +576,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			result = SCAN_DEL_PAGE_LRU;
 			goto out;
 		}
-		/* 0 stands for page_is_file_cache(page) == false */
-		inc_node_page_state(page, NR_ISOLATED_ANON + 0);
+		inc_node_page_state(page, page_isolate_index(page));
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		VM_BUG_ON_PAGE(PageLRU(page), page);
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3f3cfd4..695ecb72 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1667,8 +1667,7 @@ static int __soft_offline_page(struct page *page, int flags)
 		 * cannot have PAGE_MAPPING_MOVABLE.
 		 */
 		if (!__PageMovable(page))
-			inc_node_page_state(page, NR_ISOLATED_ANON +
-						page_is_file_cache(page));
+			inc_node_page_state(page, page_isolate_index(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 					MIGRATE_SYNC, MR_MEMORY_FAILURE);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 3e3db7a..e2115c8 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1620,8 +1620,7 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			put_page(page);
 			list_add_tail(&page->lru, &source);
 			move_pages--;
-			inc_node_page_state(page, NR_ISOLATED_ANON +
-					    page_is_file_cache(page));
+			inc_node_page_state(page, page_isolate_index(page));
 
 		} else {
 #ifdef CONFIG_DEBUG_VM
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1e7873e..c894925 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -964,8 +964,7 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
 	if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(page) == 1) {
 		if (!isolate_lru_page(page)) {
 			list_add_tail(&page->lru, pagelist);
-			inc_node_page_state(page, NR_ISOLATED_ANON +
-					    page_is_file_cache(page));
+			inc_node_page_state(page, page_isolate_index(page));
 		}
 	}
 }
diff --git a/mm/migrate.c b/mm/migrate.c
index 87f4d0f..502ebea 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -184,8 +184,7 @@ void putback_movable_pages(struct list_head *l)
 			put_page(page);
 		} else {
 			putback_lru_page(page);
-			dec_node_page_state(page, NR_ISOLATED_ANON +
-					page_is_file_cache(page));
+			dec_node_page_state(page, page_isolate_index(page));
 		}
 	}
 }
@@ -1130,8 +1129,7 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		 * as __PageMovable
 		 */
 		if (likely(!__PageMovable(page)))
-			dec_node_page_state(page, NR_ISOLATED_ANON +
-					page_is_file_cache(page));
+			dec_node_page_state(page, page_isolate_index(page));
 	}
 
 	/*
@@ -1471,8 +1469,7 @@ static int do_move_page_to_node_array(struct mm_struct *mm,
 		err = isolate_lru_page(page);
 		if (!err) {
 			list_add_tail(&page->lru, &pagelist);
-			inc_node_page_state(page, NR_ISOLATED_ANON +
-					    page_is_file_cache(page));
+			inc_node_page_state(page, page_isolate_index(page));
 		}
 put_and_set:
 		/*
@@ -1816,8 +1813,6 @@ static bool numamigrate_update_ratelimit(pg_data_t *pgdat,
 
 static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
 {
-	int page_lru;
-
 	VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page);
 
 	/* Avoid migrating to a node that is nearly full */
@@ -1839,8 +1834,7 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
 		return 0;
 	}
 
-	page_lru = page_is_file_cache(page);
-	mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru,
+	mod_node_page_state(page_pgdat(page), page_isolate_index(page),
 				hpage_nr_pages(page));
 
 	/*
@@ -1898,8 +1892,7 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 	if (nr_remaining) {
 		if (!list_empty(&migratepages)) {
 			list_del(&page->lru);
-			dec_node_page_state(page, NR_ISOLATED_ANON +
-					page_is_file_cache(page));
+			dec_node_page_state(page, page_isolate_index(page));
 			putback_lru_page(page);
 		}
 		isolated = 0;
@@ -1929,7 +1922,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	pg_data_t *pgdat = NODE_DATA(node);
 	int isolated = 0;
 	struct page *new_page = NULL;
-	int page_lru = page_is_file_cache(page);
+	int isolate_index = page_isolate_index(page);
 	unsigned long mmun_start = address & HPAGE_PMD_MASK;
 	unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE;
 	pmd_t orig_entry;
@@ -1991,8 +1984,8 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		/* Retake the callers reference and putback on LRU */
 		get_page(page);
 		putback_lru_page(page);
-		mod_node_page_state(page_pgdat(page),
-			 NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR);
+		mod_node_page_state(page_pgdat(page), isolate_index,
+			-HPAGE_PMD_NR);
 
 		goto out_unlock;
 	}
@@ -2042,9 +2035,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR);
 	count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR);
 
-	mod_node_page_state(page_pgdat(page),
-			NR_ISOLATED_ANON + page_lru,
-			-HPAGE_PMD_NR);
+	mod_node_page_state(page_pgdat(page), isolate_index, -HPAGE_PMD_NR);
 	return isolated;
 
 out_fail:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 947ab6f..abb64b7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1736,7 +1736,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list,
 				     &nr_scanned, sc, isolate_mode, lru);
 
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
+	__mod_node_page_state(pgdat, lru_isolate_index(lru), nr_taken);
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	if (global_reclaim(sc)) {
@@ -1765,7 +1765,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 
 	putback_inactive_pages(lruvec, &page_list);
 
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
+	__mod_node_page_state(pgdat, lru_isolate_index(lru), -nr_taken);
 
 	spin_unlock_irq(&pgdat->lru_lock);
 
@@ -1843,7 +1843,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 			stat.nr_congested, stat.nr_immediate,
 			stat.nr_activate, stat.nr_ref_keep,
 			stat.nr_unmap_fail,
-			sc->priority, file);
+			sc->priority, lru_isolate_index(lru));
 	return nr_reclaimed;
 }
 
@@ -1940,7 +1940,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold,
 				     &nr_scanned, sc, isolate_mode, lru);
 
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
+	__mod_node_page_state(pgdat, lru_isolate_index(lru), nr_taken);
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	if (global_reclaim(sc))
@@ -2003,13 +2003,13 @@ static void shrink_active_list(unsigned long nr_to_scan,
 
 	nr_activate = move_active_pages_to_lru(lruvec, &l_active, &l_hold, lru);
 	nr_deactivate = move_active_pages_to_lru(lruvec, &l_inactive, &l_hold, lru - LRU_ACTIVE);
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
+	__mod_node_page_state(pgdat, lru_isolate_index(lru), -nr_taken);
 	spin_unlock_irq(&pgdat->lru_lock);
 
 	mem_cgroup_uncharge_list(&l_hold);
 	free_hot_cold_page_list(&l_hold, true);
 	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
-			nr_deactivate, nr_rotated, sc->priority, file);
+		nr_deactivate, nr_rotated, sc->priority, lru_isolate_index(lru));
 }
 
 /*
@@ -2038,11 +2038,12 @@ static void shrink_active_list(unsigned long nr_to_scan,
  *    1TB     101        10GB
  *   10TB     320        32GB
  */
-static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
+static bool inactive_list_is_low(struct lruvec *lruvec, enum lru_list lru,
 						struct scan_control *sc, bool trace)
 {
 	unsigned long inactive_ratio;
 	unsigned long inactive, active;
+	bool file = is_file_lru(lru);
 	enum lru_list inactive_lru = file * LRU_FILE;
 	enum lru_list active_lru = file * LRU_FILE + LRU_ACTIVE;
 	unsigned long gb;
@@ -2068,7 +2069,7 @@ static bool inactive_list_is_low(struct lruvec *lruvec, bool file,
 				sc->reclaim_idx,
 				lruvec_lru_size(lruvec, inactive_lru, MAX_NR_ZONES), inactive,
 				lruvec_lru_size(lruvec, active_lru, MAX_NR_ZONES), active,
-				inactive_ratio, file);
+				inactive_ratio, lru_isolate_index(lru));
 
 	return inactive * inactive_ratio < active;
 }
@@ -2077,7 +2078,7 @@ static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
 				 struct lruvec *lruvec, struct scan_control *sc)
 {
 	if (is_active_lru(lru)) {
-		if (inactive_list_is_low(lruvec, is_file_lru(lru), sc, true))
+		if (inactive_list_is_low(lruvec, lru, sc, true))
 			shrink_active_list(nr_to_scan, lruvec, sc, lru);
 		return 0;
 	}
-- 
2.9.3

* [RFC 2/6] mm: add lazyfree page flag
From: Shaohua Li @ 2017-01-30  5:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman

We are going to add MADV_FREE pages to a new LRU list. Add a new page flag to
indicate such pages. Note that we reuse PG_mappedtodisk for the new flag;
this is OK because no anonymous page ever has that flag set.
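
Because the bit is shared, testing the flag alone would be ambiguous for file
pages that legitimately set PG_mappedtodisk. A short sketch of the
disambiguation, using the helper this patch adds to mm_inline.h:

	/*
	 * An anonymous (swap-backed) page with the shared bit set is
	 * lazyfree; a file page with the same bit set is merely
	 * mapped-to-disk, so both conditions must be checked.
	 */
	static inline bool page_is_lazyfree(struct page *page)
	{
		return PageSwapBacked(page) && PageLazyFree(page);
	}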

The patch is based on Minchan's previous patch.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 fs/proc/task_mmu.c         | 8 +++++++-
 include/linux/mm_inline.h  | 5 +++++
 include/linux/page-flags.h | 6 ++++++
 mm/huge_memory.c           | 1 +
 mm/migrate.c               | 2 ++
 5 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ee3efb2..813d3aa 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -440,6 +440,7 @@ struct mem_size_stats {
 	unsigned long private_dirty;
 	unsigned long referenced;
 	unsigned long anonymous;
+	unsigned long lazyfree;
 	unsigned long anonymous_thp;
 	unsigned long shmem_thp;
 	unsigned long swap;
@@ -456,8 +457,11 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 	int i, nr = compound ? 1 << compound_order(page) : 1;
 	unsigned long size = nr * PAGE_SIZE;
 
-	if (PageAnon(page))
+	if (PageAnon(page)) {
 		mss->anonymous += size;
+		if (PageLazyFree(page))
+			mss->lazyfree += size;
+	}
 
 	mss->resident += size;
 	/* Accumulate the size in pages that have been accessed. */
@@ -770,6 +774,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   "Private_Dirty:  %8lu kB\n"
 		   "Referenced:     %8lu kB\n"
 		   "Anonymous:      %8lu kB\n"
+		   "LazyFree:       %8lu kB\n"
 		   "AnonHugePages:  %8lu kB\n"
 		   "ShmemPmdMapped: %8lu kB\n"
 		   "Shared_Hugetlb: %8lu kB\n"
@@ -788,6 +793,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   mss.private_dirty >> 10,
 		   mss.referenced >> 10,
 		   mss.anonymous >> 10,
+		   mss.lazyfree >> 10,
 		   mss.anonymous_thp >> 10,
 		   mss.shmem_thp >> 10,
 		   mss.shared_hugetlb >> 10,
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 0dddc2c..828e813 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -22,6 +22,11 @@ static inline int page_is_file_cache(struct page *page)
 	return !PageSwapBacked(page);
 }
 
+static inline bool page_is_lazyfree(struct page *page)
+{
+	return PageSwapBacked(page) && PageLazyFree(page);
+}
+
 static __always_inline void __update_lru_size(struct lruvec *lruvec,
 				enum lru_list lru, enum zone_type zid,
 				int nr_pages)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6b5818d..e8ea378 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -107,6 +107,9 @@ enum pageflags {
 #endif
 	__NR_PAGEFLAGS,
 
+	/* MADV_FREE */
+	PG_lazyfree = PG_mappedtodisk,
+
 	/* Filesystems */
 	PG_checked = PG_owner_priv_1,
 
@@ -428,6 +431,9 @@ TESTPAGEFLAG_FALSE(Ksm)
 
 u64 stable_page_flags(struct page *page);
 
+PAGEFLAG(LazyFree, lazyfree, PF_ANY)
+	__CLEARPAGEFLAG(LazyFree, lazyfree, PF_ANY)
+
 static inline int PageUptodate(struct page *page)
 {
 	int ret;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 40bd376..ffa7ed5 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1918,6 +1918,7 @@ static void __split_huge_page_tail(struct page *head, int tail,
 			 (1L << PG_swapbacked) |
 			 (1L << PG_mlocked) |
 			 (1L << PG_uptodate) |
+			 (1L << PG_lazyfree) |
 			 (1L << PG_active) |
 			 (1L << PG_locked) |
 			 (1L << PG_unevictable) |
diff --git a/mm/migrate.c b/mm/migrate.c
index 502ebea..496105c 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -641,6 +641,8 @@ void migrate_page_copy(struct page *newpage, struct page *page)
 		SetPageChecked(newpage);
 	if (PageMappedToDisk(page))
 		SetPageMappedToDisk(newpage);
+	if (PageLazyFree(page))
+		SetPageLazyFree(newpage);
 
 	/* Move dirty on pages not done by migrate_page_move_mapping() */
 	if (PageDirty(page))
-- 
2.9.3

* [RFC 3/6] mm: add LRU_LAZYFREE lru list
From: Shaohua Li @ 2017-01-30  5:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman

MADV_FREE pages currently live on the anonymous LRU list, which causes
several problems:
- Doesn't support systems without swap enabled. If swap is off, we can't age
  anonymous pages, or at least can't do so efficiently. And since MADV_FREE
  pages are mixed with other anonymous pages, we can't reclaim MADV_FREE
  pages either.
- Increases memory pressure. Page reclaim is biased toward reclaiming file
  pages rather than anonymous pages. This doesn't make sense for MADV_FREE
  pages, because those pages can be freed easily with a very slight penalty.
  Even if page reclaim didn't favor file pages, there would still be an
  issue, because MADV_FREE pages and other anonymous pages are mixed
  together: to reclaim a MADV_FREE page, we probably must scan a lot of
  other anonymous pages, which is inefficient.

Introducing a new LRU list for MADV_FREE pages solves these issues. If only
MADV_FREE pages are on the new list, page reclaim can easily reclaim such
pages without interference from file or anonymous pages.

This patch adds an LRU_LAZYFREE list, a dedicated LRU list for MADV_FREE
pages.
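
The placement in the enum matters: the diff below inserts LRU_LAZYFREE after
LRU_ACTIVE_FILE and before LRU_UNEVICTABLE, so the existing is_file_lru()/
is_active_lru() checks keep working and the new is_anon_lru() helper can
simply test lru <= LRU_ACTIVE_ANON. A condensed view of the resulting layout,
with explicit values shown only for illustration:

	enum lru_list {
		LRU_INACTIVE_ANON = 0,	/* LRU_BASE */
		LRU_ACTIVE_ANON   = 1,	/* LRU_BASE + LRU_ACTIVE */
		LRU_INACTIVE_FILE = 2,	/* LRU_BASE + LRU_FILE */
		LRU_ACTIVE_FILE   = 3,	/* LRU_BASE + LRU_FILE + LRU_ACTIVE */
		LRU_LAZYFREE      = 4,	/* new: dedicated MADV_FREE list */
		LRU_UNEVICTABLE   = 5,
		NR_LRU_LISTS
	};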

The patch is based on Minchan's previous patch.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/base/node.c                       |  2 ++
 drivers/staging/android/lowmemorykiller.c |  3 ++-
 fs/proc/meminfo.c                         |  1 +
 include/linux/mm_inline.h                 | 10 ++++++++++
 include/linux/mmzone.h                    |  9 +++++++++
 include/linux/vm_event_item.h             |  2 +-
 include/trace/events/mmflags.h            |  1 +
 include/trace/events/vmscan.h             | 10 +++++++---
 kernel/power/snapshot.c                   |  1 +
 mm/compaction.c                           |  8 +++++---
 mm/memcontrol.c                           |  4 ++++
 mm/page_alloc.c                           | 10 ++++++++++
 mm/vmscan.c                               | 21 ++++++++++++++-------
 mm/vmstat.c                               |  4 ++++
 14 files changed, 71 insertions(+), 15 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 5548f96..5c09b67 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -70,6 +70,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       "Node %d Inactive(anon): %8lu kB\n"
 		       "Node %d Active(file):   %8lu kB\n"
 		       "Node %d Inactive(file): %8lu kB\n"
+		       "Node %d LazyFree:	%8lu kB\n"
 		       "Node %d Unevictable:    %8lu kB\n"
 		       "Node %d Mlocked:        %8lu kB\n",
 		       nid, K(i.totalram),
@@ -83,6 +84,7 @@ static ssize_t node_read_meminfo(struct device *dev,
 		       nid, K(node_page_state(pgdat, NR_INACTIVE_ANON)),
 		       nid, K(node_page_state(pgdat, NR_ACTIVE_FILE)),
 		       nid, K(node_page_state(pgdat, NR_INACTIVE_FILE)),
+		       nid, K(node_page_state(pgdat, NR_LAZYFREE)),
 		       nid, K(node_page_state(pgdat, NR_UNEVICTABLE)),
 		       nid, K(sum_zone_node_page_state(nid, NR_MLOCK)));
 
diff --git a/drivers/staging/android/lowmemorykiller.c b/drivers/staging/android/lowmemorykiller.c
index ec3b665..2648872 100644
--- a/drivers/staging/android/lowmemorykiller.c
+++ b/drivers/staging/android/lowmemorykiller.c
@@ -75,7 +75,8 @@ static unsigned long lowmem_count(struct shrinker *s,
 	return global_node_page_state(NR_ACTIVE_ANON) +
 		global_node_page_state(NR_ACTIVE_FILE) +
 		global_node_page_state(NR_INACTIVE_ANON) +
-		global_node_page_state(NR_INACTIVE_FILE);
+		global_node_page_state(NR_INACTIVE_FILE) +
+		global_node_page_state(NR_LAZYFREE);
 }
 
 static unsigned long lowmem_scan(struct shrinker *s, struct shrink_control *sc)
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 8a42849..7803d33 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -79,6 +79,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	show_val_kb(m, "Inactive(anon): ", pages[LRU_INACTIVE_ANON]);
 	show_val_kb(m, "Active(file):   ", pages[LRU_ACTIVE_FILE]);
 	show_val_kb(m, "Inactive(file): ", pages[LRU_INACTIVE_FILE]);
+	show_val_kb(m, "LazyFree:       ", pages[LRU_LAZYFREE]);
 	show_val_kb(m, "Unevictable:    ", pages[LRU_UNEVICTABLE]);
 	show_val_kb(m, "Mlocked:        ", global_page_state(NR_MLOCK));
 
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 828e813..5f22c93 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -81,6 +81,8 @@ static inline enum lru_list page_lru_base_type(struct page *page)
 {
 	if (page_is_file_cache(page))
 		return LRU_INACTIVE_FILE;
+	if (PageLazyFree(page))
+		return LRU_LAZYFREE;
 	return LRU_INACTIVE_ANON;
 }
 
@@ -100,6 +102,8 @@ static __always_inline enum lru_list page_off_lru(struct page *page)
 		lru = LRU_UNEVICTABLE;
 	} else {
 		lru = page_lru_base_type(page);
+		if (lru == LRU_LAZYFREE)
+			__ClearPageLazyFree(page);
 		if (PageActive(page)) {
 			__ClearPageActive(page);
 			lru += LRU_ACTIVE;
@@ -123,6 +127,8 @@ static __always_inline enum lru_list page_lru(struct page *page)
 		lru = LRU_UNEVICTABLE;
 	else {
 		lru = page_lru_base_type(page);
+		if (lru == LRU_LAZYFREE)
+			return lru;
 		if (PageActive(page))
 			lru += LRU_ACTIVE;
 	}
@@ -139,6 +145,8 @@ static inline int lru_isolate_index(enum lru_list lru)
 {
 	if (lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE)
 		return NR_ISOLATED_FILE;
+	if (lru == LRU_LAZYFREE)
+		return NR_ISOLATED_LAZYFREE;
 	return NR_ISOLATED_ANON;
 }
 
@@ -152,6 +160,8 @@ static inline int page_isolate_index(struct page *page)
 {
 	if (!PageSwapBacked(page))
 		return NR_ISOLATED_FILE;
+	else if (PageLazyFree(page))
+		return NR_ISOLATED_LAZYFREE;
 	return NR_ISOLATED_ANON;
 }
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 338a786a..589a165 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -117,6 +117,7 @@ enum zone_stat_item {
 	NR_ZONE_ACTIVE_ANON,
 	NR_ZONE_INACTIVE_FILE,
 	NR_ZONE_ACTIVE_FILE,
+	NR_ZONE_LAZYFREE,
 	NR_ZONE_UNEVICTABLE,
 	NR_ZONE_WRITE_PENDING,	/* Count of dirty, writeback and unstable pages */
 	NR_MLOCK,		/* mlock()ed pages found and moved off LRU */
@@ -146,9 +147,11 @@ enum node_stat_item {
 	NR_ACTIVE_ANON,		/*  "     "     "   "       "         */
 	NR_INACTIVE_FILE,	/*  "     "     "   "       "         */
 	NR_ACTIVE_FILE,		/*  "     "     "   "       "         */
+	NR_LAZYFREE,		/*  "     "     "   "       "         */
 	NR_UNEVICTABLE,		/*  "     "     "   "       "         */
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
+	NR_ISOLATED_LAZYFREE,	/* Temporary isolated pages from lazyfree lru */
 	NR_PAGES_SCANNED,	/* pages scanned since last reclaim */
 	WORKINGSET_REFAULT,
 	WORKINGSET_ACTIVATE,
@@ -190,6 +193,7 @@ enum lru_list {
 	LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
 	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
 	LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
+	LRU_LAZYFREE,
 	LRU_UNEVICTABLE,
 	NR_LRU_LISTS
 };
@@ -203,6 +207,11 @@ static inline int is_file_lru(enum lru_list lru)
 	return (lru == LRU_INACTIVE_FILE || lru == LRU_ACTIVE_FILE);
 }
 
+static inline int is_anon_lru(enum lru_list lru)
+{
+	return lru <= LRU_ACTIVE_ANON;
+}
+
 static inline int is_active_lru(enum lru_list lru)
 {
 	return (lru == LRU_ACTIVE_ANON || lru == LRU_ACTIVE_FILE);
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 6aa1b6c..94e58da 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -25,7 +25,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		FOR_ALL_ZONES(PGALLOC),
 		FOR_ALL_ZONES(ALLOCSTALL),
 		FOR_ALL_ZONES(PGSCAN_SKIP),
-		PGFREE, PGACTIVATE, PGDEACTIVATE,
+		PGFREE, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE,
 		PGFAULT, PGMAJFAULT,
 		PGLAZYFREED,
 		PGREFILL,
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 12cd88c..058b799 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -244,6 +244,7 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY,	"softdirty"	)		\
 		EM (LRU_ACTIVE_ANON, "active_anon") \
 		EM (LRU_INACTIVE_FILE, "inactive_file") \
 		EM (LRU_ACTIVE_FILE, "active_file") \
+		EM (LRU_LAZYFREE, "lazyfree") \
 		EMe(LRU_UNEVICTABLE, "unevictable")
 
 /*
diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index fab386d..7ece3ab 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -12,8 +12,9 @@
 
 #define RECLAIM_WB_ANON		0x0001u
 #define RECLAIM_WB_FILE		0x0002u
+#define RECLAIM_WB_LAZYFREE	0x0004u
 #define RECLAIM_WB_MIXED	0x0010u
-#define RECLAIM_WB_SYNC		0x0004u /* Unused, all reclaim async */
+#define RECLAIM_WB_SYNC		0x0020u /* Unused, all reclaim async */
 #define RECLAIM_WB_ASYNC	0x0008u
 #define RECLAIM_WB_LRU		(RECLAIM_WB_ANON|RECLAIM_WB_FILE)
 
@@ -21,20 +22,23 @@
 	(flags) ? __print_flags(flags, "|",			\
 		{RECLAIM_WB_ANON,	"RECLAIM_WB_ANON"},	\
 		{RECLAIM_WB_FILE,	"RECLAIM_WB_FILE"},	\
+		{RECLAIM_WB_LAZYFREE,	"RECLAIM_WB_LAZYFREE"},	\
 		{RECLAIM_WB_MIXED,	"RECLAIM_WB_MIXED"},	\
 		{RECLAIM_WB_SYNC,	"RECLAIM_WB_SYNC"},	\
 		{RECLAIM_WB_ASYNC,	"RECLAIM_WB_ASYNC"}	\
 		) : "RECLAIM_WB_NONE"
 
 #define trace_reclaim_flags(page) ( \
-	(page_is_file_cache(page) ? RECLAIM_WB_FILE : RECLAIM_WB_ANON) | \
+	(page_is_file_cache(page) ? RECLAIM_WB_FILE : \
+	 PageLazyFree(page) ? RECLAIM_WB_LAZYFREE : RECLAIM_WB_ANON) | \
 	(RECLAIM_WB_ASYNC) \
 	)
 
 #define trace_shrink_flags(isolate_index) \
 	( \
 		(isolate_index == NR_ISOLATED_FILE ? RECLAIM_WB_FILE : \
-			RECLAIM_WB_ANON) | \
+			isolate_index == NR_ISOLATED_ANON ? RECLAIM_WB_ANON: \
+			RECLAIM_WB_LAZYFREE) | \
 		(RECLAIM_WB_ASYNC) \
 	)
 
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 2d8e2b2..6d50a48 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1653,6 +1653,7 @@ static unsigned long minimum_image_size(unsigned long saveable)
 		+ global_node_page_state(NR_INACTIVE_ANON)
 		+ global_node_page_state(NR_ACTIVE_FILE)
 		+ global_node_page_state(NR_INACTIVE_FILE)
+		+ global_node_page_state(NR_LAZYFREE)
 		- global_node_page_state(NR_FILE_MAPPED);
 
 	return saveable <= size ? 0 : saveable - size;
diff --git a/mm/compaction.c b/mm/compaction.c
index 3918c48..9c842b9 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -637,16 +637,18 @@ isolate_freepages_range(struct compact_control *cc,
 /* Similar to reclaim, but different enough that they don't share logic */
 static bool too_many_isolated(struct zone *zone)
 {
-	unsigned long active, inactive, isolated;
+	unsigned long active, inactive, lazyfree, isolated;
 
 	inactive = node_page_state(zone->zone_pgdat, NR_INACTIVE_FILE) +
 			node_page_state(zone->zone_pgdat, NR_INACTIVE_ANON);
 	active = node_page_state(zone->zone_pgdat, NR_ACTIVE_FILE) +
 			node_page_state(zone->zone_pgdat, NR_ACTIVE_ANON);
+	lazyfree = node_page_state(zone->zone_pgdat, NR_LAZYFREE);
 	isolated = node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE) +
-			node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON);
+			node_page_state(zone->zone_pgdat, NR_ISOLATED_ANON) +
+			node_page_state(zone->zone_pgdat, NR_ISOLATED_LAZYFREE);
 
-	return isolated > (inactive + active) / 2;
+	return isolated > (inactive + active + lazyfree) / 2;
 }
 
 /**
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b822e15..0113240 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -120,6 +120,7 @@ static const char * const mem_cgroup_lru_names[] = {
 	"active_anon",
 	"inactive_file",
 	"active_file",
+	"lazyfree",
 	"unevictable",
 };
 
@@ -1263,6 +1264,8 @@ static bool mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 static bool test_mem_cgroup_node_reclaimable(struct mem_cgroup *memcg,
 		int nid, bool noswap)
 {
+	if (mem_cgroup_node_nr_lru_pages(memcg, nid, BIT(LRU_LAZYFREE)))
+		return true;
 	if (mem_cgroup_node_nr_lru_pages(memcg, nid, LRU_ALL_FILE))
 		return true;
 	if (noswap || !total_swap_pages)
@@ -3086,6 +3089,7 @@ static int memcg_numa_stat_show(struct seq_file *m, void *v)
 		{ "total", LRU_ALL },
 		{ "file", LRU_ALL_FILE },
 		{ "anon", LRU_ALL_ANON },
+		{ "lazyfree", BIT(LRU_LAZYFREE) },
 		{ "unevictable", BIT(LRU_UNEVICTABLE) },
 	};
 	const struct numa_stat *stat;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 11b4cd4..d00b41e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4316,6 +4316,9 @@ long si_mem_available(void)
 	pagecache -= min(pagecache / 2, wmark_low);
 	available += pagecache;
 
+	/* lazyfree pages can be freed */
+	available += pages[LRU_LAZYFREE];
+
 	/*
 	 * Part of the reclaimable slab consists of items that are in use,
 	 * and cannot be freed. Cap this estimate at the low watermark.
@@ -4450,6 +4453,7 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 
 	printk("active_anon:%lu inactive_anon:%lu isolated_anon:%lu\n"
 		" active_file:%lu inactive_file:%lu isolated_file:%lu\n"
+		" lazy_free:%lu isolated_lazy_free:%lu\n"
 		" unevictable:%lu dirty:%lu writeback:%lu unstable:%lu\n"
 		" slab_reclaimable:%lu slab_unreclaimable:%lu\n"
 		" mapped:%lu shmem:%lu pagetables:%lu bounce:%lu\n"
@@ -4460,6 +4464,8 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 		global_node_page_state(NR_ACTIVE_FILE),
 		global_node_page_state(NR_INACTIVE_FILE),
 		global_node_page_state(NR_ISOLATED_FILE),
+		global_node_page_state(NR_LAZYFREE),
+		global_node_page_state(NR_ISOLATED_LAZYFREE),
 		global_node_page_state(NR_UNEVICTABLE),
 		global_node_page_state(NR_FILE_DIRTY),
 		global_node_page_state(NR_WRITEBACK),
@@ -4483,9 +4489,11 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			" inactive_anon:%lukB"
 			" active_file:%lukB"
 			" inactive_file:%lukB"
+			" lazy_free:%lukB"
 			" unevictable:%lukB"
 			" isolated(anon):%lukB"
 			" isolated(file):%lukB"
+			" isolated(lazy_free):%lukB"
 			" mapped:%lukB"
 			" dirty:%lukB"
 			" writeback:%lukB"
@@ -4505,9 +4513,11 @@ void show_free_areas(unsigned int filter, nodemask_t *nodemask)
 			K(node_page_state(pgdat, NR_INACTIVE_ANON)),
 			K(node_page_state(pgdat, NR_ACTIVE_FILE)),
 			K(node_page_state(pgdat, NR_INACTIVE_FILE)),
+			K(node_page_state(pgdat, NR_LAZYFREE)),
 			K(node_page_state(pgdat, NR_UNEVICTABLE)),
 			K(node_page_state(pgdat, NR_ISOLATED_ANON)),
 			K(node_page_state(pgdat, NR_ISOLATED_FILE)),
+			K(node_page_state(pgdat, NR_ISOLATED_LAZYFREE)),
 			K(node_page_state(pgdat, NR_FILE_MAPPED)),
 			K(node_page_state(pgdat, NR_FILE_DIRTY)),
 			K(node_page_state(pgdat, NR_WRITEBACK)),
diff --git a/mm/vmscan.c b/mm/vmscan.c
index abb64b7..3a0d05b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -205,7 +205,8 @@ unsigned long zone_reclaimable_pages(struct zone *zone)
 	unsigned long nr;
 
 	nr = zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_FILE) +
-		zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_FILE);
+		zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_FILE) +
+		zone_page_state_snapshot(zone, NR_ZONE_LAZYFREE);
 	if (get_nr_swap_pages() > 0)
 		nr += zone_page_state_snapshot(zone, NR_ZONE_INACTIVE_ANON) +
 			zone_page_state_snapshot(zone, NR_ZONE_ACTIVE_ANON);
@@ -219,7 +220,9 @@ unsigned long pgdat_reclaimable_pages(struct pglist_data *pgdat)
 
 	nr = node_page_state_snapshot(pgdat, NR_ACTIVE_FILE) +
 	     node_page_state_snapshot(pgdat, NR_INACTIVE_FILE) +
-	     node_page_state_snapshot(pgdat, NR_ISOLATED_FILE);
+	     node_page_state_snapshot(pgdat, NR_ISOLATED_FILE) +
+	     node_page_state_snapshot(pgdat, NR_LAZYFREE) +
+	     node_page_state_snapshot(pgdat, NR_ISOLATED_LAZYFREE);
 
 	if (get_nr_swap_pages() > 0)
 		nr += node_page_state_snapshot(pgdat, NR_ACTIVE_ANON) +
@@ -1602,7 +1605,7 @@ int isolate_lru_page(struct page *page)
  * the LRU list will go small and be scanned faster than necessary, leading to
  * unnecessary swapping, thrashing and OOM.
  */
-static int too_many_isolated(struct pglist_data *pgdat, int file,
+static int too_many_isolated(struct pglist_data *pgdat, enum lru_list lru,
 		struct scan_control *sc)
 {
 	unsigned long inactive, isolated;
@@ -1613,12 +1616,15 @@ static int too_many_isolated(struct pglist_data *pgdat, int file,
 	if (!sane_reclaim(sc))
 		return 0;
 
-	if (file) {
+	if (is_file_lru(lru)) {
 		inactive = node_page_state(pgdat, NR_INACTIVE_FILE);
 		isolated = node_page_state(pgdat, NR_ISOLATED_FILE);
-	} else {
+	} else if (is_anon_lru(lru)) {
 		inactive = node_page_state(pgdat, NR_INACTIVE_ANON);
 		isolated = node_page_state(pgdat, NR_ISOLATED_ANON);
+	} else {
+		inactive = node_page_state(pgdat, NR_LAZYFREE);
+		isolated = node_page_state(pgdat, NR_ISOLATED_LAZYFREE);
 	}
 
 	/*
@@ -1718,7 +1724,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;
 
-	while (unlikely(too_many_isolated(pgdat, file, sc))) {
+	while (unlikely(too_many_isolated(pgdat, lru, sc))) {
 		congestion_wait(BLK_RW_ASYNC, HZ/10);
 
 		/* We are about to die and free our memory. Return now. */
@@ -2498,7 +2504,8 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	 * inactive lists are large enough, continue reclaiming
 	 */
 	pages_for_compaction = compact_gap(sc->order);
-	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE);
+	inactive_lru_pages = node_page_state(pgdat, NR_INACTIVE_FILE) +
+			node_page_state(pgdat, NR_LAZYFREE);
 	if (get_nr_swap_pages() > 0)
 		inactive_lru_pages += node_page_state(pgdat, NR_INACTIVE_ANON);
 	if (sc->nr_reclaimed < pages_for_compaction &&
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 69f9aff..86ffe2c 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -925,6 +925,7 @@ const char * const vmstat_text[] = {
 	"nr_zone_active_anon",
 	"nr_zone_inactive_file",
 	"nr_zone_active_file",
+	"nr_zone_lazyfree",
 	"nr_zone_unevictable",
 	"nr_zone_write_pending",
 	"nr_mlock",
@@ -951,9 +952,11 @@ const char * const vmstat_text[] = {
 	"nr_active_anon",
 	"nr_inactive_file",
 	"nr_active_file",
+	"nr_lazyfree",
 	"nr_unevictable",
 	"nr_isolated_anon",
 	"nr_isolated_file",
+	"nr_isolated_lazyfree",
 	"nr_pages_scanned",
 	"workingset_refault",
 	"workingset_activate",
@@ -992,6 +995,7 @@ const char * const vmstat_text[] = {
 	"pgfree",
 	"pgactivate",
 	"pgdeactivate",
+	"pglazyfree",
 
 	"pgfault",
 	"pgmajfault",
-- 
2.9.3
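
For reference, a minimal sketch of how userspace could observe the new
counter this patch exports (illustrative only; it assumes the series is
applied and the field is spelled "LazyFree:" as in the meminfo hunk
above):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return 1;
	/* scan for the LazyFree: field added by this patch */
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, "LazyFree:", 9)) {
			fputs(line, stdout);	/* e.g. "LazyFree: 1234 kB" */
			break;
		}
	}
	fclose(f);
	return 0;
}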

* [RFC 4/6] mm: move MADV_FREE pages into LRU_LAZYFREE list
  2017-01-30  5:51 ` Shaohua Li
@ 2017-01-30  5:51   ` Shaohua Li
  -1 siblings, 0 replies; 30+ messages in thread
From: Shaohua Li @ 2017-01-30  5:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman

Move MADV_FREE pages into the LRU_LAZYFREE list. The reason we need to
do this is described in the previous patch. The next patch will reclaim
the pages.

The patch is based on Minchan's previous patch.
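
For context, a minimal userspace sketch of the call sequence that puts
pages on the new list (illustrative only; a real allocator such as
jemalloc batches and tracks this much more carefully):

#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 16 * 4096;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 0xaa, len);	/* dirty the anonymous pages */
	/* with this patch, the pages move to the lazyfree LRU here and
	 * stay usable until the kernel actually reclaims them */
	madvise(buf, len, MADV_FREE);
	buf[0] = 1;	/* writing again makes the page live data again */
	return 0;
}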

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 include/linux/swap.h |  2 +-
 mm/huge_memory.c     |  5 ++---
 mm/madvise.c         |  3 +--
 mm/swap.c            | 51 +++++++++++++++++++++++++++++----------------------
 4 files changed, 33 insertions(+), 28 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 45e91dd..e35bef5 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -279,7 +279,7 @@ extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
-extern void deactivate_page(struct page *page);
+extern void move_page_to_lazyfree_list(struct page *page);
 extern void swap_setup(void);
 
 extern void add_page_to_unevictable_list(struct page *page);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index ffa7ed5..57daef7 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1391,9 +1391,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		ClearPageDirty(page);
 	unlock_page(page);
 
-	if (PageActive(page))
-		deactivate_page(page);
-
 	if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
 		orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,
 			tlb->fullmm);
@@ -1404,6 +1401,8 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		set_pmd_at(mm, addr, pmd, orig_pmd);
 		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
 	}
+
+	move_page_to_lazyfree_list(page);
 	ret = true;
 out:
 	spin_unlock(ptl);
diff --git a/mm/madvise.c b/mm/madvise.c
index c867d88..78b4b02 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -378,10 +378,9 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			ptent = pte_mkclean(ptent);
 			ptent = pte_wrprotect(ptent);
 			set_pte_at(mm, addr, pte, ptent);
-			if (PageActive(page))
-				deactivate_page(page);
 			tlb_remove_tlb_entry(tlb, pte, addr);
 		}
+		move_page_to_lazyfree_list(page);
 	}
 out:
 	if (nr_swap) {
diff --git a/mm/swap.c b/mm/swap.c
index c4910f1..f9e70e8 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -46,7 +46,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
-static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
 #endif
@@ -268,6 +268,10 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
 		int lru = page_lru_base_type(page);
 
 		del_page_from_lru_list(page, lruvec, lru);
+		if (lru == LRU_LAZYFREE) {
+			ClearPageLazyFree(page);
+			lru = LRU_INACTIVE_ANON;
+		}
 		SetPageActive(page);
 		lru += LRU_ACTIVE;
 		add_page_to_lru_list(page, lruvec, lru);
@@ -455,6 +459,8 @@ void add_page_to_unevictable_list(struct page *page)
 	ClearPageActive(page);
 	SetPageUnevictable(page);
 	SetPageLRU(page);
+	if (page_is_lazyfree(page))
+		ClearPageLazyFree(page);
 	add_page_to_lru_list(page, lruvec, LRU_UNEVICTABLE);
 	spin_unlock_irq(&pgdat->lru_lock);
 }
@@ -561,20 +567,21 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
 }
 
 
-static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
+static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 			    void *arg)
 {
-	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
-		int file = page_is_file_cache(page);
-		int lru = page_lru_base_type(page);
+	if (PageLRU(page) && PageSwapBacked(page) && !PageLazyFree(page) &&
+	    !PageUnevictable(page)) {
+		unsigned int nr_pages = PageTransHuge(page) ? HPAGE_PMD_NR : 1;
+		bool active = PageActive(page);
 
-		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
+		del_page_from_lru_list(page, lruvec, LRU_INACTIVE_ANON + active);
 		ClearPageActive(page);
 		ClearPageReferenced(page);
-		add_page_to_lru_list(page, lruvec, lru);
+		SetPageLazyFree(page);
+		add_page_to_lru_list(page, lruvec, LRU_LAZYFREE);
 
-		__count_vm_event(PGDEACTIVATE);
-		update_page_reclaim_stat(lruvec, file, 0);
+		count_vm_events(PGLAZYFREE, nr_pages);
 	}
 }
 
@@ -604,9 +611,9 @@ void lru_add_drain_cpu(int cpu)
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
 
-	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+	pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
 	if (pagevec_count(pvec))
-		pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+		pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
 
 	activate_page_drain(cpu);
 }
@@ -638,22 +645,22 @@ void deactivate_file_page(struct page *page)
 }
 
 /**
- * deactivate_page - deactivate a page
- * @page: page to deactivate
+ * move_page_to_lazyfree_list - move anon page to lazyfree list
+ * @page: page to move
  *
- * deactivate_page() moves @page to the inactive list if @page was on the active
- * list and was not an unevictable page.  This is done to accelerate the reclaim
- * of @page.
+ * This function moves @page to the lazyfree list after the page has been
+ * the target of a MADV_FREE syscall, to accelerate the reclaim of @page.
  */
-void deactivate_page(struct page *page)
+void move_page_to_lazyfree_list(struct page *page)
 {
-	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
-		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+	if (PageLRU(page) && PageSwapBacked(page) && !PageLazyFree(page) &&
+	    !PageUnevictable(page)) {
+		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
 
 		get_page(page);
 		if (!pagevec_add(pvec, page) || PageCompound(page))
-			pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
-		put_cpu_var(lru_deactivate_pvecs);
+			pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
+		put_cpu_var(lru_lazyfree_pvecs);
 	}
 }
 
@@ -704,7 +711,7 @@ void lru_add_drain_all(void)
 		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
 		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
-		    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
+		    pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, lru_add_drain_wq, work);
-- 
2.9.3

* [RFC 5/6] mm: reclaim lazyfree pages
  2017-01-30  5:51 ` Shaohua Li
@ 2017-01-30  5:51   ` Shaohua Li
  -1 siblings, 0 replies; 30+ messages in thread
From: Shaohua Li @ 2017-01-30  5:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman

When memory pressure is high, we must free lazyfree pages. If we free
lazyfree pages, the cost of reaccessing them is a page fault plus a
page allocation. That is much lower than the cost of swapping in a page
or refilling a file page cache, because refilling an anon/file page
incurs the same cost plus extra I/O, which is very expensive.
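
Schematically, the cost claim is:

	cost(reaccess lazyfree page) = page fault + page allocation
	cost(refill anon/file page)  = page fault + page allocation + I/O

and the I/O term dominates, which is why dropping lazyfree pages first
is the cheap option.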

The policy for deciding when to free lazyfree pages is controversial.
Some think lazyfree pages should be reclaimed before any other
anon/file pages, because userspace has already indicated the pages are
not important at all and the cost to refill lazyfree pages is much
lower than refilling the anon/file page cache. Others think userspace
could still use the MADV_FREE pages; otherwise userspace would simply
use MADV_DONTNEED to free the pages. If a page cache page won't be
used again, there is no refill cost for it, and in that case reclaiming
MADV_FREE pages doesn't make sense, because refilling MADV_FREE pages
still has a cost.

This patch does not take the latter approach. It's possible that
released page cache is never refilled, but the opposite case is just
as likely. Considering that the refill cost of file/anon pages is much
higher than the refill cost of MADV_FREE pages, it doesn't make sense
to retain lazyfree pages.

As for the implementation, this is targeted at swapless systems, so we
don't allocate a swap entry for lazyfree pages. If the pages can't be
reclaimed directly, they are put back on the anon LRU list and
reclaimed the normal way.
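
A hedged sketch of the resulting reclaim flow (the helper names below
are made up for illustration; the real logic is in shrink_node_memcg()
and the TTU_LZFREE path of try_to_unmap() in the diff):

	/* pseudocode; none of these helpers exist under these names */
	while (lazyfree_lru_size() && reclaimed < nr_to_reclaim) {
		page = isolate_lazyfree_page();
		if (page_dirtied_since_madv_free(page))
			/* written again: data is live, park it back
			 * on the anon LRU */
			putback_to_anon_lru(page);
		else {
			/* still clean: free it outright -- no swap
			 * entry, no I/O */
			free_page_directly(page);
			reclaimed++;
		}
	}
	/* only then fall back to the usual anon/file balancing */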

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 mm/rmap.c   |  7 ++++++-
 mm/vmscan.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c48e9c1..f9b1023 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1546,13 +1546,18 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		 * Store the swap location in the pte.
 		 * See handle_pte_fault() ...
 		 */
-		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+		VM_BUG_ON_PAGE(!PageSwapCache(page) && !PageLazyFree(page),
+			page);
 
 		if (!PageDirty(page) && (flags & TTU_LZFREE)) {
 			/* It's a freeable page by MADV_FREE */
 			dec_mm_counter(mm, MM_ANONPAGES);
 			rp->lazyfreed++;
 			goto discard;
+		} else if (flags & TTU_LZFREE) {
+			set_pte_at(mm, address, pte, pteval);
+			ret = SWAP_FAIL;
+			goto out_unmap;
 		}
 
 		if (swap_duplicate(entry) < 0) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3a0d05b..f809f04 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -974,7 +974,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		int may_enter_fs;
 		enum page_references references = PAGEREF_RECLAIM_CLEAN;
 		bool dirty, writeback;
-		bool lazyfree = false;
+		bool lazyfree;
 		int ret = SWAP_SUCCESS;
 
 		cond_resched();
@@ -989,6 +989,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 		sc->nr_scanned++;
 
+		lazyfree = page_is_lazyfree(page);
+
 		if (unlikely(!page_evictable(page)))
 			goto cull_mlocked;
 
@@ -996,7 +998,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			goto keep_locked;
 
 		/* Double the slab pressure for mapped and swapcache pages */
-		if (page_mapped(page) || PageSwapCache(page))
+		if ((page_mapped(page) || PageSwapCache(page)) && !lazyfree)
 			sc->nr_scanned++;
 
 		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
@@ -1110,6 +1112,14 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			; /* try to reclaim the page below */
 		}
 
+		/* lazyfree page could be freed directly */
+		if (lazyfree) {
+			if (unlikely(PageTransHuge(page)) &&
+			    split_huge_page_to_list(page, page_list))
+				goto keep_locked;
+			goto unmap_page;
+		}
+
 		/*
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
@@ -1119,7 +1129,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 			if (!add_to_swap(page, page_list))
 				goto activate_locked;
-			lazyfree = true;
 			may_enter_fs = 1;
 
 			/* Adding to swap updated mapping */
@@ -1130,13 +1139,14 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 		}
 
+unmap_page:
 		VM_BUG_ON_PAGE(PageTransHuge(page), page);
 
 		/*
 		 * The page is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
 		 */
-		if (page_mapped(page) && mapping) {
+		if (page_mapped(page) && (mapping || lazyfree)) {
 			switch (ret = try_to_unmap(page, lazyfree ?
 				(ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
 				(ttu_flags | TTU_BATCH_FLUSH))) {
@@ -1148,7 +1158,13 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			case SWAP_MLOCK:
 				goto cull_mlocked;
 			case SWAP_LZFREE:
-				goto lazyfree;
+				if (page_ref_freeze(page, 1)) {
+					if (!PageDirty(page))
+						goto lazyfree;
+					else
+						page_ref_unfreeze(page, 1);
+				}
+				goto keep_locked;
 			case SWAP_SUCCESS:
 				; /* try to free the page below */
 			}
@@ -1260,10 +1276,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-lazyfree:
 		if (!mapping || !__remove_mapping(mapping, page, true))
 			goto keep_locked;
 
+lazyfree:
 		/*
 		 * At this point, we have no other references and there is
 		 * no way to pick any more up (removed from LRU, removed
@@ -1288,6 +1304,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 cull_mlocked:
 		if (PageSwapCache(page))
 			try_to_free_swap(page);
+		if (lazyfree)
+			ClearPageLazyFree(page);
 		unlock_page(page);
 		list_add(&page->lru, &ret_pages);
 		continue;
@@ -1297,6 +1315,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (PageSwapCache(page) && mem_cgroup_swap_full(page))
 			try_to_free_swap(page);
 		VM_BUG_ON_PAGE(PageActive(page), page);
+		if (lazyfree)
+			ClearPageLazyFree(page);
 		SetPageActive(page);
 		pgactivate++;
 keep_locked:
@@ -1743,6 +1763,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 				     &nr_scanned, sc, isolate_mode, lru);
 
 	__mod_node_page_state(pgdat, lru_isolate_index(lru), nr_taken);
+	/* LAZYFREE pages will be charged into anon recent_scanned */
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	if (global_reclaim(sc)) {
@@ -1830,7 +1851,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		 * that pages are cycling through the LRU faster than
 		 * they are written so also forcibly stall.
 		 */
-		if (stat.nr_immediate && current_may_throttle())
+		if (stat.nr_immediate && current_may_throttle() &&
+		    lru != LRU_LAZYFREE)
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 	}
 
@@ -1840,7 +1862,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	 * unqueued dirty pages or cycling through the LRU too quickly.
 	 */
 	if (!sc->hibernation_mode && !current_is_kswapd() &&
-	    current_may_throttle())
+	    current_may_throttle() && lru != LRU_LAZYFREE)
 		wait_iff_congested(pgdat, BLK_RW_ASYNC, HZ/10);
 
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
@@ -2342,6 +2364,24 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
 	struct blk_plug plug;
 	bool scan_adjusted;
 
+	/* reclaim all lazyfree pages, so don't apply scan priority */
+	nr[LRU_LAZYFREE] = lruvec_lru_size(lruvec, LRU_LAZYFREE, sc->reclaim_idx);
+	while (nr[LRU_LAZYFREE]) {
+		nr_to_scan = min(nr[LRU_LAZYFREE], SWAP_CLUSTER_MAX);
+		nr[LRU_LAZYFREE] -= nr_to_scan;
+		nr_reclaimed += shrink_inactive_list(nr_to_scan, lruvec, sc,
+			LRU_LAZYFREE);
+
+		if (nr_reclaimed >= nr_to_reclaim)
+			break;
+		cond_resched();
+	}
+
+	if (nr_reclaimed >= nr_to_reclaim) {
+		sc->nr_reclaimed += nr_reclaimed;
+		return;
+	}
+
 	get_scan_count(lruvec, memcg, sc, nr, lru_pages);
 
 	/* Record the original scan target for proportional adjustments later */
-- 
2.9.3


* [RFC 5/6] mm: reclaim lazyfree pages
@ 2017-01-30  5:51   ` Shaohua Li
  0 siblings, 0 replies; 30+ messages in thread
From: Shaohua Li @ 2017-01-30  5:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman

When memory pressure is high, we must free lazyfree pages. If we free
lazyfree pages, the cost of reaccessing them is a page fault plus a
page allocation. That is much lower than the cost of swapping in a page
or refilling a file page cache, because refilling an anon/file page
incurs the same cost plus extra I/O, which is very expensive.

The policy for deciding when to free lazyfree pages is controversial.
Some think lazyfree pages should be reclaimed before any other
anon/file pages, because userspace has already indicated the pages are
not important at all and the cost to refill lazyfree pages is much
lower than refilling the anon/file page cache. Others think userspace
could still use the MADV_FREE pages; otherwise userspace would simply
use MADV_DONTNEED to free the pages. If a page cache page won't be
used again, there is no refill cost for it, and in that case reclaiming
MADV_FREE pages doesn't make sense, because refilling MADV_FREE pages
still has a cost.

This patch does not take the latter approach. It's possible that
released page cache is never refilled, but the opposite case is just
as likely. Considering that the refill cost of file/anon pages is much
higher than the refill cost of MADV_FREE pages, it doesn't make sense
to retain lazyfree pages.

As for the implementation, this is targeted at swapless systems, so we
don't allocate a swap entry for lazyfree pages. If the pages can't be
reclaimed directly, they are put back on the anon LRU list and
reclaimed the normal way.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 mm/rmap.c   |  7 ++++++-
 mm/vmscan.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c48e9c1..f9b1023 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1546,13 +1546,18 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		 * Store the swap location in the pte.
 		 * See handle_pte_fault() ...
 		 */
-		VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+		VM_BUG_ON_PAGE(!PageSwapCache(page) && !PageLazyFree(page),
+			page);
 
 		if (!PageDirty(page) && (flags & TTU_LZFREE)) {
 			/* It's a freeable page by MADV_FREE */
 			dec_mm_counter(mm, MM_ANONPAGES);
 			rp->lazyfreed++;
 			goto discard;
+		} else if (flags & TTU_LZFREE) {
+			set_pte_at(mm, address, pte, pteval);
+			ret = SWAP_FAIL;
+			goto out_unmap;
 		}
 
 		if (swap_duplicate(entry) < 0) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3a0d05b..f809f04 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -974,7 +974,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		int may_enter_fs;
 		enum page_references references = PAGEREF_RECLAIM_CLEAN;
 		bool dirty, writeback;
-		bool lazyfree = false;
+		bool lazyfree;
 		int ret = SWAP_SUCCESS;
 
 		cond_resched();
@@ -989,6 +989,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 		sc->nr_scanned++;
 
+		lazyfree = page_is_lazyfree(page);
+
 		if (unlikely(!page_evictable(page)))
 			goto cull_mlocked;
 
@@ -996,7 +998,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			goto keep_locked;
 
 		/* Double the slab pressure for mapped and swapcache pages */
-		if (page_mapped(page) || PageSwapCache(page))
+		if ((page_mapped(page) || PageSwapCache(page)) && !lazyfree)
 			sc->nr_scanned++;
 
 		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
@@ -1110,6 +1112,14 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			; /* try to reclaim the page below */
 		}
 
+		/* lazyfree page could be freed directly */
+		if (lazyfree) {
+			if (unlikely(PageTransHuge(page)) &&
+			    split_huge_page_to_list(page, page_list))
+				goto keep_locked;
+			goto unmap_page;
+		}
+
 		/*
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
@@ -1119,7 +1129,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 			if (!add_to_swap(page, page_list))
 				goto activate_locked;
-			lazyfree = true;
 			may_enter_fs = 1;
 
 			/* Adding to swap updated mapping */
@@ -1130,13 +1139,14 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 		}
 
+unmap_page:
 		VM_BUG_ON_PAGE(PageTransHuge(page), page);
 
 		/*
 		 * The page is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
 		 */
-		if (page_mapped(page) && mapping) {
+		if (page_mapped(page) && (mapping || lazyfree)) {
 			switch (ret = try_to_unmap(page, lazyfree ?
 				(ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
 				(ttu_flags | TTU_BATCH_FLUSH))) {
@@ -1148,7 +1158,13 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			case SWAP_MLOCK:
 				goto cull_mlocked;
 			case SWAP_LZFREE:
-				goto lazyfree;
+				if (page_ref_freeze(page, 1)) {
+					if (!PageDirty(page))
+						goto lazyfree;
+					else
+						page_ref_unfreeze(page, 1);
+				}
+				goto keep_locked;
 			case SWAP_SUCCESS:
 				; /* try to free the page below */
 			}
@@ -1260,10 +1276,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-lazyfree:
 		if (!mapping || !__remove_mapping(mapping, page, true))
 			goto keep_locked;
 
+lazyfree:
 		/*
 		 * At this point, we have no other references and there is
 		 * no way to pick any more up (removed from LRU, removed
@@ -1288,6 +1304,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 cull_mlocked:
 		if (PageSwapCache(page))
 			try_to_free_swap(page);
+		if (lazyfree)
+			ClearPageLazyFree(page);
 		unlock_page(page);
 		list_add(&page->lru, &ret_pages);
 		continue;
@@ -1297,6 +1315,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (PageSwapCache(page) && mem_cgroup_swap_full(page))
 			try_to_free_swap(page);
 		VM_BUG_ON_PAGE(PageActive(page), page);
+		if (lazyfree)
+			ClearPageLazyFree(page);
 		SetPageActive(page);
 		pgactivate++;
 keep_locked:
@@ -1743,6 +1763,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 				     &nr_scanned, sc, isolate_mode, lru);
 
 	__mod_node_page_state(pgdat, lru_isolate_index(lru), nr_taken);
+	/* LAZYFREE pages will be charged into anon recent_scanned */
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	if (global_reclaim(sc)) {
@@ -1830,7 +1851,8 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 		 * that pages are cycling through the LRU faster than
 		 * they are written so also forcibly stall.
 		 */
-		if (stat.nr_immediate && current_may_throttle())
+		if (stat.nr_immediate && current_may_throttle() &&
+		    lru != LRU_LAZYFREE)
 			congestion_wait(BLK_RW_ASYNC, HZ/10);
 	}
 
@@ -1840,7 +1862,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	 * unqueued dirty pages or cycling through the LRU too quickly.
 	 */
 	if (!sc->hibernation_mode && !current_is_kswapd() &&
-	    current_may_throttle())
+	    current_may_throttle() && lru != LRU_LAZYFREE)
 		wait_iff_congested(pgdat, BLK_RW_ASYNC, HZ/10);
 
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
@@ -2342,6 +2364,24 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
 	struct blk_plug plug;
 	bool scan_adjusted;
 
+	/* reclaim all lazyfree pages so don't apply priority  */
+	nr[LRU_LAZYFREE] = lruvec_lru_size(lruvec, LRU_LAZYFREE, sc->reclaim_idx);
+	while (nr[LRU_LAZYFREE]) {
+		nr_to_scan = min(nr[LRU_LAZYFREE], SWAP_CLUSTER_MAX);
+		nr[LRU_LAZYFREE] -= nr_to_scan;
+		nr_reclaimed += shrink_inactive_list(nr_to_scan, lruvec, sc,
+			LRU_LAZYFREE);
+
+		if (nr_reclaimed >= nr_to_reclaim)
+			break;
+		cond_resched();
+	}
+
+	if (nr_reclaimed >= nr_to_reclaim) {
+		sc->nr_reclaimed += nr_reclaimed;
+		return;
+	}
+
 	get_scan_count(lruvec, memcg, sc, nr, lru_pages);
 
 	/* Record the original scan target for proportional adjustments later */
-- 
2.9.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC 6/6] mm: enable MADV_FREE for swapless system
  2017-01-30  5:51 ` Shaohua Li
@ 2017-01-30  5:51   ` Shaohua Li
  -1 siblings, 0 replies; 30+ messages in thread
From: Shaohua Li @ 2017-01-30  5:51 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman

Now MADV_FREE pages can be reclaimed easily even on a swapless system, so
we can safely enable MADV_FREE for all systems.

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 mm/madvise.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 78b4b02..047cfd4 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -579,13 +579,7 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	case MADV_WILLNEED:
 		return madvise_willneed(vma, prev, start, end);
 	case MADV_FREE:
-		/*
-		 * XXX: In this implementation, MADV_FREE works like
-		 * MADV_DONTNEED on swapless system or full swap.
-		 */
-		if (get_nr_swap_pages() > 0)
-			return madvise_free(vma, prev, start, end);
-		/* passthrough */
+		return madvise_free(vma, prev, start, end);
 	case MADV_DONTNEED:
 		return madvise_dontneed(vma, prev, start, end);
 	default:
-- 
2.9.3
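
For reference, a minimal userspace sketch of the MADV_FREE flow this enables
(error handling elided; assumes headers that define MADV_FREE, i.e. Linux
4.5+ uapi):

#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 1 << 20;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	memset(buf, 1, len);		/* populate anonymous pages */
	madvise(buf, len, MADV_FREE);	/* mark them lazily freeable */
	/*
	 * A write before reclaim cancels the free for that page; a read
	 * after reclaim sees zeroes instead of the old contents.
	 */
	buf[0] = 2;
	return 0;
}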

* Re: [RFC 0/6]mm: add new LRU list for MADV_FREE pages
  2017-01-30  5:51 ` Shaohua Li
@ 2017-01-31 18:59   ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2017-01-31 18:59 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman

Hi Shaohua,

On Sun, Jan 29, 2017 at 09:51:17PM -0800, Shaohua Li wrote:
> We are trying to use MADV_FREE in jemalloc. Several issues are found. Without
> solving the issues, jemalloc can't use the MADV_FREE feature.
> - Doesn't support system without swap enabled. Because if swap is off, we can't
>   or can't efficiently age anonymous pages. And since MADV_FREE pages are mixed
>   with other anonymous pages, we can't reclaim MADV_FREE pages. In current
>   implementation, MADV_FREE will fallback to MADV_DONTNEED without swap enabled.
>   But in our environment, a lot of machines don't enable swap. This will prevent
>   our setup using MADV_FREE.
> - Increases memory pressure. page reclaim bias file pages reclaim against
>   anonymous pages. This doesn't make sense for MADV_FREE pages, because those
>   pages could be freed easily and refilled with very slight penality. Even page
>   reclaim doesn't bias file pages, there is still an issue, because MADV_FREE
>   pages and other anonymous pages are mixed together. To reclaim a MADV_FREE
>   page, we probably must scan a lot of other anonymous pages, which is
>   inefficient. In our test, we usually see oom with MADV_FREE enabled and nothing
>   without it.

Fully agreed, the anon LRU is a bad place for these pages.

> For the first two issues, introducing a new LRU list for MADV_FREE pages could
> solve the issues. We can directly reclaim MADV_FREE pages without writting them
> out to swap, so the first issue could be fixed. If only MADV_FREE pages are in
> the new list, page reclaim can easily reclaim such pages without interference
> of file or anonymous pages. The memory pressure issue will disappear.

Do we actually need a new page flag and a special LRU for them? These
pages are basically like clean cache pages at that point. What do you
think about clearing their PG_swapbacked flag on MADV_FREE and moving
them to the inactive file list? The way isolate+putback works should
not even need much modification, something like clear_page_mlock().
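
A rough fragment of what that could look like; the helper name is
hypothetical and the sketch is untested, just mirroring the
clear_page_mlock() pattern:

/*
 * Sketch only: on MADV_FREE, drop PG_swapbacked so the page is
 * treated like clean cache, then re-link it so it lands on the
 * inactive file LRU instead of the anon one.
 */
static void mark_page_lazyfree(struct page *page)
{
	if (!PageAnon(page) || !PageSwapBacked(page))
		return;

	ClearPageSwapBacked(page);
	if (!isolate_lru_page(page))	/* caller holds a page reference */
		putback_lru_page(page);	/* re-links based on page flags */
}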

When the reclaim scanner finds anon && dirty && !swapbacked, it can
again set PG_swapbacked and goto keep_locked to move the page back
into the anon LRU to get reclaimed according to swapping rules.
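
In shrink_page_list() terms, the reclaim side would be roughly (sketch only):

	if (PageAnon(page) && PageDirty(page) && !PageSwapBacked(page)) {
		/* redirtied after MADV_FREE: it holds valid data again */
		SetPageSwapBacked(page);
		goto keep_locked;	/* drifts back to the anon LRU */
	}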

> For the third issue, we can add a separate RSS count for MADV_FREE pages. The
> count will be increased in madvise syscall and decreased in page reclaim (eg,
> unmap). One issue is activate_page(). A MADV_FREE page can be promoted to
> active page there. But there isn't mm_struct context at that place. Iterating
> vma there sounds too silly. The patchset don't fix this issue yet. Hopefully
> somebody can share a hint how to fix this issue.

This problem also goes away if we use the file LRUs.

* Re: [RFC 0/6]mm: add new LRU list for MADV_FREE pages
  2017-01-31 18:59   ` Johannes Weiner
@ 2017-01-31 19:45     ` Shaohua Li
  -1 siblings, 0 replies; 30+ messages in thread
From: Shaohua Li @ 2017-01-31 19:45 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman

On Tue, Jan 31, 2017 at 01:59:49PM -0500, Johannes Weiner wrote:
> Hi Shaohua,
> 
> On Sun, Jan 29, 2017 at 09:51:17PM -0800, Shaohua Li wrote:
> > We are trying to use MADV_FREE in jemalloc. Several issues are found. Without
> > solving the issues, jemalloc can't use the MADV_FREE feature.
> > - Doesn't support system without swap enabled. Because if swap is off, we can't
> >   or can't efficiently age anonymous pages. And since MADV_FREE pages are mixed
> >   with other anonymous pages, we can't reclaim MADV_FREE pages. In current
> >   implementation, MADV_FREE will fallback to MADV_DONTNEED without swap enabled.
> >   But in our environment, a lot of machines don't enable swap. This will prevent
> >   our setup using MADV_FREE.
> > - Increases memory pressure. page reclaim bias file pages reclaim against
> >   anonymous pages. This doesn't make sense for MADV_FREE pages, because those
> >   pages could be freed easily and refilled with very slight penality. Even page
> >   reclaim doesn't bias file pages, there is still an issue, because MADV_FREE
> >   pages and other anonymous pages are mixed together. To reclaim a MADV_FREE
> >   page, we probably must scan a lot of other anonymous pages, which is
> >   inefficient. In our test, we usually see oom with MADV_FREE enabled and nothing
> >   without it.
> 
> Fully agreed, the anon LRU is a bad place for these pages.
> 
> > For the first two issues, introducing a new LRU list for MADV_FREE pages could
> > solve the issues. We can directly reclaim MADV_FREE pages without writting them
> > out to swap, so the first issue could be fixed. If only MADV_FREE pages are in
> > the new list, page reclaim can easily reclaim such pages without interference
> > of file or anonymous pages. The memory pressure issue will disappear.
> 
> Do we actually need a new page flag and a special LRU for them? These
> pages are basically like clean cache pages at that point. What do you
> think about clearing their PG_swapbacked flag on MADV_FREE and moving
> them to the inactive file list? The way isolate+putback works should
> not even need much modification, something like clear_page_mlock().
> 
> When the reclaim scanner finds anon && dirty && !swapbacked, it can
> again set PG_swapbacked and goto keep_locked to move the page back
> into the anon LRU to get reclaimed according to swapping rules.

Interesting idea! I'm not sure, though: the MADV_FREE pages are actually
anonymous pages, so this could introduce confusion. On the other hand, if
the MADV_FREE pages are mixed with inactive file pages, page reclaim may
need to reclaim a lot of file pages before it reclaims the MADV_FREE pages.
This doesn't look good. The point of a separate LRU is to avoid scanning
other anon/file pages.
 
> > For the third issue, we can add a separate RSS count for MADV_FREE pages. The
> > count will be increased in madvise syscall and decreased in page reclaim (eg,
> > unmap). One issue is activate_page(). A MADV_FREE page can be promoted to
> > active page there. But there isn't mm_struct context at that place. Iterating
> > vma there sounds too silly. The patchset don't fix this issue yet. Hopefully
> > somebody can share a hint how to fix this issue.
> 
> This problem also goes away if we use the file LRUs.

Can you elaborate on this, please? Maybe you mean charging them to
MM_FILEPAGES? But that doesn't solve the problem: the 'statm' proc file
will still report a big RSS.
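
For context, the second field of statm is what such a daemon typically
reads, and lazyfreed pages still count toward it; an illustration only:

#include <stdio.h>

int main(void)
{
	unsigned long size, resident;
	FILE *f = fopen("/proc/self/statm", "r");

	if (f && fscanf(f, "%lu %lu", &size, &resident) == 2)
		printf("RSS: %lu pages\n", resident); /* includes MADV_FREE'd pages */
	if (f)
		fclose(f);
	return 0;
}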

Thanks,
Shaohua

* Re: [RFC 0/6]mm: add new LRU list for MADV_FREE pages
  2017-01-31 19:45     ` Shaohua Li
@ 2017-01-31 21:38       ` Johannes Weiner
  -1 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2017-01-31 21:38 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman

On Tue, Jan 31, 2017 at 11:45:47AM -0800, Shaohua Li wrote:
> On Tue, Jan 31, 2017 at 01:59:49PM -0500, Johannes Weiner wrote:
> > Hi Shaohua,
> > 
> > On Sun, Jan 29, 2017 at 09:51:17PM -0800, Shaohua Li wrote:
> > > We are trying to use MADV_FREE in jemalloc. Several issues are found. Without
> > > solving the issues, jemalloc can't use the MADV_FREE feature.
> > > - Doesn't support system without swap enabled. Because if swap is off, we can't
> > >   or can't efficiently age anonymous pages. And since MADV_FREE pages are mixed
> > >   with other anonymous pages, we can't reclaim MADV_FREE pages. In current
> > >   implementation, MADV_FREE will fallback to MADV_DONTNEED without swap enabled.
> > >   But in our environment, a lot of machines don't enable swap. This will prevent
> > >   our setup using MADV_FREE.
> > > - Increases memory pressure. page reclaim bias file pages reclaim against
> > >   anonymous pages. This doesn't make sense for MADV_FREE pages, because those
> > >   pages could be freed easily and refilled with very slight penality. Even page
> > >   reclaim doesn't bias file pages, there is still an issue, because MADV_FREE
> > >   pages and other anonymous pages are mixed together. To reclaim a MADV_FREE
> > >   page, we probably must scan a lot of other anonymous pages, which is
> > >   inefficient. In our test, we usually see oom with MADV_FREE enabled and nothing
> > >   without it.
> > 
> > Fully agreed, the anon LRU is a bad place for these pages.
> > 
> > > For the first two issues, introducing a new LRU list for MADV_FREE pages could
> > > solve the issues. We can directly reclaim MADV_FREE pages without writting them
> > > out to swap, so the first issue could be fixed. If only MADV_FREE pages are in
> > > the new list, page reclaim can easily reclaim such pages without interference
> > > of file or anonymous pages. The memory pressure issue will disappear.
> > 
> > Do we actually need a new page flag and a special LRU for them? These
> > pages are basically like clean cache pages at that point. What do you
> > think about clearing their PG_swapbacked flag on MADV_FREE and moving
> > them to the inactive file list? The way isolate+putback works should
> > not even need much modification, something like clear_page_mlock().
> > 
> > When the reclaim scanner finds anon && dirty && !swapbacked, it can
> > again set PG_swapbacked and goto keep_locked to move the page back
> > into the anon LRU to get reclaimed according to swapping rules.
> 
> Interesting idea! Not sure though, the MADV_FREE pages are actually anonymous
> pages, this will introduce confusion. On the other hand, if the MADV_FREE pages
> are mixed with inactive file pages, page reclaim need to reclaim a lot of file
> pages first before reclaim the MADV_FREE pages. This doesn't look good. The
> point of a separate LRU is to avoid scan other anon/file pages.

The LRU code and the rest of VM already use independent page type
distinctions. That's because shmem pages are !PageAnon - they have a
page->mapping that points to a real address space, not an anon_vma -
but they are swapbacked and thus go through the anon LRU. This would
just do the reverse: put PageAnon pages on the file LRU when they
don't contain valid data and are thus not swapbacked.

As far as mixing with inactive file pages goes, it'd be possible to
link the MADV_FREE pages to the tail of the inactive list, rather than
the head. That said, I'm not sure reclaiming use-once filesystem cache
before MADV_FREE is such a bad policy. MADV_FREE retains the vmas for
the sole purpose of reusing them in the (near) future. That is
actually a stronger reuse signal than we have for use-once file pages.
If somebody does continuous writes to a logfile or a one-off search
through one or more files, we should actually reclaim that cache
before we go after MADV_FREE pages that are temporarily invalidated.

> > > For the third issue, we can add a separate RSS count for MADV_FREE pages. The
> > > count will be increased in madvise syscall and decreased in page reclaim (eg,
> > > unmap). One issue is activate_page(). A MADV_FREE page can be promoted to
> > > active page there. But there isn't mm_struct context at that place. Iterating
> > > vma there sounds too silly. The patchset don't fix this issue yet. Hopefully
> > > somebody can share a hint how to fix this issue.
> > 
> > This problem also goes away if we use the file LRUs.
> 
> Can you elaborate this please? Maybe you mean charge them to MM_FILEPAGES? But
> that doesn't solve the problem. 'statm' proc file will still report a big RSS.

Sorry, I was just referring to the activate_page(). If we use the file
LRUs, then page activation has a clear target. And we wouldn't have to
adjust any RSS counters when a lazyfreed page is activated.

If we have MM context everywhere else, can we add MM_LAZYPAGES or
something and exclude them from MM_ANONPAGES? The total RSS count will
still include everything (including mapped clean cache, which is also
easily reclaimable btw), but /proc/foo/status could provide a detailed
breakdown and allow the user to look at only RssAnon.
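
Something along these lines, where MM_LAZYPAGES is a made-up name and the
placement is only a sketch:

enum {
	MM_FILEPAGES,	/* resident file mapping pages */
	MM_ANONPAGES,	/* resident anonymous pages */
	MM_SWAPENTS,	/* anonymous swap entries */
	MM_SHMEMPAGES,	/* resident shared memory pages */
	MM_LAZYPAGES,	/* hypothetical: MADV_FREE'd pages */
	NR_MM_COUNTERS
};

/* in madvise_free(), for each page that enters the lazyfree state: */
dec_mm_counter(mm, MM_ANONPAGES);
inc_mm_counter(mm, MM_LAZYPAGES);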

* Re: [RFC 0/6]mm: add new LRU list for MADV_FREE pages
  2017-01-30  5:51 ` Shaohua Li
@ 2017-02-01  5:47   ` Minchan Kim
  -1 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2017-02-01  5:47 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, hughd, hannes, riel,
	mgorman, danielmicay

Hi Shaohua,

On Sun, Jan 29, 2017 at 09:51:17PM -0800, Shaohua Li wrote:
> Hi,
> 
> We are trying to use MADV_FREE in jemalloc. Several issues are found. Without
> solving the issues, jemalloc can't use the MADV_FREE feature.
> - Doesn't support system without swap enabled. Because if swap is off, we can't
>   or can't efficiently age anonymous pages. And since MADV_FREE pages are mixed
>   with other anonymous pages, we can't reclaim MADV_FREE pages. In current
>   implementation, MADV_FREE will fallback to MADV_DONTNEED without swap enabled.
>   But in our environment, a lot of machines don't enable swap. This will prevent
>   our setup using MADV_FREE.
> - Increases memory pressure. page reclaim bias file pages reclaim against
>   anonymous pages. This doesn't make sense for MADV_FREE pages, because those
>   pages could be freed easily and refilled with very slight penality. Even page
>   reclaim doesn't bias file pages, there is still an issue, because MADV_FREE
>   pages and other anonymous pages are mixed together. To reclaim a MADV_FREE
>   page, we probably must scan a lot of other anonymous pages, which is
>   inefficient. In our test, we usually see oom with MADV_FREE enabled and nothing
>   without it.
> - RSS accounting. MADV_FREE pages are accounted as normal anon pages and
>   reclaimed lazily, so application's RSS becomes bigger. This confuses our
>   workloads. We have monitoring daemon running and if it finds applications' RSS
>   becomes abnormal, the daemon will kill the applications even kernel can reclaim
>   the memory easily. Currently we don't export separate RSS accounting for
>   MADV_FREE pages. This will prevent our setup using MADV_FREE too.
> 
> For the first two issues, introducing a new LRU list for MADV_FREE pages could
> solve the issues. We can directly reclaim MADV_FREE pages without writting them
> out to swap, so the first issue could be fixed. If only MADV_FREE pages are in
> the new list, page reclaim can easily reclaim such pages without interference
> of file or anonymous pages. The memory pressure issue will disappear.
> 
> Actually Minchan posted patches to add the LRU list before, but he didn't
> pursue. So I picked up them and the patches are based on Minchan's previous
> patches. The main difference between my patches and Minchan previous patches is
> page reclaim policy. Minchan's patches introduces a knob to balance the reclaim
> of MADV_FREE pages and anon/file pages, while the patches always reclaim
> MADV_FREE pages first if there are. I described the reason in patch 5.

First of all, thanks for the effort to support MADV_FREE on swapless
systems, Shaohua!

CCing Daniel,

The reason I postponed it is the controversial part about balancing
used-once vs. madv_freed pages. I thought it doesn't make sense to reclaim
madv_freed pages first even when there are lots of used-once pages.

Recently, Johannes posted patches for balancing file/anon reclaim, and they
were based on a cost model, IIRC. I wanted to build on that.

The idea is that the VM reclaims file-backed pages and, when a refault
happens, we measure the refault distance against the size of the
LRU_LAZYFREE list. If the refault distance is smaller than the lazyfree
LRU list's size, the file-backed page would have been kept in memory had
we discarded lazyfree pages instead, so that is a signal to reclaim the
lazyfree LRU list more aggressively.

I tested your patch with a simple MADV_FREE workload (alloc, then repeated
touch/madv_free) running alongside a background stream-read process. In
that case, the MADV_FREE workload regressed by half without any gain for
the stream-read process.

I wrote hacky code to simulate the feedback loop I suggested, and it
recovers the performance regression. I'm not saying the hacky patch below
should be merged, but I think we need used-once reclaim feedback logic to
prevent unnecessary purging of madv_freed pages.

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 589a165..39d4bba 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -703,6 +703,7 @@ typedef struct pglist_data {
 	/* Per-node vmstats */
 	struct per_cpu_nodestat __percpu *per_cpu_nodestats;
 	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
+	bool lazyfree;
 } pg_data_t;
 
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index f809f04..cf54b81 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2364,22 +2364,25 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
 	struct blk_plug plug;
 	bool scan_adjusted;
 
-	/* reclaim all lazyfree pages, so don't apply reclaim priority */
-	nr[LRU_LAZYFREE] = lruvec_lru_size(lruvec, LRU_LAZYFREE, sc->reclaim_idx);
-	while (nr[LRU_LAZYFREE]) {
-		nr_to_scan = min(nr[LRU_LAZYFREE], SWAP_CLUSTER_MAX);
-		nr[LRU_LAZYFREE] -= nr_to_scan;
-		nr_reclaimed += shrink_inactive_list(nr_to_scan, lruvec, sc,
-			LRU_LAZYFREE);
-
-		if (nr_reclaimed >= nr_to_reclaim)
-			break;
-		cond_resched();
-	}
+	if (pgdat->lazyfree) {
+		/* reclaim all lazyfree pages, so don't apply reclaim priority */
+		nr[LRU_LAZYFREE] = lruvec_lru_size(lruvec, LRU_LAZYFREE, sc->reclaim_idx);
+		while (nr[LRU_LAZYFREE]) {
+			nr_to_scan = min(nr[LRU_LAZYFREE], SWAP_CLUSTER_MAX);
+			nr[LRU_LAZYFREE] -= nr_to_scan;
+			nr_reclaimed += shrink_inactive_list(nr_to_scan, lruvec, sc,
+				LRU_LAZYFREE);
+
+			if (nr_reclaimed >= nr_to_reclaim)
+				break;
+			cond_resched();
+		}
 
-	if (nr_reclaimed >= nr_to_reclaim) {
-		sc->nr_reclaimed += nr_reclaimed;
-		return;
+		if (nr_reclaimed >= nr_to_reclaim) {
+			sc->nr_reclaimed += nr_reclaimed;
+			pgdat->lazyfree = false;
+			return;
+		}
 	}
 
 	get_scan_count(lruvec, memcg, sc, nr, lru_pages);
diff --git a/mm/workingset.c b/mm/workingset.c
index c573cb2..2a01b91 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -233,7 +233,7 @@ void *workingset_eviction(struct address_space *mapping, struct page *page)
 bool workingset_refault(void *shadow)
 {
 	unsigned long refault_distance;
-	unsigned long active_file;
+	unsigned long active_file, lazyfree;
 	struct mem_cgroup *memcg;
 	unsigned long eviction;
 	struct lruvec *lruvec;
@@ -268,6 +268,7 @@ bool workingset_refault(void *shadow)
 	lruvec = mem_cgroup_lruvec(pgdat, memcg);
 	refault = atomic_long_read(&lruvec->inactive_age);
 	active_file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES);
+	lazyfree = lruvec_lru_size(lruvec, LRU_LAZYFREE, MAX_NR_ZONES);
 	rcu_read_unlock();
 
 	/*
@@ -290,6 +291,9 @@ bool workingset_refault(void *shadow)
 
 	inc_node_state(pgdat, WORKINGSET_REFAULT);
 
+	if (refault_distance <= lazyfree)
+		pgdat->lazyfree = true;
+
 	if (refault_distance <= active_file) {
 		inc_node_state(pgdat, WORKINGSET_ACTIVATE);
 		return true;
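
To make the trigger concrete, the check above amounts to the following
standalone illustration (not kernel code):

#include <stdbool.h>

/*
 * refault_distance counts the evictions between a file page's eviction
 * and its refault; if the lazyfree LRU holds at least that many pages,
 * purging lazyfree pages first would have kept the file page resident.
 */
static bool lazyfree_purge_wanted(unsigned long refault_distance,
				  unsigned long lazyfree_size)
{
	return refault_distance <= lazyfree_size;	/* e.g. 1000 <= 2000 */
}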

> 
> For the third issue, we can add a separate RSS count for MADV_FREE pages. The
> count will be increased in madvise syscall and decreased in page reclaim (eg,
> unmap). One issue is activate_page(). A MADV_FREE page can be promoted to
> active page there. But there isn't mm_struct context at that place. Iterating
> vma there sounds too silly. The patchset don't fix this issue yet. Hopefully
> somebody can share a hint how to fix this issue.
> 
> Thanks,
> Shaohua
> 
> Minchan previous patches:
> http://marc.info/?l=linux-mm&m=144800657002763&w=2
> 
> Shaohua Li (6):
>   mm: add wrap for page accouting index
>   mm: add lazyfree page flag
>   mm: add LRU_LAZYFREE lru list
>   mm: move MADV_FREE pages into LRU_LAZYFREE list
>   mm: reclaim lazyfree pages
>   mm: enable MADV_FREE for swapless system
> 
>  drivers/base/node.c                       |  2 +
>  drivers/staging/android/lowmemorykiller.c |  3 +-
>  fs/proc/meminfo.c                         |  1 +
>  fs/proc/task_mmu.c                        |  8 ++-
>  include/linux/mm_inline.h                 | 41 +++++++++++++
>  include/linux/mmzone.h                    |  9 +++
>  include/linux/page-flags.h                |  6 ++
>  include/linux/swap.h                      |  2 +-
>  include/linux/vm_event_item.h             |  2 +-
>  include/trace/events/mmflags.h            |  1 +
>  include/trace/events/vmscan.h             | 31 +++++-----
>  kernel/power/snapshot.c                   |  1 +
>  mm/compaction.c                           | 11 ++--
>  mm/huge_memory.c                          |  6 +-
>  mm/khugepaged.c                           |  6 +-
>  mm/madvise.c                              | 11 +---
>  mm/memcontrol.c                           |  4 ++
>  mm/memory-failure.c                       |  3 +-
>  mm/memory_hotplug.c                       |  3 +-
>  mm/mempolicy.c                            |  3 +-
>  mm/migrate.c                              | 29 ++++------
>  mm/page_alloc.c                           | 10 ++++
>  mm/rmap.c                                 |  7 ++-
>  mm/swap.c                                 | 51 +++++++++-------
>  mm/vmscan.c                               | 96 +++++++++++++++++++++++--------
>  mm/vmstat.c                               |  4 ++
>  26 files changed, 242 insertions(+), 109 deletions(-)
> 
> -- 
> 2.9.3
> 

* Re: [RFC 0/6]mm: add new LRU list for MADV_FREE pages
  2017-01-31 21:38       ` Johannes Weiner
@ 2017-02-01  9:02         ` Michal Hocko
  -1 siblings, 0 replies; 30+ messages in thread
From: Michal Hocko @ 2017-02-01  9:02 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Shaohua Li, linux-mm, linux-kernel, Kernel-team, minchan, hughd,
	riel, mgorman

On Tue 31-01-17 16:38:10, Johannes Weiner wrote:
> On Tue, Jan 31, 2017 at 11:45:47AM -0800, Shaohua Li wrote:
> > On Tue, Jan 31, 2017 at 01:59:49PM -0500, Johannes Weiner wrote:
> > > Hi Shaohua,
> > > 
> > > On Sun, Jan 29, 2017 at 09:51:17PM -0800, Shaohua Li wrote:
> > > > We are trying to use MADV_FREE in jemalloc. Several issues are found. Without
> > > > solving the issues, jemalloc can't use the MADV_FREE feature.
> > > > - Doesn't support system without swap enabled. Because if swap is off, we can't
> > > >   or can't efficiently age anonymous pages. And since MADV_FREE pages are mixed
> > > >   with other anonymous pages, we can't reclaim MADV_FREE pages. In current
> > > >   implementation, MADV_FREE will fallback to MADV_DONTNEED without swap enabled.
> > > >   But in our environment, a lot of machines don't enable swap. This will prevent
> > > >   our setup using MADV_FREE.
> > > > - Increases memory pressure. page reclaim bias file pages reclaim against
> > > >   anonymous pages. This doesn't make sense for MADV_FREE pages, because those
> > > >   pages could be freed easily and refilled with very slight penality. Even page
> > > >   reclaim doesn't bias file pages, there is still an issue, because MADV_FREE
> > > >   pages and other anonymous pages are mixed together. To reclaim a MADV_FREE
> > > >   page, we probably must scan a lot of other anonymous pages, which is
> > > >   inefficient. In our test, we usually see oom with MADV_FREE enabled and nothing
> > > >   without it.
> > > 
> > > Fully agreed, the anon LRU is a bad place for these pages.
> > > 
> > > > For the first two issues, introducing a new LRU list for MADV_FREE pages could
> > > > solve the issues. We can directly reclaim MADV_FREE pages without writting them
> > > > out to swap, so the first issue could be fixed. If only MADV_FREE pages are in
> > > > the new list, page reclaim can easily reclaim such pages without interference
> > > > of file or anonymous pages. The memory pressure issue will disappear.
> > > 
> > > Do we actually need a new page flag and a special LRU for them? These
> > > pages are basically like clean cache pages at that point. What do you
> > > think about clearing their PG_swapbacked flag on MADV_FREE and moving
> > > them to the inactive file list? The way isolate+putback works should
> > > not even need much modification, something like clear_page_mlock().
> > > 
> > > When the reclaim scanner finds anon && dirty && !swapbacked, it can
> > > again set PG_swapbacked and goto keep_locked to move the page back
> > > into the anon LRU to get reclaimed according to swapping rules.
> > 
> > Interesting idea! Not sure though, the MADV_FREE pages are actually anonymous
> > pages, this will introduce confusion. On the other hand, if the MADV_FREE pages
> > are mixed with inactive file pages, page reclaim need to reclaim a lot of file
> > pages first before reclaim the MADV_FREE pages. This doesn't look good. The
> > point of a separate LRU is to avoid scan other anon/file pages.
> 
> The LRU code and the rest of VM already use independent page type
> distinctions. That's because shmem pages are !PageAnon - they have a
> page->mapping that points to a real address space, not an anon_vma -
> but they are swapbacked and thus go through the anon LRU. This would
> just do the reverse: put PageAnon pages on the file LRU when they
> don't contain valid data and are thus not swapbacked.
> 
> As far as mixing with inactive file pages goes, it'd be possible to
> link the MADV_FREE pages to the tail of the inactive list, rather than
> the head. That said, I'm not sure reclaiming use-once filesystem cache
> before MADV_FREE is such a bad policy. MADV_FREE retains the vmas for
> the sole purpose of reusing them in the (near) future. That is
> actually a stronger reuse signal than we have for use-once file pages.
> If somebody does continuous writes to a logfile or a one-off search
> through one or more files, we should actually reclaim that cache
> before we go after MADV_FREE pages that are temporarily invalidated.

I completely agree here. LRU_*_FILE will be a bit of a misnomer (LRU_*CACHE
would sound more appropriate). I expect there would be a few places that
account based on the LRU list, but those shouldn't be that hard to fix.
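
For illustration, the madvise-side move could be as small as the sketch
below -- pseudocode against the current LRU helpers, with the function
name and locking context assumed, not taken from any actual patch:

	/*
	 * Called for a lazily freed anon page with the LRU lock held:
	 * drop PG_swapbacked so reclaim may simply discard the page,
	 * and park it on the inactive file ("cache") list.
	 */
	static void madv_free_move_page(struct page *page,
					struct lruvec *lruvec)
	{
		del_page_from_lru_list(page, lruvec, page_lru(page));
		ClearPageActive(page);
		ClearPageReferenced(page);
		ClearPageSwapBacked(page);	/* contents are disposable */
		add_page_to_lru_list(page, lruvec, LRU_INACTIVE_FILE);
	}
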
-- 
Michal Hocko
SUSE Labs

* Re: [RFC 0/6]mm: add new LRU list for MADV_FREE pages
  2017-01-31 21:38       ` Johannes Weiner
@ 2017-02-01 16:37         ` Shaohua Li
  0 siblings, 0 replies; 30+ messages in thread
From: Shaohua Li @ 2017-02-01 16:37 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman

On Tue, Jan 31, 2017 at 04:38:10PM -0500, Johannes Weiner wrote:
> On Tue, Jan 31, 2017 at 11:45:47AM -0800, Shaohua Li wrote:
> > On Tue, Jan 31, 2017 at 01:59:49PM -0500, Johannes Weiner wrote:
> > > Hi Shaohua,
> > > 
> > > On Sun, Jan 29, 2017 at 09:51:17PM -0800, Shaohua Li wrote:
> > > > We are trying to use MADV_FREE in jemalloc. Several issues are found. Without
> > > > solving the issues, jemalloc can't use the MADV_FREE feature.
> > > > - Doesn't support system without swap enabled. Because if swap is off, we can't
> > > >   or can't efficiently age anonymous pages. And since MADV_FREE pages are mixed
> > > >   with other anonymous pages, we can't reclaim MADV_FREE pages. In current
> > > >   implementation, MADV_FREE will fallback to MADV_DONTNEED without swap enabled.
> > > >   But in our environment, a lot of machines don't enable swap. This will prevent
> > > >   our setup using MADV_FREE.
> > > > - Increases memory pressure. page reclaim bias file pages reclaim against
> > > >   anonymous pages. This doesn't make sense for MADV_FREE pages, because those
> > > >   pages could be freed easily and refilled with very slight penality. Even page
> > > >   reclaim doesn't bias file pages, there is still an issue, because MADV_FREE
> > > >   pages and other anonymous pages are mixed together. To reclaim a MADV_FREE
> > > >   page, we probably must scan a lot of other anonymous pages, which is
> > > >   inefficient. In our test, we usually see oom with MADV_FREE enabled and nothing
> > > >   without it.
> > > 
> > > Fully agreed, the anon LRU is a bad place for these pages.
> > > 
> > > > For the first two issues, introducing a new LRU list for MADV_FREE pages could
> > > > solve the issues. We can directly reclaim MADV_FREE pages without writting them
> > > > out to swap, so the first issue could be fixed. If only MADV_FREE pages are in
> > > > the new list, page reclaim can easily reclaim such pages without interference
> > > > of file or anonymous pages. The memory pressure issue will disappear.
> > > 
> > > Do we actually need a new page flag and a special LRU for them? These
> > > pages are basically like clean cache pages at that point. What do you
> > > think about clearing their PG_swapbacked flag on MADV_FREE and moving
> > > them to the inactive file list? The way isolate+putback works should
> > > not even need much modification, something like clear_page_mlock().
> > > 
> > > When the reclaim scanner finds anon && dirty && !swapbacked, it can
> > > again set PG_swapbacked and goto keep_locked to move the page back
> > > into the anon LRU to get reclaimed according to swapping rules.
> > 
> > Interesting idea! Not sure though, the MADV_FREE pages are actually anonymous
> > pages, this will introduce confusion. On the other hand, if the MADV_FREE pages
> > are mixed with inactive file pages, page reclaim need to reclaim a lot of file
> > pages first before reclaim the MADV_FREE pages. This doesn't look good. The
> > point of a separate LRU is to avoid scan other anon/file pages.
> 
> The LRU code and the rest of VM already use independent page type
> distinctions. That's because shmem pages are !PageAnon - they have a
> page->mapping that points to a real address space, not an anon_vma -
> but they are swapbacked and thus go through the anon LRU. This would
> just do the reverse: put PageAnon pages on the file LRU when they
> don't contain valid data and are thus not swapbacked.
> 
> As far as mixing with inactive file pages goes, it'd be possible to
> link the MADV_FREE pages to the tail of the inactive list, rather than
> the head. That said, I'm not sure reclaiming use-once filesystem cache
> before MADV_FREE is such a bad policy. MADV_FREE retains the vmas for
> the sole purpose of reusing them in the (near) future. That is
> actually a stronger reuse signal than we have for use-once file pages.
> If somebody does continuous writes to a logfile or a one-off search
> through one or more files, we should actually reclaim that cache
> before we go after MADV_FREE pages that are temporarily invalidated.

Thanks, I'll try this idea.
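
For the record, the reclaim-side check from your proposal translates to
roughly the following -- where exactly it slots into shrink_page_list()
is my assumption:

	/*
	 * A lazily freed page that was written to after MADV_FREE holds
	 * valid data again: rearm PG_swapbacked and keep the page, so it
	 * goes back to being reclaimed under swapping rules.
	 */
	if (PageAnon(page) && PageDirty(page) && !PageSwapBacked(page)) {
		SetPageSwapBacked(page);
		goto keep_locked;
	}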

Thanks,
Shaohua

* Re: [RFC 0/6]mm: add new LRU list for MADV_FREE pages
  2017-01-31 21:38       ` Johannes Weiner
@ 2017-02-02  5:14         ` Minchan Kim
  0 siblings, 0 replies; 30+ messages in thread
From: Minchan Kim @ 2017-02-02  5:14 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Shaohua Li, linux-mm, linux-kernel, Kernel-team, mhocko, hughd,
	riel, mgorman

Hi Johannes,

On Tue, Jan 31, 2017 at 04:38:10PM -0500, Johannes Weiner wrote:
> On Tue, Jan 31, 2017 at 11:45:47AM -0800, Shaohua Li wrote:
> > On Tue, Jan 31, 2017 at 01:59:49PM -0500, Johannes Weiner wrote:
> > > Hi Shaohua,
> > > 
> > > On Sun, Jan 29, 2017 at 09:51:17PM -0800, Shaohua Li wrote:
> > > > We are trying to use MADV_FREE in jemalloc. Several issues are found. Without
> > > > solving the issues, jemalloc can't use the MADV_FREE feature.
> > > > - Doesn't support system without swap enabled. Because if swap is off, we can't
> > > >   or can't efficiently age anonymous pages. And since MADV_FREE pages are mixed
> > > >   with other anonymous pages, we can't reclaim MADV_FREE pages. In current
> > > >   implementation, MADV_FREE will fallback to MADV_DONTNEED without swap enabled.
> > > >   But in our environment, a lot of machines don't enable swap. This will prevent
> > > >   our setup using MADV_FREE.
> > > > - Increases memory pressure. page reclaim bias file pages reclaim against
> > > >   anonymous pages. This doesn't make sense for MADV_FREE pages, because those
> > > >   pages could be freed easily and refilled with very slight penality. Even page
> > > >   reclaim doesn't bias file pages, there is still an issue, because MADV_FREE
> > > >   pages and other anonymous pages are mixed together. To reclaim a MADV_FREE
> > > >   page, we probably must scan a lot of other anonymous pages, which is
> > > >   inefficient. In our test, we usually see oom with MADV_FREE enabled and nothing
> > > >   without it.
> > > 
> > > Fully agreed, the anon LRU is a bad place for these pages.
> > > 
> > > > For the first two issues, introducing a new LRU list for MADV_FREE pages could
> > > > solve the issues. We can directly reclaim MADV_FREE pages without writting them
> > > > out to swap, so the first issue could be fixed. If only MADV_FREE pages are in
> > > > the new list, page reclaim can easily reclaim such pages without interference
> > > > of file or anonymous pages. The memory pressure issue will disappear.
> > > 
> > > Do we actually need a new page flag and a special LRU for them? These
> > > pages are basically like clean cache pages at that point. What do you
> > > think about clearing their PG_swapbacked flag on MADV_FREE and moving
> > > them to the inactive file list? The way isolate+putback works should
> > > not even need much modification, something like clear_page_mlock().
> > > 
> > > When the reclaim scanner finds anon && dirty && !swapbacked, it can
> > > again set PG_swapbacked and goto keep_locked to move the page back
> > > into the anon LRU to get reclaimed according to swapping rules.
> > 
> > Interesting idea! Not sure though, the MADV_FREE pages are actually anonymous
> > pages, this will introduce confusion. On the other hand, if the MADV_FREE pages
> > are mixed with inactive file pages, page reclaim need to reclaim a lot of file
> > pages first before reclaim the MADV_FREE pages. This doesn't look good. The
> > point of a separate LRU is to avoid scan other anon/file pages.
> 
> The LRU code and the rest of VM already use independent page type
> distinctions. That's because shmem pages are !PageAnon - they have a
> page->mapping that points to a real address space, not an anon_vma -
> but they are swapbacked and thus go through the anon LRU. This would
> just do the reverse: put PageAnon pages on the file LRU when they
> don't contain valid data and are thus not swapbacked.
> 
> As far as mixing with inactive file pages goes, it'd be possible to
> link the MADV_FREE pages to the tail of the inactive list, rather than
> the head. That said, I'm not sure reclaiming use-once filesystem cache
> before MADV_FREE is such a bad policy. MADV_FREE retains the vmas for
> the sole purpose of reusing them in the (near) future. That is
> actually a stronger reuse signal than we have for use-once file pages.
> If somebody does continuous writes to a logfile or a one-off search
> through one or more files, we should actually reclaim that cache
> before we go after MADV_FREE pages that are temporarily invalidated.

Yes, we should be careful on this issue. It was the main point of
contention. How about moving them to the head of the inactive file
list, not the tail, if we want to go with the inactive file LRU?

With that, the VM tries to reclaim file pages first from the tail of
the list, and if the reclaimed pages belong to the workingset, they can
be re-activated by workingset_refault. Otherwise, we can discard
use-once pages without purging *madv_free* pages, so I think it's a
good compromise.

What do you think?
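
(For context, the self-correction relies on machinery that is already
there -- simplified from the refault handling in mm/filemap.c:

	/*
	 * On refault, the shadow entry left behind at eviction tells us
	 * whether the page was part of the workingset; if so, it comes
	 * back in as an active page right away.
	 */
	if (shadow && workingset_refault(shadow)) {
		SetPageActive(page);
		workingset_activation(page);
	}

so evicting a warm file page ahead of madv_free pages is cheap to
undo.)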

* Re: [RFC 0/6]mm: add new LRU list for MADV_FREE pages
  2017-02-02  5:14         ` Minchan Kim
@ 2017-02-02 19:28           ` Johannes Weiner
  0 siblings, 0 replies; 30+ messages in thread
From: Johannes Weiner @ 2017-02-02 19:28 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Shaohua Li, linux-mm, linux-kernel, Kernel-team, mhocko, hughd,
	riel, mgorman

On Thu, Feb 02, 2017 at 02:14:10PM +0900, Minchan Kim wrote:
> Hi Johannes,
> 
> On Tue, Jan 31, 2017 at 04:38:10PM -0500, Johannes Weiner wrote:
> > On Tue, Jan 31, 2017 at 11:45:47AM -0800, Shaohua Li wrote:
> > > On Tue, Jan 31, 2017 at 01:59:49PM -0500, Johannes Weiner wrote:
> > > > Hi Shaohua,
> > > > 
> > > > On Sun, Jan 29, 2017 at 09:51:17PM -0800, Shaohua Li wrote:
> > > > > We are trying to use MADV_FREE in jemalloc. Several issues are found. Without
> > > > > solving the issues, jemalloc can't use the MADV_FREE feature.
> > > > > - Doesn't support system without swap enabled. Because if swap is off, we can't
> > > > >   or can't efficiently age anonymous pages. And since MADV_FREE pages are mixed
> > > > >   with other anonymous pages, we can't reclaim MADV_FREE pages. In current
> > > > >   implementation, MADV_FREE will fallback to MADV_DONTNEED without swap enabled.
> > > > >   But in our environment, a lot of machines don't enable swap. This will prevent
> > > > >   our setup using MADV_FREE.
> > > > > - Increases memory pressure. page reclaim bias file pages reclaim against
> > > > >   anonymous pages. This doesn't make sense for MADV_FREE pages, because those
> > > > >   pages could be freed easily and refilled with very slight penality. Even page
> > > > >   reclaim doesn't bias file pages, there is still an issue, because MADV_FREE
> > > > >   pages and other anonymous pages are mixed together. To reclaim a MADV_FREE
> > > > >   page, we probably must scan a lot of other anonymous pages, which is
> > > > >   inefficient. In our test, we usually see oom with MADV_FREE enabled and nothing
> > > > >   without it.
> > > > 
> > > > Fully agreed, the anon LRU is a bad place for these pages.
> > > > 
> > > > > For the first two issues, introducing a new LRU list for MADV_FREE pages could
> > > > > solve the issues. We can directly reclaim MADV_FREE pages without writting them
> > > > > out to swap, so the first issue could be fixed. If only MADV_FREE pages are in
> > > > > the new list, page reclaim can easily reclaim such pages without interference
> > > > > of file or anonymous pages. The memory pressure issue will disappear.
> > > > 
> > > > Do we actually need a new page flag and a special LRU for them? These
> > > > pages are basically like clean cache pages at that point. What do you
> > > > think about clearing their PG_swapbacked flag on MADV_FREE and moving
> > > > them to the inactive file list? The way isolate+putback works should
> > > > not even need much modification, something like clear_page_mlock().
> > > > 
> > > > When the reclaim scanner finds anon && dirty && !swapbacked, it can
> > > > again set PG_swapbacked and goto keep_locked to move the page back
> > > > into the anon LRU to get reclaimed according to swapping rules.
> > > 
> > > Interesting idea! Not sure though, the MADV_FREE pages are actually anonymous
> > > pages, this will introduce confusion. On the other hand, if the MADV_FREE pages
> > > are mixed with inactive file pages, page reclaim need to reclaim a lot of file
> > > pages first before reclaim the MADV_FREE pages. This doesn't look good. The
> > > point of a separate LRU is to avoid scan other anon/file pages.
> > 
> > The LRU code and the rest of VM already use independent page type
> > distinctions. That's because shmem pages are !PageAnon - they have a
> > page->mapping that points to a real address space, not an anon_vma -
> > but they are swapbacked and thus go through the anon LRU. This would
> > just do the reverse: put PageAnon pages on the file LRU when they
> > don't contain valid data and are thus not swapbacked.
> > 
> > As far as mixing with inactive file pages goes, it'd be possible to
> > link the MADV_FREE pages to the tail of the inactive list, rather than
> > the head. That said, I'm not sure reclaiming use-once filesystem cache
> > before MADV_FREE is such a bad policy. MADV_FREE retains the vmas for
> > the sole purpose of reusing them in the (near) future. That is
> > actually a stronger reuse signal than we have for use-once file pages.
> > If somebody does continuous writes to a logfile or a one-off search
> > through one or more files, we should actually reclaim that cache
> > before we go after MADV_FREE pages that are temporarily invalidated.
> 
> Yes, we should be careful on this issue. It was main arguable point.
> How about moving them to head of inactive file, not tail if we want to
> go with inactive file LRU?
> 
> With that, VM try to reclaim file pages first from the tail of list
> and if pages reclaimed were workingset, it could be activated by
> workingset_refault. Otherwise, we can discard use-once pages without
> puring *madv_free* pages so I think it's good compromise.
> 
> What do you think?

That's what I tried to say. To address Shaohua's concern in two steps:
first, it *would* be possible to move MADV_FREE pages to the tail of
the inactive list. But then, taking a step back, I argued that this is
probably not the reclaim policy we actually want.

So I agree with you. I think MADV_FREE should move these pages to the
*head* of the inactive cache list, so that we reclaim colder use-once
cache first. Workingset detection will make any necessary corrections.
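
To make the semantics concrete from the userspace side: the interface
being tuned here is just madvise(2) with MADV_FREE, and the policy
above only decides when the kernel actually takes the pages away. A
minimal, self-contained illustration:

	#include <string.h>
	#include <sys/mman.h>

	int main(void)
	{
		size_t len = 1 << 20;
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;

		memset(p, 'x', len);	/* pages are live */

		/*
		 * Contents may now be dropped lazily under memory
		 * pressure; the mapping itself stays valid and reusable.
		 */
		madvise(p, len, MADV_FREE);

		/*
		 * Writing again cancels the lazy free for the touched
		 * page -- the anon && dirty && !swapbacked case the
		 * reclaim scanner has to honor.
		 */
		p[0] = 'y';

		munmap(p, len);
		return 0;
	}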

end of thread, other threads:[~2017-02-02 19:28 UTC | newest]

Thread overview: 30+ messages
2017-01-30  5:51 [RFC 0/6]mm: add new LRU list for MADV_FREE pages Shaohua Li
2017-01-30  5:51 ` [RFC 1/6] mm: add wrap for page accouting index Shaohua Li
2017-01-30  5:51 ` [RFC 2/6] mm: add lazyfree page flag Shaohua Li
2017-01-30  5:51 ` [RFC 3/6] mm: add LRU_LAZYFREE lru list Shaohua Li
2017-01-30  5:51 ` [RFC 4/6] mm: move MADV_FREE pages into LRU_LAZYFREE list Shaohua Li
2017-01-30  5:51 ` [RFC 5/6] mm: reclaim lazyfree pages Shaohua Li
2017-01-30  5:51 ` [RFC 6/6] mm: enable MADV_FREE for swapless system Shaohua Li
2017-01-31 18:59 ` [RFC 0/6]mm: add new LRU list for MADV_FREE pages Johannes Weiner
2017-01-31 19:45   ` Shaohua Li
2017-01-31 21:38     ` Johannes Weiner
2017-02-01  9:02       ` Michal Hocko
2017-02-01 16:37       ` Shaohua Li
2017-02-02  5:14       ` Minchan Kim
2017-02-02 19:28         ` Johannes Weiner
2017-02-01  5:47 ` Minchan Kim
