* [PATCH V4 0/6] mm: fix some MADV_FREE issues
From: Shaohua Li @ 2017-02-22 18:50 UTC
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman, akpm

Hi,

We are trying to use MADV_FREE in jemalloc and have found several issues.
Without solving them, jemalloc can't use the MADV_FREE feature (a minimal
sketch of the intended usage follows the list below).
- Systems without swap enabled are not supported. If swap is off, we can't
  age anonymous pages, or can't do so efficiently. And since MADV_FREE pages
  are mixed with other anonymous pages, we can't reclaim MADV_FREE pages. In
  the current implementation, MADV_FREE falls back to MADV_DONTNEED when
  swap is not enabled. But in our environment a lot of machines don't enable
  swap, which prevents our setup from using MADV_FREE.
- Increased memory pressure. Page reclaim is biased toward reclaiming file
  pages rather than anonymous pages. This doesn't make sense for MADV_FREE
  pages, because those pages can be freed easily and refilled with only a
  slight penalty. Even if page reclaim weren't biased toward file pages,
  there would still be an issue, because MADV_FREE pages and other anonymous
  pages are mixed together: to reclaim a MADV_FREE page, we probably must
  scan a lot of other anonymous pages, which is inefficient. In our tests,
  we usually see OOMs with MADV_FREE enabled and none without it.
- Accounting. There are two accounting problems. We don't have global
  accounting, so if the system misbehaves, we don't know whether MADV_FREE
  is the cause. The other problem is RSS accounting. MADV_FREE pages are
  accounted as normal anon pages and reclaimed lazily, so an application's
  RSS grows. This confuses our workloads: we have a monitoring daemon
  running, and if it finds an application's RSS abnormal, the daemon kills
  the application even though the kernel could reclaim the memory easily.
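
For context, here is a minimal sketch of how an allocator such as jemalloc
could return memory lazily with MADV_FREE. The purge_range() helper and the
fallback policy are illustrative assumptions, not jemalloc's actual code:

#include <sys/mman.h>

/* Hypothetical allocator hook: give a page range back to the kernel
 * lazily. The kernel may reclaim the range under memory pressure; if
 * the application writes to the range first, the lazy free is
 * cancelled and the data stays intact.
 */
static void purge_range(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_FREE) == -1)
		/* e.g. kernel older than 4.5: discard immediately */
		madvise(addr, len, MADV_DONTNEED);
}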

To address the first two issues, we can either put MADV_FREE pages into a
separate LRU list (Minchan's previous patches and the V1 patches), or put
them into the LRU_INACTIVE_FILE list (suggested by Johannes). This patchset
uses the second idea. The reason is that the LRU_INACTIVE_FILE list is tiny
nowadays and should be full of used-once file pages, so we can still
efficiently reclaim MADV_FREE pages there without interfering with other
anon and active file pages. Putting the pages into the inactive file list
also has the advantage of letting page reclaim prioritize between MADV_FREE
pages and used-once file pages. MADV_FREE pages are put into the LRU list
with their SwapBacked flag cleared, so PageAnon(page) &&
!PageSwapBacked(page) indicates a MADV_FREE page. These pages are freed
directly without pageout if they are clean; otherwise the normal swap path
reclaims them.
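
In other words, the lazyfree state is encoded entirely in the page flags. A
hypothetical helper (not part of this patchset) capturing the invariant:

static inline bool page_is_lazyfree(struct page *page)
{
	/* anonymous, but no longer swap backed: it was MADV_FREE'd */
	return PageAnon(page) && !PageSwapBacked(page);
}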

For the third issue, the previous post added global accounting and a
separate RSS count for MADV_FREE pages. The problem is that we can never
get accurate accounting for MADV_FREE pages: they are mapped into userspace
and can be dirtied without the kernel noticing. To get accurate accounting,
we could write-protect the pages, but then there is extra page fault
overhead, which people don't want to pay. The jemalloc developers have
concerns about the inaccurate accounting, so this post drops the accounting
patches for now. The info exported to /proc/pid/smaps for MADV_FREE pages
is kept; it is the only place we can get accurate accounting right now.
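
As a usage note, the smaps info can then be read per process. A minimal
sketch, assuming the field added by patch 6 is named "LazyFree" (the field
name is an assumption here, not quoted from this cover letter):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/self/smaps", "r");

	if (!f)
		return 1;
	/* print the per-VMA lazyfree counters; field name assumed */
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "LazyFree:", 9))
			fputs(line, stdout);
	fclose(f);
	return 0;
}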

Thanks,
Shaohua

V3->V4:
- rebase to latest -mm tree
- Address several issues pointed out by Johannes and Minchan
- Dropped vmstat and RSS accounting

V2->V3:
- rebase to latest -mm tree
- Address several issues pointed out by Minchan
- Add more descriptions
http://marc.info/?l=linux-mm&m=148710098701674&w=2

V1->V2:
- Put MADV_FREE pages into LRU_INACTIVE_FILE list instead of adding a new lru
  list, suggested by Johannes
- Add RSS support
http://marc.info/?l=linux-mm&m=148616481928054&w=2

Minchan's previous patches:
http://marc.info/?l=linux-mm&m=144800657002763&w=2

----------------------
Shaohua Li (6):
  mm: delete unnecessary TTU_* flags
  mm: don't assume anonymous pages have SwapBacked flag
  mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
  mm: reclaim MADV_FREE pages
  mm: enable MADV_FREE for swapless system
  proc: show MADV_FREE pages info in smaps

 Documentation/filesystems/proc.txt |  4 +++
 fs/proc/task_mmu.c                 |  8 +++++-
 include/linux/rmap.h               |  4 +--
 include/linux/swap.h               |  2 +-
 include/linux/vm_event_item.h      |  2 +-
 mm/huge_memory.c                   |  6 ++--
 mm/khugepaged.c                    |  8 ++----
 mm/madvise.c                       | 11 ++------
 mm/memory-failure.c                |  2 +-
 mm/migrate.c                       |  3 +-
 mm/rmap.c                          | 15 +++++++---
 mm/swap.c                          | 56 +++++++++++++++++++++++---------------
 mm/vmscan.c                        | 45 +++++++++++++++++-------------
 mm/vmstat.c                        |  1 +
 14 files changed, 96 insertions(+), 71 deletions(-)

-- 
2.9.3

* [PATCH V4 1/6] mm: delete unnecessary TTU_* flags
From: Shaohua Li @ 2017-02-22 18:50 UTC
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman, akpm

Johannes pointed out that TTU_LZFREE is unnecessary: the flag is always set
when we want to do an unmap, and in the cases where we don't do an unmap,
the TTU_LZFREE part of the code should never run.

TTU_UNMAP is also unnecessary. If no other mode flag is set (for example,
TTU_MIGRATION), an unmap is implied. See the condensed caller illustration
below.
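
In effect, a plain unmap is now requested by passing only modifier flags, as
in this condensed illustration of a caller changed below (not new code):

	/* before */ shrink_page_list(..., TTU_UNMAP | TTU_IGNORE_ACCESS, ...);
	/* after  */ shrink_page_list(..., TTU_IGNORE_ACCESS, ...);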

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 include/linux/rmap.h |  2 --
 mm/memory-failure.c  |  2 +-
 mm/rmap.c            |  2 +-
 mm/vmscan.c          | 11 ++++-------
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 8c89e90..e2cd8f9 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -83,10 +83,8 @@ struct anon_vma_chain {
 };
 
 enum ttu_flags {
-	TTU_UNMAP = 1,			/* unmap mode */
 	TTU_MIGRATION = 2,		/* migration mode */
 	TTU_MUNLOCK = 4,		/* munlock mode */
-	TTU_LZFREE = 8,			/* lazy free mode */
 	TTU_SPLIT_HUGE_PMD = 16,	/* split huge PMD if any */
 
 	TTU_IGNORE_MLOCK = (1 << 8),	/* ignore mlock */
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 3d0f2fd..b78d080 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -906,7 +906,7 @@ EXPORT_SYMBOL_GPL(get_hwpoison_page);
 static int hwpoison_user_mappings(struct page *p, unsigned long pfn,
 				  int trapno, int flags, struct page **hpagep)
 {
-	enum ttu_flags ttu = TTU_UNMAP | TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
+	enum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_IGNORE_ACCESS;
 	struct address_space *mapping;
 	LIST_HEAD(tokill);
 	int ret;
diff --git a/mm/rmap.c b/mm/rmap.c
index 8774791..96eb85c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1418,7 +1418,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			 */
 			VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 
-			if (!PageDirty(page) && (flags & TTU_LZFREE)) {
+			if (!PageDirty(page)) {
 				/* It's a freeable page by MADV_FREE */
 				dec_mm_counter(mm, MM_ANONPAGES);
 				rp->lazyfreed++;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 26c3b40..68ea50d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -971,7 +971,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		int may_enter_fs;
 		enum page_references references = PAGEREF_RECLAIM_CLEAN;
 		bool dirty, writeback;
-		bool lazyfree = false;
 		int ret = SWAP_SUCCESS;
 
 		cond_resched();
@@ -1125,7 +1124,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 			if (!add_to_swap(page, page_list))
 				goto activate_locked;
-			lazyfree = true;
 			may_enter_fs = 1;
 
 			/* Adding to swap updated mapping */
@@ -1143,9 +1141,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * processes. Try to unmap it here.
 		 */
 		if (page_mapped(page) && mapping) {
-			switch (ret = try_to_unmap(page, lazyfree ?
-				(ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
-				(ttu_flags | TTU_BATCH_FLUSH))) {
+			switch (ret = try_to_unmap(page,
+				ttu_flags | TTU_BATCH_FLUSH)) {
 			case SWAP_FAIL:
 				nr_unmap_fail++;
 				goto activate_locked;
@@ -1353,7 +1350,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 	}
 
 	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
-			TTU_UNMAP|TTU_IGNORE_ACCESS, NULL, true);
+			TTU_IGNORE_ACCESS, NULL, true);
 	list_splice(&clean_pages, page_list);
 	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
 	return ret;
@@ -1760,7 +1757,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	if (nr_taken == 0)
 		return 0;
 
-	nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, TTU_UNMAP,
+	nr_reclaimed = shrink_page_list(&page_list, pgdat, sc, 0,
 				&stat, false);
 
 	spin_lock_irq(&pgdat->lru_lock);
-- 
2.9.3

* [PATCH V4 2/6] mm: don't assume anonymous pages have SwapBacked flag
From: Shaohua Li @ 2017-02-22 18:50 UTC
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman, akpm

There are a few places where the code assumes anonymous pages have the
SwapBacked flag set. MADV_FREE pages are anonymous pages, but we are going
to add them to the LRU_INACTIVE_FILE list and clear the SwapBacked flag for
them. The assumption doesn't hold any more, so fix those places; the
resulting page-state combinations are summarized after this paragraph.
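
For reference, the page-state combinations after this series (a summary of
the convention, not code from this patch):

/*
 *  PageAnon &&  PageSwapBacked  -> normal anonymous page
 *  PageAnon && !PageSwapBacked  -> MADV_FREE (lazyfree) page
 * !PageAnon &&  PageSwapBacked  -> shmem/tmpfs page cache
 * !PageAnon && !PageSwapBacked  -> ordinary file page cache
 */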

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 mm/huge_memory.c | 1 -
 mm/khugepaged.c  | 8 +++-----
 mm/migrate.c     | 3 ++-
 mm/rmap.c        | 3 ++-
 4 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 7dda8d6..cf9fb46 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2361,7 +2361,6 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 
 	VM_BUG_ON_PAGE(is_huge_zero_page(page), page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
-	VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
 	VM_BUG_ON_PAGE(!PageCompound(page), page);
 
 	if (PageAnon(head)) {
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 34bce5c..a4b499f 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -481,8 +481,7 @@ void __khugepaged_exit(struct mm_struct *mm)
 
 static void release_pte_page(struct page *page)
 {
-	/* 0 stands for page_is_file_cache(page) == false */
-	dec_node_page_state(page, NR_ISOLATED_ANON + 0);
+	dec_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page));
 	unlock_page(page);
 	putback_lru_page(page);
 }
@@ -530,7 +529,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 
 		VM_BUG_ON_PAGE(PageCompound(page), page);
 		VM_BUG_ON_PAGE(!PageAnon(page), page);
-		VM_BUG_ON_PAGE(!PageSwapBacked(page), page);
 
 		/*
 		 * We can do it before isolate_lru_page because the
@@ -577,8 +575,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			result = SCAN_DEL_PAGE_LRU;
 			goto out;
 		}
-		/* 0 stands for page_is_file_cache(page) == false */
-		inc_node_page_state(page, NR_ISOLATED_ANON + 0);
+		inc_node_page_state(page,
+				NR_ISOLATED_ANON + page_is_file_cache(page));
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		VM_BUG_ON_PAGE(PageLRU(page), page);
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 2c63ac0..7c8df1f 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1943,7 +1943,8 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 
 	/* Prepare a page as a migration target */
 	__SetPageLocked(new_page);
-	__SetPageSwapBacked(new_page);
+	if (PageSwapBacked(page))
+		__SetPageSwapBacked(new_page);
 
 	/* anon mapping, we can simply copy page->mapping to the new page: */
 	new_page->mapping = page->mapping;
diff --git a/mm/rmap.c b/mm/rmap.c
index 96eb85c..c621088 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1416,7 +1416,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			 * Store the swap location in the pte.
 			 * See handle_pte_fault() ...
 			 */
-			VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+			VM_BUG_ON_PAGE(!PageSwapCache(page) && PageSwapBacked(page),
+				page);
 
 			if (!PageDirty(page)) {
 				/* It's a freeable page by MADV_FREE */
-- 
2.9.3

* [PATCH V4 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
From: Shaohua Li @ 2017-02-22 18:50 UTC
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman, akpm

MADV_FREE indicates that pages are 'lazyfree'. They are still anonymous
pages, but they can be freed without pageout. To distinguish them from
normal anonymous pages, we clear their SwapBacked flag.

Since MADV_FREE pages can be freed without pageout, they are pretty much
like used-once file pages. For such pages, we'd like to reclaim them once
there is memory pressure. It might be unfair to always reclaim MADV_FREE
pages before used-once file pages, but we definitely want to reclaim them
before other anonymous and file pages.

To speed up reclaim of MADV_FREE pages, we put the pages into the
LRU_INACTIVE_FILE list. The rationale is that the LRU_INACTIVE_FILE list is
tiny nowadays and should be full of used-once file pages, so reclaiming
MADV_FREE pages will not interfere much with anonymous and active file
pages. And since inactive file pages and MADV_FREE pages are reclaimed
according to their age, we don't reclaim too many MADV_FREE pages either.
Putting the MADV_FREE pages into the LRU_INACTIVE_FILE list also means we
can reclaim the pages without swap support. This idea was suggested by
Johannes.

This patch doesn't move MADV_FREE pages to the LRU_INACTIVE_FILE list yet,
to avoid bisect failure; the next patch will do it. A sketch of the
resulting lifecycle follows below.

The patch is based on Minchan's original patch.
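
A rough sketch of the lazyfree lifecycle, written as a C comment (condensed
from the hunks below, not a verbatim excerpt):

/*
 * madvise(MADV_FREE), wired up to this path in the next patch:
 *   clear the pte dirty/young bits;
 *   mark_page_lazyfree(page):
 *     ClearPageSwapBacked(page);   // page becomes lazyfree
 *     move the page to the LRU_INACTIVE_FILE list, where it ages
 *     together with used-once file pages;
 *
 * if the page is activated again before reclaim:
 *   __activate_page() sets SwapBacked again and charges the page to
 *   the anon LRU, cancelling the lazy free.
 */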

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 include/linux/swap.h          |  2 +-
 include/linux/vm_event_item.h |  2 +-
 mm/huge_memory.c              |  3 ---
 mm/madvise.c                  |  2 --
 mm/swap.c                     | 56 ++++++++++++++++++++++++++-----------------
 mm/vmstat.c                   |  1 +
 6 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 45e91dd..486494e 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -279,7 +279,7 @@ extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
-extern void deactivate_page(struct page *page);
+extern void mark_page_lazyfree(struct page *page);
 extern void swap_setup(void);
 
 extern void add_page_to_unevictable_list(struct page *page);
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index 6aa1b6c..94e58da 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -25,7 +25,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		FOR_ALL_ZONES(PGALLOC),
 		FOR_ALL_ZONES(ALLOCSTALL),
 		FOR_ALL_ZONES(PGSCAN_SKIP),
-		PGFREE, PGACTIVATE, PGDEACTIVATE,
+		PGFREE, PGACTIVATE, PGDEACTIVATE, PGLAZYFREE,
 		PGFAULT, PGMAJFAULT,
 		PGLAZYFREED,
 		PGREFILL,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index cf9fb46..3b7ee0c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1562,9 +1562,6 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		ClearPageDirty(page);
 	unlock_page(page);
 
-	if (PageActive(page))
-		deactivate_page(page);
-
 	if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
 		orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd,
 			tlb->fullmm);
diff --git a/mm/madvise.c b/mm/madvise.c
index dc5927c..61e10b1 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -411,8 +411,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			ptent = pte_mkold(ptent);
 			ptent = pte_mkclean(ptent);
 			set_pte_at(mm, addr, pte, ptent);
-			if (PageActive(page))
-				deactivate_page(page);
 			tlb_remove_tlb_entry(tlb, pte, addr);
 		}
 	}
diff --git a/mm/swap.c b/mm/swap.c
index c4910f1..fbd8484 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -46,7 +46,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
-static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
 #endif
@@ -268,6 +268,12 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
 		int lru = page_lru_base_type(page);
 
 		del_page_from_lru_list(page, lruvec, lru);
+		if (PageAnon(page) && !PageSwapBacked(page)) {
+			SetPageSwapBacked(page);
+			/* charge to anon scanned/rotated reclaim_stat */
+			file = 0;
+			lru = LRU_INACTIVE_ANON;
+		}
 		SetPageActive(page);
 		lru += LRU_ACTIVE;
 		add_page_to_lru_list(page, lruvec, lru);
@@ -561,20 +567,26 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
 }
 
 
-static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
+static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 			    void *arg)
 {
-	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
-		int file = page_is_file_cache(page);
-		int lru = page_lru_base_type(page);
+	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
+	    !PageUnevictable(page)) {
+		bool active = PageActive(page);
 
-		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
+		del_page_from_lru_list(page, lruvec, LRU_INACTIVE_ANON + active);
 		ClearPageActive(page);
 		ClearPageReferenced(page);
-		add_page_to_lru_list(page, lruvec, lru);
+		/*
+		 * lazyfree pages are clean anonymous pages. They have
+		 * SwapBacked flag cleared, to distinguish them from normal
+		 * anonymous pages
+		 */
+		ClearPageSwapBacked(page);
+		add_page_to_lru_list(page, lruvec, LRU_INACTIVE_FILE);
 
-		__count_vm_event(PGDEACTIVATE);
-		update_page_reclaim_stat(lruvec, file, 0);
+		__count_vm_events(PGLAZYFREE, hpage_nr_pages(page));
+		update_page_reclaim_stat(lruvec, 1, 0);
 	}
 }
 
@@ -604,9 +616,9 @@ void lru_add_drain_cpu(int cpu)
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
 
-	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+	pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
 	if (pagevec_count(pvec))
-		pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+		pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
 
 	activate_page_drain(cpu);
 }
@@ -638,22 +650,22 @@ void deactivate_file_page(struct page *page)
 }
 
 /**
- * deactivate_page - deactivate a page
+ * mark_page_lazyfree - make an anon page lazyfree
  * @page: page to deactivate
  *
- * deactivate_page() moves @page to the inactive list if @page was on the active
- * list and was not an unevictable page.  This is done to accelerate the reclaim
- * of @page.
+ * mark_page_lazyfree() moves @page to the inactive file list.
+ * This is done to accelerate the reclaim of @page.
  */
-void deactivate_page(struct page *page)
-{
-	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
-		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+void mark_page_lazyfree(struct page *page)
+{
+	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
+	    !PageUnevictable(page)) {
+		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);
 
 		get_page(page);
 		if (!pagevec_add(pvec, page) || PageCompound(page))
-			pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
-		put_cpu_var(lru_deactivate_pvecs);
+			pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
+		put_cpu_var(lru_lazyfree_pvecs);
 	}
 }
 
@@ -704,7 +716,7 @@ void lru_add_drain_all(void)
 		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
 		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
-		    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
+		    pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, lru_add_drain_wq, work);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 69f9aff..7774196 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -992,6 +992,7 @@ const char * const vmstat_text[] = {
 	"pgfree",
 	"pgactivate",
 	"pgdeactivate",
+	"pglazyfree",
 
 	"pgfault",
 	"pgmajfault",
-- 
2.9.3

* [PATCH V4 4/6] mm: reclaim MADV_FREE pages
From: Shaohua Li @ 2017-02-22 18:50 UTC
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman, akpm

When memory pressure is high, we free MADV_FREE pages. If the pages are not
dirty in the page tables, they can be freed immediately. Otherwise we can't
reclaim them as lazyfree: we put the pages back on the anonymous LRU list
(by setting the SwapBacked flag), and they will be reclaimed through the
normal swapout path.

We use the normal page reclaim policy. Since MADV_FREE pages are put into
the inactive file list, such pages and inactive file pages are reclaimed
according to their age. This is expected, because we don't want to reclaim
too many MADV_FREE pages before used-once pages. The reclaim decision is
sketched below.

Based on Minchan's original patch.
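
A condensed sketch of the decision this patch adds to shrink_page_list()
and try_to_unmap() (a summary, not a verbatim excerpt):

/*
 * For a mapped lazyfree page (PageAnon && !PageSwapBacked):
 *
 *   if all mapping ptes are clean:
 *       unmap, rp->lazyfreed++, free the page   (counted as PGLAZYFREED)
 *   else (some pte was dirtied after MADV_FREE):
 *       try_to_unmap() returns SWAP_DIRTY;
 *       shrink_page_list() does SetPageSwapBacked(page), turning the
 *       page back into a normal anonymous page to be reclaimed later
 *       through the usual swapout path.
 */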

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 include/linux/rmap.h |  2 +-
 mm/huge_memory.c     |  2 ++
 mm/madvise.c         |  1 +
 mm/rmap.c            | 10 ++++++++--
 mm/vmscan.c          | 34 ++++++++++++++++++++++------------
 5 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e2cd8f9..2bfd8c6 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -300,6 +300,6 @@ static inline int page_mkclean(struct page *page)
 #define SWAP_AGAIN	1
 #define SWAP_FAIL	2
 #define SWAP_MLOCK	3
-#define SWAP_LZFREE	4
+#define SWAP_DIRTY	4
 
 #endif	/* _LINUX_RMAP_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3b7ee0c..4c7454b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1571,6 +1571,8 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		set_pmd_at(mm, addr, pmd, orig_pmd);
 		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
 	}
+
+	mark_page_lazyfree(page);
 	ret = true;
 out:
 	spin_unlock(ptl);
diff --git a/mm/madvise.c b/mm/madvise.c
index 61e10b1..225af7d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -413,6 +413,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			set_pte_at(mm, addr, pte, ptent);
 			tlb_remove_tlb_entry(tlb, pte, addr);
 		}
+		mark_page_lazyfree(page);
 	}
 out:
 	if (nr_swap) {
diff --git a/mm/rmap.c b/mm/rmap.c
index c621088..083f32e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1424,6 +1424,12 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 				dec_mm_counter(mm, MM_ANONPAGES);
 				rp->lazyfreed++;
 				goto discard;
+			} else if (!PageSwapBacked(page)) {
+				/* dirty MADV_FREE page */
+				set_pte_at(mm, address, pvmw.pte, pteval);
+				ret = SWAP_DIRTY;
+				page_vma_mapped_walk_done(&pvmw);
+				break;
 			}
 
 			if (swap_duplicate(entry) < 0) {
@@ -1525,8 +1531,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
 
 	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
 		ret = SWAP_SUCCESS;
-		if (rp.lazyfreed && !PageDirty(page))
-			ret = SWAP_LZFREE;
+		if (rp.lazyfreed && PageDirty(page))
+			ret = SWAP_DIRTY;
 	}
 	return ret;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 68ea50d..830981a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -911,7 +911,8 @@ static void page_check_dirty_writeback(struct page *page,
 	 * Anonymous pages are not handled by flushers and must be written
 	 * from reclaim context. Do not stall reclaim based on them
 	 */
-	if (!page_is_file_cache(page)) {
+	if (!page_is_file_cache(page) ||
+	    (PageAnon(page) && !PageSwapBacked(page))) {
 		*dirty = false;
 		*writeback = false;
 		return;
@@ -992,7 +993,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			goto keep_locked;
 
 		/* Double the slab pressure for mapped and swapcache pages */
-		if (page_mapped(page) || PageSwapCache(page))
+		if ((page_mapped(page) || PageSwapCache(page)) &&
+		    !(PageAnon(page) && !PageSwapBacked(page)))
 			sc->nr_scanned++;
 
 		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
@@ -1118,8 +1120,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		/*
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
+		 * Lazyfree page could be freed directly
 		 */
-		if (PageAnon(page) && !PageSwapCache(page)) {
+		if (PageAnon(page) && !PageSwapCache(page) &&
+		    PageSwapBacked(page)) {
 			if (!(sc->gfp_mask & __GFP_IO))
 				goto keep_locked;
 			if (!add_to_swap(page, page_list))
@@ -1140,9 +1144,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * The page is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
 		 */
-		if (page_mapped(page) && mapping) {
+		if (page_mapped(page)) {
 			switch (ret = try_to_unmap(page,
 				ttu_flags | TTU_BATCH_FLUSH)) {
+			case SWAP_DIRTY:
+				SetPageSwapBacked(page);
+				/* fall through */
 			case SWAP_FAIL:
 				nr_unmap_fail++;
 				goto activate_locked;
@@ -1150,8 +1157,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 			case SWAP_MLOCK:
 				goto cull_mlocked;
-			case SWAP_LZFREE:
-				goto lazyfree;
 			case SWAP_SUCCESS:
 				; /* try to free the page below */
 			}
@@ -1263,10 +1268,18 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-lazyfree:
-		if (!mapping || !__remove_mapping(mapping, page, true))
-			goto keep_locked;
+		if (PageAnon(page) && !PageSwapBacked(page)) {
+			/* follow __remove_mapping for reference */
+			if (!page_ref_freeze(page, 1))
+				goto keep_locked;
+			if (PageDirty(page)) {
+				page_ref_unfreeze(page, 1);
+				goto keep_locked;
+			}
 
+			count_vm_event(PGLAZYFREED);
+		} else if (!mapping || !__remove_mapping(mapping, page, true))
+			goto keep_locked;
 		/*
 		 * At this point, we have no other references and there is
 		 * no way to pick any more up (removed from LRU, removed
@@ -1276,9 +1289,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 */
 		__ClearPageLocked(page);
 free_it:
-		if (ret == SWAP_LZFREE)
-			count_vm_event(PGLAZYFREED);
-
 		nr_reclaimed++;
 
 		/*
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH V4 4/6] mm: reclaim MADV_FREE pages
@ 2017-02-22 18:50   ` Shaohua Li
  0 siblings, 0 replies; 54+ messages in thread
From: Shaohua Li @ 2017-02-22 18:50 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman, akpm

When memory pressure is high, we free MADV_FREE pages. If the pages are
not dirty in pte, the pages could be freed immediately. Otherwise we
can't reclaim them. We put the pages back to anonumous LRU list (by
setting SwapBacked flag) and the pages will be reclaimed in normal
swapout way.

We use normal page reclaim policy. Since MADV_FREE pages are put into
inactive file list, such pages and inactive file pages are reclaimed
according to their age. This is expected, because we don't want to
reclaim too many MADV_FREE pages before used once pages.

Based on Minchan's original patch

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 include/linux/rmap.h |  2 +-
 mm/huge_memory.c     |  2 ++
 mm/madvise.c         |  1 +
 mm/rmap.c            | 10 ++++++++--
 mm/vmscan.c          | 34 ++++++++++++++++++++++------------
 5 files changed, 34 insertions(+), 15 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e2cd8f9..2bfd8c6 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -300,6 +300,6 @@ static inline int page_mkclean(struct page *page)
 #define SWAP_AGAIN	1
 #define SWAP_FAIL	2
 #define SWAP_MLOCK	3
-#define SWAP_LZFREE	4
+#define SWAP_DIRTY	4
 
 #endif	/* _LINUX_RMAP_H */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3b7ee0c..4c7454b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1571,6 +1571,8 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
 		set_pmd_at(mm, addr, pmd, orig_pmd);
 		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
 	}
+
+	mark_page_lazyfree(page);
 	ret = true;
 out:
 	spin_unlock(ptl);
diff --git a/mm/madvise.c b/mm/madvise.c
index 61e10b1..225af7d 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -413,6 +413,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			set_pte_at(mm, addr, pte, ptent);
 			tlb_remove_tlb_entry(tlb, pte, addr);
 		}
+		mark_page_lazyfree(page);
 	}
 out:
 	if (nr_swap) {
diff --git a/mm/rmap.c b/mm/rmap.c
index c621088..083f32e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1424,6 +1424,12 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 				dec_mm_counter(mm, MM_ANONPAGES);
 				rp->lazyfreed++;
 				goto discard;
+			} else if (!PageSwapBacked(page)) {
+				/* dirty MADV_FREE page */
+				set_pte_at(mm, address, pvmw.pte, pteval);
+				ret = SWAP_DIRTY;
+				page_vma_mapped_walk_done(&pvmw);
+				break;
 			}
 
 			if (swap_duplicate(entry) < 0) {
@@ -1525,8 +1531,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
 
 	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
 		ret = SWAP_SUCCESS;
-		if (rp.lazyfreed && !PageDirty(page))
-			ret = SWAP_LZFREE;
+		if (rp.lazyfreed && PageDirty(page))
+			ret = SWAP_DIRTY;
 	}
 	return ret;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 68ea50d..830981a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -911,7 +911,8 @@ static void page_check_dirty_writeback(struct page *page,
 	 * Anonymous pages are not handled by flushers and must be written
 	 * from reclaim context. Do not stall reclaim based on them
 	 */
-	if (!page_is_file_cache(page)) {
+	if (!page_is_file_cache(page) ||
+	    (PageAnon(page) && !PageSwapBacked(page))) {
 		*dirty = false;
 		*writeback = false;
 		return;
@@ -992,7 +993,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			goto keep_locked;
 
 		/* Double the slab pressure for mapped and swapcache pages */
-		if (page_mapped(page) || PageSwapCache(page))
+		if ((page_mapped(page) || PageSwapCache(page)) &&
+		    !(PageAnon(page) && !PageSwapBacked(page)))
 			sc->nr_scanned++;
 
 		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
@@ -1118,8 +1120,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		/*
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
+		 * Lazyfree page could be freed directly
 		 */
-		if (PageAnon(page) && !PageSwapCache(page)) {
+		if (PageAnon(page) && !PageSwapCache(page) &&
+		    PageSwapBacked(page)) {
 			if (!(sc->gfp_mask & __GFP_IO))
 				goto keep_locked;
 			if (!add_to_swap(page, page_list))
@@ -1140,9 +1144,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * The page is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
 		 */
-		if (page_mapped(page) && mapping) {
+		if (page_mapped(page)) {
 			switch (ret = try_to_unmap(page,
 				ttu_flags | TTU_BATCH_FLUSH)) {
+			case SWAP_DIRTY:
+				SetPageSwapBacked(page);
+				/* fall through */
 			case SWAP_FAIL:
 				nr_unmap_fail++;
 				goto activate_locked;
@@ -1150,8 +1157,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				goto keep_locked;
 			case SWAP_MLOCK:
 				goto cull_mlocked;
-			case SWAP_LZFREE:
-				goto lazyfree;
 			case SWAP_SUCCESS:
 				; /* try to free the page below */
 			}
@@ -1263,10 +1268,18 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-lazyfree:
-		if (!mapping || !__remove_mapping(mapping, page, true))
-			goto keep_locked;
+		if (PageAnon(page) && !PageSwapBacked(page)) {
+			/* follow __remove_mapping for reference */
+			if (!page_ref_freeze(page, 1))
+				goto keep_locked;
+			if (PageDirty(page)) {
+				page_ref_unfreeze(page, 1);
+				goto keep_locked;
+			}
 
+			count_vm_event(PGLAZYFREED);
+		} else if (!mapping || !__remove_mapping(mapping, page, true))
+			goto keep_locked;
 		/*
 		 * At this point, we have no other references and there is
 		 * no way to pick any more up (removed from LRU, removed
@@ -1276,9 +1289,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 */
 		__ClearPageLocked(page);
 free_it:
-		if (ret == SWAP_LZFREE)
-			count_vm_event(PGLAZYFREED);
-
 		nr_reclaimed++;
 
 		/*
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH V4 5/6] mm: enable MADV_FREE for swapless system
  2017-02-22 18:50 ` Shaohua Li
@ 2017-02-22 18:50   ` Shaohua Li
  -1 siblings, 0 replies; 54+ messages in thread
From: Shaohua Li @ 2017-02-22 18:50 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman, akpm

Now MADV_FREE pages can be easily reclaimed even on a swapless system. We
can safely enable MADV_FREE for all systems.
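
For illustration, a minimal userspace sketch (not part of this patch;
the helper name and the EINVAL fallback policy are assumptions) of how
an allocator might use MADV_FREE once it no longer depends on swap:

#include <sys/mman.h>
#include <errno.h>

/* Mark a freed range lazily reclaimable; contents survive until
 * reclaim, and a later write cancels the lazy free for that page. */
static int lazy_release(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_FREE) == 0)
		return 0;
	/* EINVAL likely means a kernel without MADV_FREE; fall back
	 * to the stricter, immediately-freeing MADV_DONTNEED. */
	if (errno == EINVAL)
		return madvise(addr, len, MADV_DONTNEED);
	return -1;
}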

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 mm/madvise.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 225af7d..5ab4b7b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -612,13 +612,7 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 	case MADV_WILLNEED:
 		return madvise_willneed(vma, prev, start, end);
 	case MADV_FREE:
-		/*
-		 * XXX: In this implementation, MADV_FREE works like
-		 * MADV_DONTNEED on swapless system or full swap.
-		 */
-		if (get_nr_swap_pages() > 0)
-			return madvise_free(vma, prev, start, end);
-		/* passthrough */
+		return madvise_free(vma, prev, start, end);
 	case MADV_DONTNEED:
 		return madvise_dontneed(vma, prev, start, end);
 	default:
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* [PATCH V4 6/6] proc: show MADV_FREE pages info in smaps
  2017-02-22 18:50 ` Shaohua Li
@ 2017-02-22 18:50   ` Shaohua Li
  -1 siblings, 0 replies; 54+ messages in thread
From: Shaohua Li @ 2017-02-22 18:50 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman, akpm

Show MADV_FREE pages info for each VMA in smaps. The interface is for
diagnostic and monitoring purposes; userspace can use it to understand
what happens in the application. Since userspace can dirty MADV_FREE
pages without the kernel noticing, this interface is the only place we
can get accurate accounting info about MADV_FREE pages.
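
For illustration, a hypothetical monitoring snippet (not part of this
patch; the helper name is made up) that sums the new LazyFree fields
from /proc/<pid>/smaps:

#include <stdio.h>
#include <sys/types.h>

/* Return the total LazyFree amount in kB for a process, or -1 on
 * error. Scans every vma entry in smaps and adds up the new field. */
static long lazyfree_kb(pid_t pid)
{
	char path[64], line[256];
	long kb, total = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/smaps", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "LazyFree: %ld kB", &kb) == 1)
			total += kb;
	fclose(f);
	return total;
}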

Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 Documentation/filesystems/proc.txt | 4 ++++
 fs/proc/task_mmu.c                 | 8 +++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index c94b467..45853e1 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -412,6 +412,7 @@ Private_Clean:         0 kB
 Private_Dirty:         0 kB
 Referenced:          892 kB
 Anonymous:             0 kB
+LazyFree:              0 kB
 AnonHugePages:         0 kB
 ShmemPmdMapped:        0 kB
 Shared_Hugetlb:        0 kB
@@ -441,6 +442,9 @@ accessed.
 "Anonymous" shows the amount of memory that does not belong to any file.  Even
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
+"LazyFree" shows the amount of memory which is marked by madvise(MADV_FREE).
+The memory isn't freed immediately with madvise(). It's freed in memory
+pressure if the memory is clean.
 "AnonHugePages" shows the ammount of memory backed by transparent hugepage.
 "ShmemPmdMapped" shows the ammount of shared (shmem/tmpfs) memory backed by
 huge pages.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index ee3efb2..8a5ec00 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -440,6 +440,7 @@ struct mem_size_stats {
 	unsigned long private_dirty;
 	unsigned long referenced;
 	unsigned long anonymous;
+	unsigned long lazyfree;
 	unsigned long anonymous_thp;
 	unsigned long shmem_thp;
 	unsigned long swap;
@@ -456,8 +457,11 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 	int i, nr = compound ? 1 << compound_order(page) : 1;
 	unsigned long size = nr * PAGE_SIZE;
 
-	if (PageAnon(page))
+	if (PageAnon(page)) {
 		mss->anonymous += size;
+		if (!PageSwapBacked(page) && !dirty && !PageDirty(page))
+			mss->lazyfree += size;
+	}
 
 	mss->resident += size;
 	/* Accumulate the size in pages that have been accessed. */
@@ -770,6 +774,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   "Private_Dirty:  %8lu kB\n"
 		   "Referenced:     %8lu kB\n"
 		   "Anonymous:      %8lu kB\n"
+		   "LazyFree:       %8lu kB\n"
 		   "AnonHugePages:  %8lu kB\n"
 		   "ShmemPmdMapped: %8lu kB\n"
 		   "Shared_Hugetlb: %8lu kB\n"
@@ -788,6 +793,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   mss.private_dirty >> 10,
 		   mss.referenced >> 10,
 		   mss.anonymous >> 10,
+		   mss.lazyfree >> 10,
 		   mss.anonymous_thp >> 10,
 		   mss.shmem_thp >> 10,
 		   mss.shared_hugetlb >> 10,
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 1/6] mm: delete unnecessary TTU_* flags
  2017-02-22 18:50   ` Shaohua Li
@ 2017-02-23 15:35     ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2017-02-23 15:35 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman, akpm

On Wed, Feb 22, 2017 at 10:50:39AM -0800, Shaohua Li wrote:
> Johannes pointed out TTU_LZFREE is unnecessary. It's true because we
> always have the flag set if we want to do an unmap. For cases we don't
> do an unmap, the TTU_LZFREE part of code should never run.
> 
> Also the TTU_UNMAP is unnecessary. If no other flags set (for
> example, TTU_MIGRATION), an unmap is implied.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Shaohua Li <shli@fb.com>

Thanks!

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

> @@ -83,10 +83,8 @@ struct anon_vma_chain {
>  };
>  
>  enum ttu_flags {
> -	TTU_UNMAP = 1,			/* unmap mode */
>  	TTU_MIGRATION = 2,		/* migration mode */
>  	TTU_MUNLOCK = 4,		/* munlock mode */
> -	TTU_LZFREE = 8,			/* lazy free mode */
>  	TTU_SPLIT_HUGE_PMD = 16,	/* split huge PMD if any */
>  
>  	TTU_IGNORE_MLOCK = (1 << 8),	/* ignore mlock */

This on top?

---
Subject: [PATCH] mm: delete unnecessary TTU_* flags fix

Clean up the TTU flags a bit. Remove dead TTU_ACTION macro.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
---

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 70ef7536c088..640214bc4635 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -82,17 +82,17 @@ struct anon_vma_chain {
 };
 
 enum ttu_flags {
-	TTU_MIGRATION = 2,		/* migration mode */
-	TTU_MUNLOCK = 4,		/* munlock mode */
-	TTU_SPLIT_HUGE_PMD = 16,	/* split huge PMD if any */
-
-	TTU_IGNORE_MLOCK = (1 << 8),	/* ignore mlock */
-	TTU_IGNORE_ACCESS = (1 << 9),	/* don't age */
-	TTU_IGNORE_HWPOISON = (1 << 10),/* corrupted page is recoverable */
-	TTU_BATCH_FLUSH = (1 << 11),	/* Batch TLB flushes where possible
+	TTU_MIGRATION		= 0x1,	/* migration mode */
+	TTU_MUNLOCK		= 0x2,	/* munlock mode */
+
+	TTU_SPLIT_HUGE_PMD	= 0x4,	/* split huge PMD if any */
+	TTU_IGNORE_MLOCK	= 0x8,	/* ignore mlock */
+	TTU_IGNORE_ACCESS	= 0x10,	/* don't age */
+	TTU_IGNORE_HWPOISON	= 0x20,	/* corrupted page is recoverable */
+	TTU_BATCH_FLUSH		= 0x40,	/* Batch TLB flushes where possible
 					 * and caller guarantees they will
 					 * do a final flush if necessary */
-	TTU_RMAP_LOCKED = (1 << 12)	/* do not grab rmap lock:
+	TTU_RMAP_LOCKED		= 0x80	/* do not grab rmap lock:
 					 * caller holds it */
 };
 
@@ -182,8 +182,6 @@ static inline void page_dup_rmap(struct page *page, bool compound)
 int page_referenced(struct page *, int is_locked,
 			struct mem_cgroup *memcg, unsigned long *vm_flags);
 
-#define TTU_ACTION(x) ((x) & TTU_ACTION_MASK)
-
 int try_to_unmap(struct page *, enum ttu_flags flags);
 
 /*

^ permalink raw reply related	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
  2017-02-22 18:50   ` Shaohua Li
@ 2017-02-23 15:58     ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2017-02-23 15:58 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman, akpm

Hi Shaohua,

On Wed, Feb 22, 2017 at 10:50:41AM -0800, Shaohua Li wrote:
> @@ -268,6 +268,12 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
>  		int lru = page_lru_base_type(page);
>  
>  		del_page_from_lru_list(page, lruvec, lru);
> +		if (PageAnon(page) && !PageSwapBacked(page)) {
> +			SetPageSwapBacked(page);
> +			/* charge to anon scanned/rotated reclaim_stat */
> +			file = 0;
> +			lru = LRU_INACTIVE_ANON;
> +		}

As per my previous feedback, please remove this. Write-after-free will
be caught and handled in the reclaimer; read-after-free is a bug that
really doesn't require optimizing page aging for. And we definitely
shouldn't declare invalid data suddenly valid because it's being read.

> @@ -561,20 +567,26 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
>  }
>  
>  
> -static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
> +static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
>  			    void *arg)
>  {
> -	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
> -		int file = page_is_file_cache(page);
> -		int lru = page_lru_base_type(page);
> +	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
> +	    !PageUnevictable(page)) {
> +		bool active = PageActive(page);
>  
> -		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
> +		del_page_from_lru_list(page, lruvec, LRU_INACTIVE_ANON + active);
>  		ClearPageActive(page);
>  		ClearPageReferenced(page);
> -		add_page_to_lru_list(page, lruvec, lru);
> +		/*
> +		 * lazyfree pages are clean anonymous pages. They have
> +		 * SwapBacked flag cleared to destinguish normal anonymous
> +		 * pages

distinguish

Otherwise, looks great to me. Thanks!

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 4/6] mm: reclaim MADV_FREE pages
  2017-02-22 18:50   ` Shaohua Li
@ 2017-02-23 16:13     ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2017-02-23 16:13 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman, akpm

On Wed, Feb 22, 2017 at 10:50:42AM -0800, Shaohua Li wrote:
> @@ -1424,6 +1424,12 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  				dec_mm_counter(mm, MM_ANONPAGES);
>  				rp->lazyfreed++;
>  				goto discard;
> +			} else if (!PageSwapBacked(page)) {
> +				/* dirty MADV_FREE page */
> +				set_pte_at(mm, address, pvmw.pte, pteval);
> +				ret = SWAP_DIRTY;
> +				page_vma_mapped_walk_done(&pvmw);
> +				break;
>  			}
>  
>  			if (swap_duplicate(entry) < 0) {
> @@ -1525,8 +1531,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
>  
>  	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
>  		ret = SWAP_SUCCESS;
> -		if (rp.lazyfreed && !PageDirty(page))
> -			ret = SWAP_LZFREE;
> +		if (rp.lazyfreed && PageDirty(page))
> +			ret = SWAP_DIRTY;

Can this actually happen? If the page is dirty, ret should already be
SWAP_DIRTY, right? How would a dirty page get fully unmapped?

It seems to me rp.lazyfreed can be removed entirely now that we don't
have to identify the lazyfree case anymore. The failure case is much
easier to identify - all it takes is a single pte to be dirty.
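
As a sketch (assuming the dirty case is always reported as SWAP_DIRTY
from try_to_unmap_one and rp.lazyfreed is dropped), the tail of
try_to_unmap() would reduce to something like:

	if (ret != SWAP_MLOCK && !page_mapcount(page))
		ret = SWAP_SUCCESS;
	return ret;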

> @@ -1118,8 +1120,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		/*
>  		 * Anonymous process memory has backing store?
>  		 * Try to allocate it some swap space here.
> +		 * Lazyfree page could be freed directly
>  		 */
> -		if (PageAnon(page) && !PageSwapCache(page)) {
> +		if (PageAnon(page) && !PageSwapCache(page) &&
> +		    PageSwapBacked(page)) {

Nit: I'd do PageAnon(page) && PageSwapBacked(page) && !PageSwapCache()
since anon && swapbacked together describe the page type and swapcache
the state. Plus, anon && swapbacked go together everywhere else.
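
Spelled out as a fragment (same body as in the hunk above, only the
order of the tests changes):

	/* type first: anon && swapbacked, then state: swapcache */
	if (PageAnon(page) && PageSwapBacked(page) &&
	    !PageSwapCache(page)) {
		if (!(sc->gfp_mask & __GFP_IO))
			goto keep_locked;
		if (!add_to_swap(page, page_list))
			goto activate_locked;
		/* ... rest of the branch unchanged ... */
	}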

Otherwise, looks very straightforward!

Thanks

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 6/6] proc: show MADV_FREE pages info in smaps
  2017-02-22 18:50   ` Shaohua Li
@ 2017-02-23 16:16     ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2017-02-23 16:16 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman, akpm

On Wed, Feb 22, 2017 at 10:50:44AM -0800, Shaohua Li wrote:
> show MADV_FREE pages info of each vma in smaps. The interface is for
> diganose or monitoring purpose, userspace could use it to understand
> what happens in the application. Since userspace could dirty MADV_FREE
> pages without notice from kernel, this interface is the only place we
> can get accurate accounting info about MADV_FREE pages.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Shaohua Li <shli@fb.com>

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
  2017-02-23 15:58     ` Johannes Weiner
@ 2017-02-23 16:26       ` Shaohua Li
  -1 siblings, 0 replies; 54+ messages in thread
From: Shaohua Li @ 2017-02-23 16:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman, akpm

On Thu, Feb 23, 2017 at 10:58:27AM -0500, Johannes Weiner wrote:
> Hi Shaohua,
> 
> On Wed, Feb 22, 2017 at 10:50:41AM -0800, Shaohua Li wrote:
> > @@ -268,6 +268,12 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
> >  		int lru = page_lru_base_type(page);
> >  
> >  		del_page_from_lru_list(page, lruvec, lru);
> > +		if (PageAnon(page) && !PageSwapBacked(page)) {
> > +			SetPageSwapBacked(page);
> > +			/* charge to anon scanned/rotated reclaim_stat */
> > +			file = 0;
> > +			lru = LRU_INACTIVE_ANON;
> > +		}
> 
> As per my previous feedback, please remove this. Write-after-free will
> be caught and handled in the reclaimer, read-after-free is a bug that
> really doesn't require optimizing page aging for. And we definitely
> shouldn't declare invalid data suddenly valid because it's being read.

GUP could run into this. Don't we move the page because it's hot? I think it's
not just about page aging. If we leave the page there, page reclaim will just
waste time trying to reclaim pages which shouldn't be reclaimed.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 4/6] mm: reclaim MADV_FREE pages
  2017-02-23 16:13     ` Johannes Weiner
@ 2017-02-23 17:19       ` Shaohua Li
  -1 siblings, 0 replies; 54+ messages in thread
From: Shaohua Li @ 2017-02-23 17:19 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman, akpm

On Thu, Feb 23, 2017 at 11:13:42AM -0500, Johannes Weiner wrote:
> On Wed, Feb 22, 2017 at 10:50:42AM -0800, Shaohua Li wrote:
> > @@ -1424,6 +1424,12 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >  				dec_mm_counter(mm, MM_ANONPAGES);
> >  				rp->lazyfreed++;
> >  				goto discard;
> > +			} else if (!PageSwapBacked(page)) {
> > +				/* dirty MADV_FREE page */
> > +				set_pte_at(mm, address, pvmw.pte, pteval);
> > +				ret = SWAP_DIRTY;
> > +				page_vma_mapped_walk_done(&pvmw);
> > +				break;
> >  			}
> >  
> >  			if (swap_duplicate(entry) < 0) {
> > @@ -1525,8 +1531,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
> >  
> >  	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
> >  		ret = SWAP_SUCCESS;
> > -		if (rp.lazyfreed && !PageDirty(page))
> > -			ret = SWAP_LZFREE;
> > +		if (rp.lazyfreed && PageDirty(page))
> > +			ret = SWAP_DIRTY;
> 
> Can this actually happen? If the page is dirty, ret should already be
> SWAP_DIRTY, right? How would a dirty page get fully unmapped?
> 
> It seems to me rp.lazyfreed can be removed entirely now that we don't
> have to identify the lazyfree case anymore. The failure case is much
> easier to identify - all it takes is a single pte to be dirty.

OK, I got mixed up. Yes, this can't happen any more since we changed the
behavior of try_to_unmap_one. Will delete this in the next post.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
  2017-02-23 16:26       ` Shaohua Li
@ 2017-02-23 18:22         ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2017-02-23 18:22 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman, akpm

On Thu, Feb 23, 2017 at 08:26:03AM -0800, Shaohua Li wrote:
> On Thu, Feb 23, 2017 at 10:58:27AM -0500, Johannes Weiner wrote:
> > Hi Shaohua,
> > 
> > On Wed, Feb 22, 2017 at 10:50:41AM -0800, Shaohua Li wrote:
> > > @@ -268,6 +268,12 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
> > >  		int lru = page_lru_base_type(page);
> > >  
> > >  		del_page_from_lru_list(page, lruvec, lru);
> > > +		if (PageAnon(page) && !PageSwapBacked(page)) {
> > > +			SetPageSwapBacked(page);
> > > +			/* charge to anon scanned/rotated reclaim_stat */
> > > +			file = 0;
> > > +			lru = LRU_INACTIVE_ANON;
> > > +		}
> > 
> > As per my previous feedback, please remove this. Write-after-free will
> > be caught and handled in the reclaimer, read-after-free is a bug that
> > really doesn't require optimizing page aging for. And we definitely
> > shouldn't declare invalid data suddenly valid because it's being read.
> 
> GUP could run into this. Don't we move the page because it's hot? I think it's
> not just about page aging. If we leave the page there, page reclaim will just
> waste time to reclaim the pages which should't be reclaimed.

There is just no convincing justification to add this code, because it
optimizes something that doesn't have a real-world application. If we
just delete this branch, for all intents and purposes the outcome will
be perfectly acceptable.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
  2017-02-23 18:22         ` Johannes Weiner
@ 2017-02-23 19:04           ` Shaohua Li
  -1 siblings, 0 replies; 54+ messages in thread
From: Shaohua Li @ 2017-02-23 19:04 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	riel, mgorman, akpm

On Thu, Feb 23, 2017 at 01:22:06PM -0500, Johannes Weiner wrote:
> On Thu, Feb 23, 2017 at 08:26:03AM -0800, Shaohua Li wrote:
> > On Thu, Feb 23, 2017 at 10:58:27AM -0500, Johannes Weiner wrote:
> > > Hi Shaohua,
> > > 
> > > On Wed, Feb 22, 2017 at 10:50:41AM -0800, Shaohua Li wrote:
> > > > @@ -268,6 +268,12 @@ static void __activate_page(struct page *page, struct lruvec *lruvec,
> > > >  		int lru = page_lru_base_type(page);
> > > >  
> > > >  		del_page_from_lru_list(page, lruvec, lru);
> > > > +		if (PageAnon(page) && !PageSwapBacked(page)) {
> > > > +			SetPageSwapBacked(page);
> > > > +			/* charge to anon scanned/rotated reclaim_stat */
> > > > +			file = 0;
> > > > +			lru = LRU_INACTIVE_ANON;
> > > > +		}
> > > 
> > > As per my previous feedback, please remove this. Write-after-free will
> > > be caught and handled in the reclaimer, read-after-free is a bug that
> > > really doesn't require optimizing page aging for. And we definitely
> > > shouldn't declare invalid data suddenly valid because it's being read.
> > 
> > GUP could run into this. Don't we move the page because it's hot? I think it's
> > not just about page aging. If we leave the page there, page reclaim will just
> > waste time to reclaim the pages which should't be reclaimed.
> 
> There is just no convincing justification to add this code, because it
> optimizes something that doesn't have a real world application. If we
> just delete this branch, for all intents and purposes the outcome will
> be perfectly acceptable.

OK, it looks like you want to ignore all the corner cases; the GUP case is one,
and the unmap failure and mlock cases we discussed before are others. I don't
disagree with the intention, but I have a feeling this code will eventually
come back. Anyway, I'll delete this code in the next post.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 1/6] mm: delete unnecessary TTU_* flags
  2017-02-22 18:50   ` Shaohua Li
@ 2017-02-24  1:25     ` Minchan Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Minchan Kim @ 2017-02-24  1:25 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, hughd, hannes, riel,
	mgorman, akpm

On Wed, Feb 22, 2017 at 10:50:39AM -0800, Shaohua Li wrote:
> Johannes pointed out TTU_LZFREE is unnecessary. It's true because we
> always have the flag set if we want to do an unmap. For cases we don't
> do an unmap, the TTU_LZFREE part of code should never run.
> 
> Also the TTU_UNMAP is unnecessary. If no other flags set (for
> example, TTU_MIGRATION), an unmap is implied.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Minchan Kim <minchan@kernel.org>

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
  2017-02-22 18:50   ` Shaohua Li
@ 2017-02-24  1:49     ` Minchan Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Minchan Kim @ 2017-02-24  1:49 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, hughd, hannes, riel,
	mgorman, akpm

On Wed, Feb 22, 2017 at 10:50:41AM -0800, Shaohua Li wrote:
> madv MADV_FREE indicate pages are 'lazyfree'. They are still anonymous
> pages, but they can be freed without pageout. To destinguish them
> against normal anonymous pages, we clear their SwapBacked flag.
> 
> MADV_FREE pages could be freed without pageout, so they pretty much like
> used once file pages. For such pages, we'd like to reclaim them once
> there is memory pressure. Also it might be unfair reclaiming MADV_FREE
> pages always before used once file pages and we definitively want to
> reclaim the pages before other anonymous and file pages.
> 
> To speed up MADV_FREE pages reclaim, we put the pages into
> LRU_INACTIVE_FILE list. The rationale is LRU_INACTIVE_FILE list is tiny
> nowadays and should be full of used once file pages. Reclaiming
> MADV_FREE pages will not have much interfere of anonymous and active
> file pages. And the inactive file pages and MADV_FREE pages will be
> reclaimed according to their age, so we don't reclaim too many MADV_FREE
> pages too. Putting the MADV_FREE pages into LRU_INACTIVE_FILE_LIST also
> means we can reclaim the pages without swap support. This idea is
> suggested by Johannes.
> 
> This patch doesn't move MADV_FREE pages to LRU_INACTIVE_FILE list yet to
> avoid bisect failure, next patch will do it.
> 
> The patch is based on Minchan's original patch.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Shaohua Li <shli@fb.com>

Other than what Johannes pointed out, the code itself looks good to me.
However, I would prefer to merge this patch with the next one.
It's simple enough to merge: it changes behavior (about deactivation),
and mark_page_lazyfree is introduced with no call site using it
in this patch.

I don't think it's worth separating them.

^ permalink raw reply	[flat|nested] 54+ messages in thread

* Re: [PATCH V4 4/6] mm: reclaim MADV_FREE pages
  2017-02-22 18:50   ` Shaohua Li
@ 2017-02-24  2:12     ` Minchan Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Minchan Kim @ 2017-02-24  2:12 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, hughd, hannes, riel,
	mgorman, akpm

On Wed, Feb 22, 2017 at 10:50:42AM -0800, Shaohua Li wrote:
> When memory pressure is high, we free MADV_FREE pages. If the pages are
> not dirty in pte, the pages could be freed immediately. Otherwise we
> can't reclaim them. We put the pages back to anonumous LRU list (by
> setting SwapBacked flag) and the pages will be reclaimed in normal
> swapout way.
> 
> We use normal page reclaim policy. Since MADV_FREE pages are put into
> inactive file list, such pages and inactive file pages are reclaimed
> according to their age. This is expected, because we don't want to
> reclaim too many MADV_FREE pages before used once pages.
> 
> Based on Minchan's original patch
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
>  include/linux/rmap.h |  2 +-
>  mm/huge_memory.c     |  2 ++
>  mm/madvise.c         |  1 +
>  mm/rmap.c            | 10 ++++++++--
>  mm/vmscan.c          | 34 ++++++++++++++++++++++------------
>  5 files changed, 34 insertions(+), 15 deletions(-)
> 
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index e2cd8f9..2bfd8c6 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -300,6 +300,6 @@ static inline int page_mkclean(struct page *page)
>  #define SWAP_AGAIN	1
>  #define SWAP_FAIL	2
>  #define SWAP_MLOCK	3
> -#define SWAP_LZFREE	4
> +#define SWAP_DIRTY	4

Could you document SWAP_DIRTY in try_to_unmap's description?

< snip >

> diff --git a/mm/rmap.c b/mm/rmap.c
> index c621088..083f32e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1424,6 +1424,12 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  				dec_mm_counter(mm, MM_ANONPAGES);
>  				rp->lazyfreed++;
>  				goto discard;
> +			} else if (!PageSwapBacked(page)) {
> +				/* dirty MADV_FREE page */
> +				set_pte_at(mm, address, pvmw.pte, pteval);
> +				ret = SWAP_DIRTY;
> +				page_vma_mapped_walk_done(&pvmw);
> +				break;
>  			}
>  
>  			if (swap_duplicate(entry) < 0) {
> @@ -1525,8 +1531,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
>  
>  	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
>  		ret = SWAP_SUCCESS;
> -		if (rp.lazyfreed && !PageDirty(page))
> -			ret = SWAP_LZFREE;
> +		if (rp.lazyfreed && PageDirty(page))
> +			ret = SWAP_DIRTY;

Hmm, I don't understand why we need to introduce a new return value.
Can't we call SetPageSwapBacked and return SWAP_FAIL in try_to_unmap_one?

>  	}
>  	return ret;
>  }
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 68ea50d..830981a 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c

< snip >
>  			goto keep_locked;
>  
>  		/* Double the slab pressure for mapped and swapcache pages */
> -		if (page_mapped(page) || PageSwapCache(page))
> +		if ((page_mapped(page) || PageSwapCache(page)) &&
> +		    !(PageAnon(page) && !PageSwapBacked(page)))
>  			sc->nr_scanned++;
>  
>  		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
> @@ -1118,8 +1120,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		/*
>  		 * Anonymous process memory has backing store?
>  		 * Try to allocate it some swap space here.
> +		 * Lazyfree page could be freed directly
>  		 */
> -		if (PageAnon(page) && !PageSwapCache(page)) {
> +		if (PageAnon(page) && !PageSwapCache(page) &&
> +		    PageSwapBacked(page)) {
>  			if (!(sc->gfp_mask & __GFP_IO))
>  				goto keep_locked;
>  			if (!add_to_swap(page, page_list))
> @@ -1140,9 +1144,12 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>  		 * The page is mapped into the page tables of one or more
>  		 * processes. Try to unmap it here.
>  		 */
> -		if (page_mapped(page) && mapping) {
> +		if (page_mapped(page)) {
>  			switch (ret = try_to_unmap(page,
>  				ttu_flags | TTU_BATCH_FLUSH)) {
> +			case SWAP_DIRTY:
> +				SetPageSwapBacked(page);
> +				/* fall through */
>  			case SWAP_FAIL:
>  				nr_unmap_fail++;
>  				goto activate_locked;


* Re: [PATCH V4 6/6] proc: show MADV_FREE pages info in smaps
  2017-02-22 18:50   ` Shaohua Li
@ 2017-02-24  2:13     ` Minchan Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Minchan Kim @ 2017-02-24  2:13 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, hughd, hannes, riel,
	mgorman, akpm

On Wed, Feb 22, 2017 at 10:50:44AM -0800, Shaohua Li wrote:
> Show MADV_FREE pages info for each vma in smaps. The interface is for
> diagnosis and monitoring purposes; userspace can use it to understand
> what happens in the application. Since userspace can dirty MADV_FREE
> pages without the kernel noticing, this interface is the only place we
> can get accurate accounting info about MADV_FREE pages.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Signed-off-by: Shaohua Li <shli@fb.com>
Acked-by: Minchan Kim <minchan@kernel.org>
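
As a usage note: with this patch a monitoring tool can sum a process's
lazyfree footprint from smaps. A minimal userspace sketch (illustrative
only; the field name is the one added by this patch):

	#include <stdio.h>

	int main(void)
	{
		char line[256];
		unsigned long kb, total = 0;
		FILE *f = fopen("/proc/self/smaps", "r");

		if (!f)
			return 1;
		/* sum the LazyFree: fields over all VMAs */
		while (fgets(line, sizeof(line), f))
			if (sscanf(line, "LazyFree: %lu kB", &kb) == 1)
				total += kb;
		fclose(f);
		printf("LazyFree total: %lu kB\n", total);
		return 0;
	}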


* Re: [PATCH V4 1/6] mm: delete unnecessary TTU_* flags
  2017-02-22 18:50   ` Shaohua Li
@ 2017-02-24  3:29     ` Hillf Danton
  -1 siblings, 0 replies; 54+ messages in thread
From: Hillf Danton @ 2017-02-24  3:29 UTC (permalink / raw)
  To: 'Shaohua Li', linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman, akpm

On February 23, 2017 2:51 AM Shaohua Li wrote: 
> 
> Johannes pointed out TTU_LZFREE is unnecessary. It's true because we
> always have the flag set if we want to do an unmap. For cases where we
> don't do an unmap, the TTU_LZFREE part of the code should never run.
> 
> Also, TTU_UNMAP is unnecessary. If no other flags are set (for
> example, TTU_MIGRATION), an unmap is implied.
> 
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---

Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
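
For context, after this change a plain unmap request is spelled with the
remaining modifier flags only, e.g. (a sketch, not a hunk from the patch):

	/* before: try_to_unmap(page, TTU_UNMAP | TTU_BATCH_FLUSH); */
	ret = try_to_unmap(page, ttu_flags | TTU_BATCH_FLUSH);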


* Re: [PATCH V4 4/6] mm: reclaim MADV_FREE pages
  2017-02-24  2:12     ` Minchan Kim
@ 2017-02-24  6:14       ` Shaohua Li
  -1 siblings, 0 replies; 54+ messages in thread
From: Shaohua Li @ 2017-02-24  6:14 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, hughd, hannes, riel,
	mgorman, akpm

On Fri, Feb 24, 2017 at 11:12:18AM +0900, Minchan Kim wrote:
> On Wed, Feb 22, 2017 at 10:50:42AM -0800, Shaohua Li wrote:
> > When memory pressure is high, we free MADV_FREE pages. If the pages are
> > not dirty in the pte, they can be freed immediately. Otherwise we can't
> > reclaim them. We put such pages back on the anonymous LRU list (by
> > setting the SwapBacked flag) and they will be reclaimed through the
> > normal swapout path.
> > 
> > We use the normal page reclaim policy. Since MADV_FREE pages are put
> > into the inactive file list, such pages and inactive file pages are
> > reclaimed according to their age. This is expected, because we don't
> > want to reclaim too many MADV_FREE pages before used-once file pages.
> > 
> > Based on Minchan's original patch
> > 
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Mel Gorman <mgorman@techsingularity.net>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Signed-off-by: Shaohua Li <shli@fb.com>
> > ---
> >  include/linux/rmap.h |  2 +-
> >  mm/huge_memory.c     |  2 ++
> >  mm/madvise.c         |  1 +
> >  mm/rmap.c            | 10 ++++++++--
> >  mm/vmscan.c          | 34 ++++++++++++++++++++++------------
> >  5 files changed, 34 insertions(+), 15 deletions(-)
> > 
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index e2cd8f9..2bfd8c6 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -300,6 +300,6 @@ static inline int page_mkclean(struct page *page)
> >  #define SWAP_AGAIN	1
> >  #define SWAP_FAIL	2
> >  #define SWAP_MLOCK	3
> > -#define SWAP_LZFREE	4
> > +#define SWAP_DIRTY	4
> 
> Could you document SWAP_DIRTY in try_to_unmap's description?
> 
> < snip >
> 
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index c621088..083f32e 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1424,6 +1424,12 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> >  				dec_mm_counter(mm, MM_ANONPAGES);
> >  				rp->lazyfreed++;
> >  				goto discard;
> > +			} else if (!PageSwapBacked(page)) {
> > +				/* dirty MADV_FREE page */
> > +				set_pte_at(mm, address, pvmw.pte, pteval);
> > +				ret = SWAP_DIRTY;
> > +				page_vma_mapped_walk_done(&pvmw);
> > +				break;
> >  			}
> >  
> >  			if (swap_duplicate(entry) < 0) {
> > @@ -1525,8 +1531,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
> >  
> >  	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
> >  		ret = SWAP_SUCCESS;
> > -		if (rp.lazyfreed && !PageDirty(page))
> > -			ret = SWAP_LZFREE;
> > +		if (rp.lazyfreed && PageDirty(page))
> > +			ret = SWAP_DIRTY;
> 
> Hmm, I don't understand why we need to introduce a new return value.
> Can't we call SetPageSwapBacked and return SWAP_FAIL in try_to_unmap_one?

The original idea in my mind was to activate the page for SWAP_DIRTY but
not for SWAP_FAIL and other failures. But later we chose to ignore all
the corner cases and always activate pages for all failures. So you are
right, we don't need the new return value right now.

Thanks,
Shaohua


* Re: [PATCH V4 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
  2017-02-24  1:49     ` Minchan Kim
@ 2017-02-24  6:15       ` Shaohua Li
  -1 siblings, 0 replies; 54+ messages in thread
From: Shaohua Li @ 2017-02-24  6:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, hughd, hannes, riel,
	mgorman, akpm

On Fri, Feb 24, 2017 at 10:49:39AM +0900, Minchan Kim wrote:
> On Wed, Feb 22, 2017 at 10:50:41AM -0800, Shaohua Li wrote:
> > madvise MADV_FREE indicates pages are 'lazyfree'. They are still
> > anonymous pages, but they can be freed without pageout. To distinguish
> > them from normal anonymous pages, we clear their SwapBacked flag.
> > 
> > MADV_FREE pages can be freed without pageout, so they are pretty much
> > like used-once file pages. For such pages, we'd like to reclaim them
> > once there is memory pressure. Also, it might be unfair to always
> > reclaim MADV_FREE pages before used-once file pages, though we
> > definitely want to reclaim them before other anonymous and file pages.
> > 
> > To speed up MADV_FREE page reclaim, we put the pages into the
> > LRU_INACTIVE_FILE list. The rationale is that the LRU_INACTIVE_FILE
> > list is tiny nowadays and should be full of used-once file pages.
> > Reclaiming MADV_FREE pages will not interfere much with anonymous and
> > active file pages. And the inactive file pages and MADV_FREE pages will
> > be reclaimed according to their age, so we don't reclaim too many
> > MADV_FREE pages either. Putting the MADV_FREE pages into the
> > LRU_INACTIVE_FILE list also means we can reclaim the pages without
> > swap support. This idea was suggested by Johannes.
> > 
> > This patch doesn't move MADV_FREE pages to the LRU_INACTIVE_FILE list
> > yet, to avoid bisect failure; the next patch will do it.
> > 
> > The patch is based on Minchan's original patch.
> > 
> > Cc: Michal Hocko <mhocko@suse.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Mel Gorman <mgorman@techsingularity.net>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Shaohua Li <shli@fb.com>
> 
Other than what Johannes pointed out, the code itself looks good to me.
However, I would prefer to merge this patch with the next one.
It's simple enough to merge: this patch changes behavior (about
deactivation) and introduces mark_page_lazyfree, but there is no call
site using it in this patch.

I don't think it's worth separating.

I think it's clearer this way, doing one thing per patch.

Thanks,
Shaohua
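
For reference, the userspace side of what this series tunes is just
madvise(2). A minimal sketch of how an allocator would mark pages
lazyfree (illustrative, error handling trimmed):

	#include <sys/mman.h>
	#include <string.h>

	#ifndef MADV_FREE
	#define MADV_FREE 8	/* value from uapi/asm-generic/mman-common.h */
	#endif

	int main(void)
	{
		size_t len = 1 << 20;
		char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			return 1;
		memset(p, 1, len);		/* dirty anonymous pages */
		madvise(p, len, MADV_FREE);	/* mark them lazyfree */
		/*
		 * Reclaim may now drop these pages for free; writing to
		 * them again makes them normal anonymous pages.
		 */
		return 0;
	}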


* Re: [PATCH V4 4/6] mm: reclaim MADV_FREE pages
  2017-02-24  2:12     ` Minchan Kim
@ 2017-02-24 15:36       ` Johannes Weiner
  -1 siblings, 0 replies; 54+ messages in thread
From: Johannes Weiner @ 2017-02-24 15:36 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Shaohua Li, linux-mm, linux-kernel, Kernel-team, mhocko, hughd,
	riel, mgorman, akpm

On Fri, Feb 24, 2017 at 11:12:18AM +0900, Minchan Kim wrote:
> > @@ -1525,8 +1531,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
> >  
> >  	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
> >  		ret = SWAP_SUCCESS;
> > -		if (rp.lazyfreed && !PageDirty(page))
> > -			ret = SWAP_LZFREE;
> > +		if (rp.lazyfreed && PageDirty(page))
> > +			ret = SWAP_DIRTY;
> 
> Hmm, I don't understand why we need to introduce a new return value.
> Can't we call SetPageSwapBacked and return SWAP_FAIL in try_to_unmap_one?

I think that's a bad idea. A function called "try_to_unmap" shouldn't
have as a side effect that it changes the page's LRU type in an error
case. Let try_to_unmap be about unmapping the page. If it fails, make
it report why and let the caller deal with the fallout. Any LRU fixups
are much better placed in vmscan.c.


* Re: [PATCH V4 6/6] proc: show MADV_FREE pages info in smaps
  2017-02-22 18:50   ` Shaohua Li
@ 2017-02-24 17:08     ` Dave Hansen
  -1 siblings, 0 replies; 54+ messages in thread
From: Dave Hansen @ 2017-02-24 17:08 UTC (permalink / raw)
  To: Shaohua Li, linux-mm, linux-kernel
  Cc: Kernel-team, mhocko, minchan, hughd, hannes, riel, mgorman, akpm

On 02/22/2017 10:50 AM, Shaohua Li wrote:
> @@ -770,6 +774,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>  		   "Private_Dirty:  %8lu kB\n"
>  		   "Referenced:     %8lu kB\n"
>  		   "Anonymous:      %8lu kB\n"
> +		   "LazyFree:       %8lu kB\n"
>  		   "AnonHugePages:  %8lu kB\n"
>  		   "ShmemPmdMapped: %8lu kB\n"
>  		   "Shared_Hugetlb: %8lu kB\n"

I've been as guilty of this in the past as anyone, but are we just going
to keep adding fields to smaps forever?  For the vast, vast majority of
folks, this will simply waste the 21 bytes * nr_vmas that it costs us to
print "LazyFree:       0 kB\n" over and over.

Should we maybe start a habit of not printing an entry when it's "0 kB"?
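
For illustration, one way to do that (hypothetical helper, not an
existing API):

	static void smaps_print_kb(struct seq_file *m, const char *name,
				   unsigned long kb)
	{
		/* suppress fields that would only ever print "0 kB" */
		if (kb)
			seq_printf(m, "%-16s%8lu kB\n", name, kb);
	}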


* Re: [PATCH V4 6/6] proc: show MADV_FREE pages info in smaps
  2017-02-24 17:08     ` Dave Hansen
@ 2017-02-24 21:47       ` Shaohua Li
  -1 siblings, 0 replies; 54+ messages in thread
From: Shaohua Li @ 2017-02-24 21:47 UTC (permalink / raw)
  To: Dave Hansen
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, minchan, hughd,
	hannes, riel, mgorman, akpm

On Fri, Feb 24, 2017 at 09:08:30AM -0800, Dave Hansen wrote:
> On 02/22/2017 10:50 AM, Shaohua Li wrote:
> > @@ -770,6 +774,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
> >  		   "Private_Dirty:  %8lu kB\n"
> >  		   "Referenced:     %8lu kB\n"
> >  		   "Anonymous:      %8lu kB\n"
> > +		   "LazyFree:       %8lu kB\n"
> >  		   "AnonHugePages:  %8lu kB\n"
> >  		   "ShmemPmdMapped: %8lu kB\n"
> >  		   "Shared_Hugetlb: %8lu kB\n"
> 
> I've been as guilty of this in the past as anyone, but are we just going
> to keep adding fields to smaps forever?  For the vast, vast majority of
> folks, this will simply waste the 21 bytes * nr_vmas that it costs us to
> print "LazyFree:       0 kB\n" over and over.
> 
> Should we maybe start a habit of not printing an entry when it's "0 kB"?

Interesting idea! I'd like this to be a separate patch if we go this way,
because it is likely to be controversial. That said, it sounds like there
is no reason we shouldn't do this.

Thanks,
Shaohua


* Re: [PATCH V4 4/6] mm: reclaim MADV_FREE pages
  2017-02-24 15:36       ` Johannes Weiner
@ 2017-02-24 23:26         ` Minchan Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Minchan Kim @ 2017-02-24 23:26 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Shaohua Li, linux-mm, linux-kernel, Kernel-team, mhocko, hughd,
	riel, mgorman, akpm

Hi Johannes,

On Fri, Feb 24, 2017 at 10:36:55AM -0500, Johannes Weiner wrote:
> On Fri, Feb 24, 2017 at 11:12:18AM +0900, Minchan Kim wrote:
> > > @@ -1525,8 +1531,8 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
> > >  
> > >  	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
> > >  		ret = SWAP_SUCCESS;
> > > -		if (rp.lazyfreed && !PageDirty(page))
> > > -			ret = SWAP_LZFREE;
> > > +		if (rp.lazyfreed && PageDirty(page))
> > > +			ret = SWAP_DIRTY;
> > 
> > Hmm, I don't understand why we need to introduce a new return value.
> > Can't we call SetPageSwapBacked and return SWAP_FAIL in try_to_unmap_one?
> 
> I think that's a bad idea. A function called "try_to_unmap" shouldn't
> have as a side effect that it changes the page's LRU type in an error
> case. Let try_to_unmap be about unmapping the page. If it fails, make
> it report why and let the caller deal with the fallout. Any LRU fixups
> are much better placed in vmscan.c.

I don't think it's a change of the page's LRU type. SetPageSwapBacked is
just an indication of whether the page is swappable or not.
Like mlock_vma_page in try_to_unmap_one, we can call SetPageSwapBacked
when we find the lazyfree page dirty. If we don't need to move the dirty
lazyfree page to another LRU list, it would be better not to introduce a
new return value in try_to_unmap.
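
For reference, the precedent I mean is the existing mlock path in
try_to_unmap_one, which already fixes up page state in place (simplified
from memory, details may differ):

	if (!(flags & TTU_IGNORE_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
		/* fix up the page here rather than in the caller */
		mlock_vma_page(page);
		ret = SWAP_MLOCK;
		page_vma_mapped_walk_done(&pvmw);
		break;
	}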


* Re: [PATCH V4 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list
  2017-02-24  6:15       ` Shaohua Li
@ 2017-02-24 23:37         ` Minchan Kim
  -1 siblings, 0 replies; 54+ messages in thread
From: Minchan Kim @ 2017-02-24 23:37 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-mm, linux-kernel, Kernel-team, mhocko, hughd, hannes, riel,
	mgorman, akpm

Hi Shaohua,

On Thu, Feb 23, 2017 at 10:15:50PM -0800, Shaohua Li wrote:
> On Fri, Feb 24, 2017 at 10:49:39AM +0900, Minchan Kim wrote:
> > On Wed, Feb 22, 2017 at 10:50:41AM -0800, Shaohua Li wrote:
> > > madvise MADV_FREE indicates pages are 'lazyfree'. They are still
> > > anonymous pages, but they can be freed without pageout. To
> > > distinguish them from normal anonymous pages, we clear their
> > > SwapBacked flag.
> > > 
> > > MADV_FREE pages can be freed without pageout, so they are pretty much
> > > like used-once file pages. For such pages, we'd like to reclaim them
> > > once there is memory pressure. Also, it might be unfair to always
> > > reclaim MADV_FREE pages before used-once file pages, though we
> > > definitely want to reclaim them before other anonymous and file pages.
> > > 
> > > To speed up MADV_FREE page reclaim, we put the pages into the
> > > LRU_INACTIVE_FILE list. The rationale is that the LRU_INACTIVE_FILE
> > > list is tiny nowadays and should be full of used-once file pages.
> > > Reclaiming MADV_FREE pages will not interfere much with anonymous and
> > > active file pages. And the inactive file pages and MADV_FREE pages
> > > will be reclaimed according to their age, so we don't reclaim too many
> > > MADV_FREE pages either. Putting the MADV_FREE pages into the
> > > LRU_INACTIVE_FILE list also means we can reclaim the pages without
> > > swap support. This idea was suggested by Johannes.
> > > 
> > > This patch doesn't move MADV_FREE pages to the LRU_INACTIVE_FILE list
> > > yet, to avoid bisect failure; the next patch will do it.
> > > 
> > > The patch is based on Minchan's original patch.
> > > 
> > > Cc: Michal Hocko <mhocko@suse.com>
> > > Cc: Minchan Kim <minchan@kernel.org>
> > > Cc: Hugh Dickins <hughd@google.com>
> > > Cc: Rik van Riel <riel@redhat.com>
> > > Cc: Mel Gorman <mgorman@techsingularity.net>
> > > Cc: Andrew Morton <akpm@linux-foundation.org>
> > > Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
> > > Signed-off-by: Shaohua Li <shli@fb.com>
> > 
> > Other than what Johannes pointed out, the code itself looks good to me.
> > However, I would prefer to merge this patch with the next one.
> > It's simple enough to merge: this patch changes behavior (about
> > deactivation) and introduces mark_page_lazyfree, but there is no call
> > site using it in this patch.
> > 
> > I don't think it's worth separating.
> 
> I think it's clearer this way, doing one thing per patch.

We have pushed back several times on patches that introduce a new
function *here* and only use it *there*. One example, from Johannes:

https://marc.info/?l=linux-mm&m=147430500910960&w=2

I don't understand why this case is okay.
Normally it's an anti-pattern for git-bisect, since it adds a useless
bisect point. If it were better for review I might agree, but this
case isn't that, either.


end of thread

Thread overview: 54+ messages
2017-02-22 18:50 [PATCH V4 0/6] mm: fix some MADV_FREE issues Shaohua Li
2017-02-22 18:50 ` [PATCH V4 1/6] mm: delete unnecessary TTU_* flags Shaohua Li
2017-02-23 15:35   ` Johannes Weiner
2017-02-24  1:25   ` Minchan Kim
2017-02-24  3:29   ` Hillf Danton
2017-02-22 18:50 ` [PATCH V4 2/6] mm: don't assume anonymous pages have SwapBacked flag Shaohua Li
2017-02-22 18:50 ` [PATCH V4 3/6] mm: move MADV_FREE pages into LRU_INACTIVE_FILE list Shaohua Li
2017-02-23 15:58   ` Johannes Weiner
2017-02-23 16:26     ` Shaohua Li
2017-02-23 18:22       ` Johannes Weiner
2017-02-23 19:04         ` Shaohua Li
2017-02-24  1:49   ` Minchan Kim
2017-02-24  6:15     ` Shaohua Li
2017-02-24 23:37       ` Minchan Kim
2017-02-22 18:50 ` [PATCH V4 4/6] mm: reclaim MADV_FREE pages Shaohua Li
2017-02-23 16:13   ` Johannes Weiner
2017-02-23 17:19     ` Shaohua Li
2017-02-24  2:12   ` Minchan Kim
2017-02-24  6:14     ` Shaohua Li
2017-02-24 15:36     ` Johannes Weiner
2017-02-24 23:26       ` Minchan Kim
2017-02-22 18:50 ` [PATCH V4 5/6] mm: enable MADV_FREE for swapless system Shaohua Li
2017-02-22 18:50 ` [PATCH V4 6/6] proc: show MADV_FREE pages info in smaps Shaohua Li
2017-02-23 16:16   ` Johannes Weiner
2017-02-24  2:13   ` Minchan Kim
2017-02-24 17:08   ` Dave Hansen
2017-02-24 21:47     ` Shaohua Li
