* [PATCH v1 0/3] per-process reclaim
@ 2016-06-13  7:50 Minchan Kim
  2016-06-13  7:50 ` [PATCH v1 1/3] mm: vmscan: refactoring force_reclaim Minchan Kim
                   ` (4 more replies)
  0 siblings, 5 replies; 19+ messages in thread
From: Minchan Kim @ 2016-06-13  7:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Rik van Riel, Minchan Kim, Redmond,
	ZhaoJunmin Zhao(Junmin),
	Vinayak Menon, Juneho Choi, Sangwoo Park, Chan Gyun Jeong

Hi all,

http://thread.gmane.org/gmane.linux.kernel/1480728

I sent the per-process reclaim patchset three years ago. The last
feedback from akpm was that he wanted to see a real usecase scenario.

Since then, I have been asked by embedded people from various companies
why it is not merged into mainline, and heard that they have carried
the feature as an in-house patch. Recently, I noticed that Qualcomm's
Android kernel started to use it as well.

Of course, we have used it and shipped it in a real product.

Quote from Sangwoo Park <sangwoo2.park@lge.com>
Thanks for the data, Sangwoo!
"
- Test scenario
  - platform: android
  - target: MSM8952, 2G DDR, 16G eMMC
  - scenario
    retry app launch and Back Home with 16 apps and 16 turns
    (total app launch count is 256)
  - result:
                    |  resume count  |  cold launching count
 -------------------+----------------+-----------------------
  vanilla           |       85       |          171
  perproc reclaim   |      184       |           72
"

A higher resume count is better because a cold launch has to load lots
of resource data, which takes 15~20 seconds for some games, while a
successful resume takes just 1~5 seconds.

With per-process reclaim and the new management policy, we could reduce
cold launches a lot (i.e., from 171 to 72), which cuts app startup time
significantly.

Another useful aspect of this feature is that it makes it easy to force
swapout, which is handy for swapout stress testing and workloads; see
the sketch below.
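
For example, a minimal sketch of such a swapout test (the pid 1234 is
only an illustration and assumes this series is applied and swap is
enabled):

    # force the process's anonymous pages out to swap
    echo 2 > /proc/1234/reclaim
    # verify how much of it was actually swapped
    grep VmSwap /proc/1234/status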

Thanks.

Cc: Redmond <u93410091@gmail.com>
Cc: ZhaoJunmin Zhao(Junmin) <zhaojunmin@huawei.com>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Juneho Choi <juno.choi@lge.com>
Cc: Sangwoo Park <sangwoo2.park@lge.com>
Cc: Chan Gyun Jeong <chan.jeong@lge.com>

Minchan Kim (3):
  mm: vmscan: refactoring force_reclaim
  mm: vmscan: shrink_page_list with multiple zones
  mm: per-process reclaim

 Documentation/filesystems/proc.txt |  15 ++++
 fs/proc/base.c                     |   1 +
 fs/proc/internal.h                 |   1 +
 fs/proc/task_mmu.c                 | 149 +++++++++++++++++++++++++++++++++++++
 include/linux/rmap.h               |   4 +
 mm/vmscan.c                        |  85 ++++++++++++++++-----
 6 files changed, 235 insertions(+), 20 deletions(-)

-- 
1.9.1


* [PATCH v1 1/3] mm: vmscan: refactoring force_reclaim
  2016-06-13  7:50 [PATCH v1 0/3] per-process reclaim Minchan Kim
@ 2016-06-13  7:50 ` Minchan Kim
  2016-06-13  7:50 ` [PATCH v1 2/3] mm: vmscan: shrink_page_list with multiple zones Minchan Kim
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Minchan Kim @ 2016-06-13  7:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Rik van Riel, Minchan Kim

The local variable 'references' defaults to PAGEREF_RECLAIM_CLEAN so
that, when force_reclaim is true, dirty page writeout is prevented to
keep reclaim latency low (introduced by [1]).

However, that is ironic: the user asked for *force reclaim*, yet we
prohibit dirty page writeout.

Let's make it clearer.
This patch is a refactoring, so it shouldn't change any behavior.

[1] commit 02c6de8d757c ("mm: cma: discard clean pages during
contiguous allocation instead of migration")

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 21d417ccff69..05119983c92e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -95,6 +95,9 @@ struct scan_control {
 	/* Can cgroups be reclaimed below their normal consumption range? */
 	unsigned int may_thrash:1;
 
+	/* reclaim pages unconditionally */
+	unsigned int force_reclaim:1;
+
 	unsigned int hibernation_mode:1;
 
 	/* One of the zones is ready for compaction */
@@ -783,6 +786,7 @@ void putback_lru_page(struct page *page)
 }
 
 enum page_references {
+	PAGEREF_NONE,
 	PAGEREF_RECLAIM,
 	PAGEREF_RECLAIM_CLEAN,
 	PAGEREF_KEEP,
@@ -884,8 +888,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				      unsigned long *ret_nr_unqueued_dirty,
 				      unsigned long *ret_nr_congested,
 				      unsigned long *ret_nr_writeback,
-				      unsigned long *ret_nr_immediate,
-				      bool force_reclaim)
+				      unsigned long *ret_nr_immediate)
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
@@ -903,7 +906,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		struct address_space *mapping;
 		struct page *page;
 		int may_enter_fs;
-		enum page_references references = PAGEREF_RECLAIM_CLEAN;
+		enum page_references references = PAGEREF_NONE;
 		bool dirty, writeback;
 		bool lazyfree = false;
 		int ret = SWAP_SUCCESS;
@@ -927,13 +930,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (!sc->may_unmap && page_mapped(page))
 			goto keep_locked;
 
-		/* Double the slab pressure for mapped and swapcache pages */
-		if (page_mapped(page) || PageSwapCache(page))
-			sc->nr_scanned++;
-
 		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
 			(PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
 
+		if (sc->force_reclaim)
+			goto force_reclaim;
+
+		/* Double the slab pressure for mapped and swapcache pages */
+		if (page_mapped(page) || PageSwapCache(page))
+			sc->nr_scanned++;
 		/*
 		 * The number of dirty pages determines if a zone is marked
 		 * reclaim_congested which affects wait_iff_congested. kswapd
@@ -1028,19 +1033,18 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-		if (!force_reclaim)
-			references = page_check_references(page, sc);
+		references = page_check_references(page, sc);
 
 		switch (references) {
 		case PAGEREF_ACTIVATE:
 			goto activate_locked;
 		case PAGEREF_KEEP:
 			goto keep_locked;
-		case PAGEREF_RECLAIM:
-		case PAGEREF_RECLAIM_CLEAN:
-			; /* try to reclaim the page below */
+		default:
+			break; /* try to reclaim the page below */
 		}
 
+force_reclaim:
 		/*
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
@@ -1253,6 +1257,8 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 		.gfp_mask = GFP_KERNEL,
 		.priority = DEF_PRIORITY,
 		.may_unmap = 1,
+		.may_writepage = 0,
+		.force_reclaim = 1,
 	};
 	unsigned long ret, dummy1, dummy2, dummy3, dummy4, dummy5;
 	struct page *page, *next;
@@ -1268,7 +1274,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 
 	ret = shrink_page_list(&clean_pages, zone, &sc,
 			TTU_UNMAP|TTU_IGNORE_ACCESS,
-			&dummy1, &dummy2, &dummy3, &dummy4, &dummy5, true);
+			&dummy1, &dummy2, &dummy3, &dummy4, &dummy5);
 	list_splice(&clean_pages, page_list);
 	mod_zone_page_state(zone, NR_ISOLATED_FILE, -ret);
 	return ret;
@@ -1623,8 +1629,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 
 	nr_reclaimed = shrink_page_list(&page_list, zone, sc, TTU_UNMAP,
 				&nr_dirty, &nr_unqueued_dirty, &nr_congested,
-				&nr_writeback, &nr_immediate,
-				false);
+				&nr_writeback, &nr_immediate);
 
 	spin_lock_irq(&zone->lru_lock);
 
-- 
1.9.1


* [PATCH v1 2/3] mm: vmscan: shrink_page_list with multiple zones
  2016-06-13  7:50 [PATCH v1 0/3] per-process reclaim Minchan Kim
  2016-06-13  7:50 ` [PATCH v1 1/3] mm: vmscan: refactoring force_reclaim Minchan Kim
@ 2016-06-13  7:50 ` Minchan Kim
  2016-06-13  7:50 ` [PATCH v1 3/3] mm: per-process reclaim Minchan Kim
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 19+ messages in thread
From: Minchan Kim @ 2016-06-13  7:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Rik van Riel, Minchan Kim

We have been reclaiming pages per zone, but an upcoming patch will pass
pages from multiple zones into shrink_page_list(), so this patch
prepares for that.

Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 mm/vmscan.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 05119983c92e..d20c9e863d35 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -881,7 +881,6 @@ static void page_check_dirty_writeback(struct page *page,
  * shrink_page_list() returns the number of reclaimed pages
  */
 static unsigned long shrink_page_list(struct list_head *page_list,
-				      struct zone *zone,
 				      struct scan_control *sc,
 				      enum ttu_flags ttu_flags,
 				      unsigned long *ret_nr_dirty,
@@ -910,6 +909,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		bool dirty, writeback;
 		bool lazyfree = false;
 		int ret = SWAP_SUCCESS;
+		struct zone *zone;
 
 		cond_resched();
 
@@ -919,8 +919,8 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		if (!trylock_page(page))
 			goto keep;
 
+		zone = page_zone(page);
 		VM_BUG_ON_PAGE(PageActive(page), page);
-		VM_BUG_ON_PAGE(page_zone(page) != zone, page);
 
 		sc->nr_scanned++;
 
@@ -933,6 +933,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		may_enter_fs = (sc->gfp_mask & __GFP_FS) ||
 			(PageSwapCache(page) && (sc->gfp_mask & __GFP_IO));
 
+		mapping = page_mapping(page);
 		if (sc->force_reclaim)
 			goto force_reclaim;
 
@@ -958,7 +959,6 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * pages marked for immediate reclaim are making it to the
 		 * end of the LRU a second time.
 		 */
-		mapping = page_mapping(page);
 		if (((dirty || writeback) && mapping &&
 		     inode_write_congested(mapping->host)) ||
 		    (writeback && PageReclaim(page)))
@@ -1272,7 +1272,7 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 		}
 	}
 
-	ret = shrink_page_list(&clean_pages, zone, &sc,
+	ret = shrink_page_list(&clean_pages, &sc,
 			TTU_UNMAP|TTU_IGNORE_ACCESS,
 			&dummy1, &dummy2, &dummy3, &dummy4, &dummy5);
 	list_splice(&clean_pages, page_list);
@@ -1627,7 +1627,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	if (nr_taken == 0)
 		return 0;
 
-	nr_reclaimed = shrink_page_list(&page_list, zone, sc, TTU_UNMAP,
+	nr_reclaimed = shrink_page_list(&page_list, sc, TTU_UNMAP,
 				&nr_dirty, &nr_unqueued_dirty, &nr_congested,
 				&nr_writeback, &nr_immediate);
 
-- 
1.9.1


* [PATCH v1 3/3] mm: per-process reclaim
  2016-06-13  7:50 [PATCH v1 0/3] per-process reclaim Minchan Kim
  2016-06-13  7:50 ` [PATCH v1 1/3] mm: vmscan: refactoring force_reclaim Minchan Kim
  2016-06-13  7:50 ` [PATCH v1 2/3] mm: vmscan: shrink_page_list with multiple zones Minchan Kim
@ 2016-06-13  7:50 ` Minchan Kim
  2016-06-13 15:06   ` Johannes Weiner
  2016-06-13 17:06   ` Rik van Riel
  2016-06-13 11:50 ` [PATCH v1 0/3] " Chen Feng
  2016-06-13 13:29 ` Vinayak Menon
  4 siblings, 2 replies; 19+ messages in thread
From: Minchan Kim @ 2016-06-13  7:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Rik van Riel, Minchan Kim, Sangwoo Park

These days, there are many platforms in the embedded market, and they
sometimes have better hints about the working set than the kernel does,
so they want to be more heavily involved in memory management, as with
Android's low memory killer and ashmem, or a user daemon driven by a
low-memory notifier.

This patch adds a new method for userspace to manage memory efficiently
via the knob "/proc/<pid>/reclaim", so the platform can reclaim any
process at any time.

One useful usecase is avoiding process killing to get free memory on
Android, which is a really terrible experience: I once lost my best
game score ever after switching to a phone call while playing, on top
of the slow start-up caused by cold launching.

We have used it in a real product.

Quote from Sangwoo Park <sangwoo2.park@lge.com>
Thanks for the data, Sangwoo!
"
- Test scenario
  - platform: android
  - target: MSM8952, 2G DDR, 16G eMMC
  - scenario
    retry app launch and Back Home with 16 apps and 16 turns
    (total app launch count is 256)
  - result:
                    |  resume count  |  cold launching count
 -------------------+----------------+-----------------------
  vanilla           |       85       |          171
  perproc reclaim   |      184       |           72
"

A higher resume count is better because a cold launch has to load lots
of resource data, which takes 15~20 seconds for some games, while a
successful resume takes just 1~5 seconds.

With per-process reclaim and the new management policy, we could reduce
cold launches a lot (i.e., from 171 to 72), which cuts app startup time
significantly.

Another useful aspect of this feature is that it makes it easy to force
swapout, which is handy for swapout stress testing and workloads.

Interface:

Reclaim file-backed pages only:
	echo 1 > /proc/<pid>/reclaim
Reclaim anonymous pages only:
	echo 2 > /proc/<pid>/reclaim
Reclaim all pages:
	echo 3 > /proc/<pid>/reclaim

bit 1: file, bit 2: anon, bits 1 & 2: all

Note:
If a page is shared by other processes (i.e., page_mapcount(page) > 1),
it cannot be reclaimed.
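
For instance, a minimal sketch of how a platform daemon might drive the
knob ($BG_PID and the two-step ordering are only illustrative
assumptions; as noted above, pages shared with other processes are
skipped):

    # shrink a backgrounded, low-priority app before memory pressure hits
    echo 1 > /proc/$BG_PID/reclaim    # drop its file-backed pages
    echo 2 > /proc/$BG_PID/reclaim    # swap out its private anon pages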

Cc: Sangwoo Park <sangwoo2.park@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/filesystems/proc.txt |  15 ++++
 fs/proc/base.c                     |   1 +
 fs/proc/internal.h                 |   1 +
 fs/proc/task_mmu.c                 | 149 +++++++++++++++++++++++++++++++++++++
 include/linux/rmap.h               |   4 +
 mm/vmscan.c                        |  40 ++++++++++
 6 files changed, 210 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 50fcf48f4d58..3b6adf370f3c 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -138,6 +138,7 @@ Table 1-1: Process specific entries in /proc
  maps		Memory maps to executables and library files	(2.4)
  mem		Memory held by this process
  root		Link to the root directory of this process
+ reclaim	Reclaim pages in this process
  stat		Process status
  statm		Process memory status information
  status		Process status in human readable form
@@ -536,6 +537,20 @@ To reset the peak resident set size ("high water mark") to the process's
 
 Any other value written to /proc/PID/clear_refs will have no effect.
 
+The file /proc/PID/reclaim is used to reclaim pages in this process.
+bit 1: file, bit 2: anon, bit 3: all
+
+To reclaim file-backed pages,
+    > echo 1 > /proc/PID/reclaim
+
+To reclaim anonymous pages,
+    > echo 2 > /proc/PID/reclaim
+
+To reclaim all pages,
+    > echo 3 > /proc/PID/reclaim
+
+If a page is shared by several processes, it cannot be reclaimed.
+
 The /proc/pid/pagemap gives the PFN, which can be used to find the pageflags
 using /proc/kpageflags and number of times a page is mapped using
 /proc/kpagecount. For detailed explanation, see Documentation/vm/pagemap.txt.
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 93e7754fd5b2..b957d929516d 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2848,6 +2848,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("mounts",     S_IRUGO, proc_mounts_operations),
 	REG("mountinfo",  S_IRUGO, proc_mountinfo_operations),
 	REG("mountstats", S_IRUSR, proc_mountstats_operations),
+	REG("reclaim", S_IWUSR, proc_reclaim_operations),
 #ifdef CONFIG_PROC_PAGE_MONITOR
 	REG("clear_refs", S_IWUSR, proc_clear_refs_operations),
 	REG("smaps",      S_IRUGO, proc_pid_smaps_operations),
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index aa2781095bd1..ef2b01533c97 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -209,6 +209,7 @@ struct pde_opener {
 extern const struct inode_operations proc_link_inode_operations;
 
 extern const struct inode_operations proc_pid_link_inode_operations;
+extern const struct file_operations proc_reclaim_operations;
 
 extern void proc_init_inodecache(void);
 extern struct inode *proc_get_inode(struct super_block *, struct proc_dir_entry *);
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 187d84ef9de9..31e4657f8fe9 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -11,6 +11,7 @@
 #include <linux/mempolicy.h>
 #include <linux/rmap.h>
 #include <linux/swap.h>
+#include <linux/mm_inline.h>
 #include <linux/swapops.h>
 #include <linux/mmu_notifier.h>
 #include <linux/page_idle.h>
@@ -1465,6 +1466,154 @@ const struct file_operations proc_pagemap_operations = {
 };
 #endif /* CONFIG_PROC_PAGE_MONITOR */
 
+static int reclaim_pte_range(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	struct mm_struct *mm = walk->mm;
+	struct vm_area_struct *vma = walk->private;
+	pte_t *orig_pte, *pte, ptent;
+	spinlock_t *ptl;
+	struct page *page;
+	LIST_HEAD(page_list);
+	int isolated = 0;
+
+	split_huge_pmd(vma, pmd, addr);
+	if (pmd_trans_unstable(pmd))
+		return 0;
+
+	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	for (; addr != end; pte++, addr += PAGE_SIZE) {
+		ptent = *pte;
+
+		if (!pte_present(ptent))
+			continue;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (!page)
+			continue;
+
+		if (page_mapcount(page) != 1)
+			continue;
+
+		if (PageTransCompound(page)) {
+			get_page(page);
+			if (!trylock_page(page)) {
+				put_page(page);
+				goto out;
+			}
+			pte_unmap_unlock(orig_pte, ptl);
+
+			if (split_huge_page(page)) {
+				unlock_page(page);
+				put_page(page);
+				orig_pte = pte_offset_map_lock(mm, pmd,
+								addr, &ptl);
+				goto out;
+			}
+			put_page(page);
+			unlock_page(page);
+			pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+			pte--;
+			addr -= PAGE_SIZE;
+			continue;
+		}
+
+		VM_BUG_ON_PAGE(PageTransCompound(page), page);
+
+		if (isolate_lru_page(page))
+			continue;
+
+		list_add(&page->lru, &page_list);
+		inc_zone_page_state(page, NR_ISOLATED_ANON +
+					page_is_file_cache(page));
+		isolated++;
+		if (isolated >= SWAP_CLUSTER_MAX) {
+			pte_unmap_unlock(orig_pte, ptl);
+			reclaim_pages_from_list(&page_list);
+			isolated = 0;
+			cond_resched();
+			orig_pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+		}
+	}
+
+out:
+	pte_unmap_unlock(orig_pte, ptl);
+	reclaim_pages_from_list(&page_list);
+
+	cond_resched();
+	return 0;
+}
+
+enum reclaim_type {
+	RECLAIM_FILE = 1,
+	RECLAIM_ANON,
+	RECLAIM_ALL,
+};
+
+static ssize_t reclaim_write(struct file *file, const char __user *buf,
+				size_t count, loff_t *ppos)
+{
+	struct task_struct *task;
+	char buffer[PROC_NUMBUF];
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	int itype;
+	int rv;
+	enum reclaim_type type;
+
+	memset(buffer, 0, sizeof(buffer));
+	if (count > sizeof(buffer) - 1)
+		count = sizeof(buffer) - 1;
+	if (copy_from_user(buffer, buf, count))
+		return -EFAULT;
+	rv = kstrtoint(strstrip(buffer), 10, &itype);
+	if (rv < 0)
+		return rv;
+	type = (enum reclaim_type)itype;
+	if (type < RECLAIM_FILE || type > RECLAIM_ALL)
+		return -EINVAL;
+
+	task = get_proc_task(file->f_path.dentry->d_inode);
+	if (!task)
+		return -ESRCH;
+
+	mm = get_task_mm(task);
+	if (mm) {
+		struct mm_walk reclaim_walk = {
+			.pmd_entry = reclaim_pte_range,
+			.mm = mm,
+		};
+
+		down_read(&mm->mmap_sem);
+		for (vma = mm->mmap; vma; vma = vma->vm_next) {
+			reclaim_walk.private = vma;
+
+			if (is_vm_hugetlb_page(vma))
+				continue;
+
+			if (!vma_is_anonymous(vma) && !(type & RECLAIM_FILE))
+				continue;
+
+			if (vma_is_anonymous(vma) && !(type & RECLAIM_ANON))
+				continue;
+
+			walk_page_range(vma->vm_start, vma->vm_end,
+					&reclaim_walk);
+		}
+		flush_tlb_mm(mm);
+		up_read(&mm->mmap_sem);
+		mmput(mm);
+	}
+	put_task_struct(task);
+
+	return count;
+}
+
+const struct file_operations proc_reclaim_operations = {
+	.write		= reclaim_write,
+	.llseek		= noop_llseek,
+};
+
 #ifdef CONFIG_NUMA
 
 struct numa_maps {
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 5704f101b52e..e90a21b78da3 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -10,6 +10,10 @@
 #include <linux/rwsem.h>
 #include <linux/memcontrol.h>
 
+extern int isolate_lru_page(struct page *page);
+extern void putback_lru_page(struct page *page);
+extern unsigned long reclaim_pages_from_list(struct list_head *page_list);
+
 /*
  * The anon_vma heads a list of private "related" vmas, to scan if
  * an anonymous page pointing to this anon_vma needs to be unmapped:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d20c9e863d35..442866f77251 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1212,6 +1212,13 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 * appear not as the counts should be low
 		 */
 		list_add(&page->lru, &free_pages);
+		/*
+		 * If pagelist are from multiple zones, we should decrease
+		 * NR_ISOLATED_ANON + x on freed pages in here.
+		 */
+		if (!zone)
+			dec_zone_page_state(page, NR_ISOLATED_ANON +
+					page_is_file_cache(page));
 		continue;
 
 cull_mlocked:
@@ -1280,6 +1287,39 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 	return ret;
 }
 
+unsigned long reclaim_pages_from_list(struct list_head *page_list)
+{
+	struct scan_control sc = {
+		.gfp_mask = GFP_KERNEL,
+		.priority = DEF_PRIORITY,
+		.may_writepage = 1,
+		.may_unmap = 1,
+		.may_swap = 1,
+		.force_reclaim = 1,
+	};
+
+	unsigned long nr_reclaimed, dummy1, dummy2, dummy3, dummy4, dummy5;
+	struct page *page;
+
+	list_for_each_entry(page, page_list, lru)
+		ClearPageActive(page);
+
+	nr_reclaimed = shrink_page_list(page_list, &sc,
+					TTU_UNMAP|TTU_IGNORE_ACCESS,
+					&dummy1, &dummy2, &dummy3,
+					&dummy4, &dummy5);
+
+	while (!list_empty(page_list)) {
+		page = lru_to_page(page_list);
+		list_del(&page->lru);
+		dec_zone_page_state(page, NR_ISOLATED_ANON +
+				page_is_file_cache(page));
+		putback_lru_page(page);
+	}
+
+	return nr_reclaimed;
+}
+
 /*
  * Attempt to remove the specified page from its LRU.  Only take this page
  * if it is of the appropriate PageActive status.  Pages which are being
-- 
1.9.1


* Re: [PATCH v1 0/3] per-process reclaim
  2016-06-13  7:50 [PATCH v1 0/3] per-process reclaim Minchan Kim
                   ` (2 preceding siblings ...)
  2016-06-13  7:50 ` [PATCH v1 3/3] mm: per-process reclaim Minchan Kim
@ 2016-06-13 11:50 ` Chen Feng
  2016-06-13 12:22   ` ZhaoJunmin Zhao(Junmin)
  2016-06-15  0:43   ` Minchan Kim
  2016-06-13 13:29 ` Vinayak Menon
  4 siblings, 2 replies; 19+ messages in thread
From: Chen Feng @ 2016-06-13 11:50 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Rik van Riel, Redmond,
	ZhaoJunmin Zhao(Junmin),
	Vinayak Menon, Juneho Choi, Sangwoo Park, Chan Gyun Jeong

Hi Minchan,

On 2016/6/13 15:50, Minchan Kim wrote:
> Hi all,
> 
> http://thread.gmane.org/gmane.linux.kernel/1480728
> 
> I sent per-process reclaim patchset three years ago. Then, last
> feedback from akpm was that he want to know real usecase scenario.
> 
> Since then, I got question from several embedded people of various
> company "why it's not merged into mainline" and heard they have used
> the feature as in-house patch and recenlty, I noticed android from
> Qualcomm started to use it.
> 
> Of course, our product have used it and released it in real procuct.
> 
> Quote from Sangwoo Park <angwoo2.park@lge.com>
> Thanks for the data, Sangwoo!
> "
> - Test scenaro
>   - platform: android
>   - target: MSM8952, 2G DDR, 16G eMMC
>   - scenario
>     retry app launch and Back Home with 16 apps and 16 turns
>     (total app launch count is 256)
>   - result:
> 			  resume count   |  cold launching count
> -----------------------------------------------------------------
>  vanilla           |           85        |          171
>  perproc reclaim   |           184       |           72
> "
> 
> Higher resume count is better because cold launching needs loading
> lots of resource data which takes above 15 ~ 20 seconds for some
> games while successful resume just takes 1~5 second.
> 
> As perproc reclaim way with new management policy, we could reduce
> cold launching a lot(i.e., 171-72) so that it reduces app startup
> a lot.
> 
> Another useful function from this feature is to make swapout easily
> which is useful for testing swapout stress and workloads.
> 
Thanks Minchan.

Yes, this is a useful interface when there is memory pressure: it lets
userspace (Android) pick processes to reclaim. We have also taken this
series into our platform.

But I have a question about the reduced app startup time. Can you also
share your theory (management policy) on how the app reduces its
startup time?


> Thanks.
> 
> Cc: Redmond <u93410091@gmail.com>
> Cc: ZhaoJunmin Zhao(Junmin) <zhaojunmin@huawei.com>
> Cc: Vinayak Menon <vinmenon@codeaurora.org>
> Cc: Juneho Choi <juno.choi@lge.com>
> Cc: Sangwoo Park <sangwoo2.park@lge.com>
> Cc: Chan Gyun Jeong <chan.jeong@lge.com>
> 
> Minchan Kim (3):
>   mm: vmscan: refactoring force_reclaim
>   mm: vmscan: shrink_page_list with multiple zones
>   mm: per-process reclaim
> 
>  Documentation/filesystems/proc.txt |  15 ++++
>  fs/proc/base.c                     |   1 +
>  fs/proc/internal.h                 |   1 +
>  fs/proc/task_mmu.c                 | 149 +++++++++++++++++++++++++++++++++++++
>  include/linux/rmap.h               |   4 +
>  mm/vmscan.c                        |  85 ++++++++++++++++-----
>  6 files changed, 235 insertions(+), 20 deletions(-)
> 


* Re: [PATCH v1 0/3] per-process reclaim
  2016-06-13 11:50 ` [PATCH v1 0/3] " Chen Feng
@ 2016-06-13 12:22   ` ZhaoJunmin Zhao(Junmin)
  2016-06-15  0:43   ` Minchan Kim
  1 sibling, 0 replies; 19+ messages in thread
From: ZhaoJunmin Zhao(Junmin) @ 2016-06-13 12:22 UTC (permalink / raw)
  To: Chen Feng, Minchan Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Rik van Riel, Redmond, Vinayak Menon,
	Juneho Choi, Sangwoo Park, Chan Gyun Jeong



On 2016/6/13 19:50, Chen Feng wrote:
> Hi Minchan,
>
> On 2016/6/13 15:50, Minchan Kim wrote:
>> Hi all,
>>
>> http://thread.gmane.org/gmane.linux.kernel/1480728
>>
>> I sent per-process reclaim patchset three years ago. Then, last
>> feedback from akpm was that he want to know real usecase scenario.
>>
>> Since then, I got question from several embedded people of various
>> company "why it's not merged into mainline" and heard they have used
>> the feature as in-house patch and recenlty, I noticed android from
>> Qualcomm started to use it.
>>
>> Of course, our product have used it and released it in real procuct.
>>
>> Quote from Sangwoo Park <angwoo2.park@lge.com>
>> Thanks for the data, Sangwoo!
>> "
>> - Test scenaro
>>    - platform: android
>>    - target: MSM8952, 2G DDR, 16G eMMC
>>    - scenario
>>      retry app launch and Back Home with 16 apps and 16 turns
>>      (total app launch count is 256)
>>    - result:
>> 			  resume count   |  cold launching count
>> -----------------------------------------------------------------
>>   vanilla           |           85        |          171
>>   perproc reclaim   |           184       |           72
>> "
>>
>> Higher resume count is better because cold launching needs loading
>> lots of resource data which takes above 15 ~ 20 seconds for some
>> games while successful resume just takes 1~5 second.
>>
>> As perproc reclaim way with new management policy, we could reduce
>> cold launching a lot(i.e., 171-72) so that it reduces app startup
>> a lot.
>>
>> Another useful function from this feature is to make swapout easily
>> which is useful for testing swapout stress and workloads.
>>
> Thanks Minchan.
>
> Yes, this is useful interface when there are memory pressure and let the userspace(Android)
> to pick process for reclaim. We also take there series into our platform.
>
> But I have a question on the reduce app startup time. Can you also share your
> theory(management policy) on how can the app reduce it's startup time?
>
>
>> Thanks.

Yes, we use this interface in Huawei devices now. Based on the process
LRU state in ActivityManagerService, we can reclaim some processes
proactively.

>>
>> Cc: Redmond <u93410091@gmail.com>
>> Cc: ZhaoJunmin Zhao(Junmin) <zhaojunmin@huawei.com>
>> Cc: Vinayak Menon <vinmenon@codeaurora.org>
>> Cc: Juneho Choi <juno.choi@lge.com>
>> Cc: Sangwoo Park <sangwoo2.park@lge.com>
>> Cc: Chan Gyun Jeong <chan.jeong@lge.com>
>>
>> Minchan Kim (3):
>>    mm: vmscan: refactoring force_reclaim
>>    mm: vmscan: shrink_page_list with multiple zones
>>    mm: per-process reclaim
>>
>>   Documentation/filesystems/proc.txt |  15 ++++
>>   fs/proc/base.c                     |   1 +
>>   fs/proc/internal.h                 |   1 +
>>   fs/proc/task_mmu.c                 | 149 +++++++++++++++++++++++++++++++++++++
>>   include/linux/rmap.h               |   4 +
>>   mm/vmscan.c                        |  85 ++++++++++++++++-----
>>   6 files changed, 235 insertions(+), 20 deletions(-)
>>
>
>
> .
>


* Re: [PATCH v1 0/3] per-process reclaim
  2016-06-13  7:50 [PATCH v1 0/3] per-process reclaim Minchan Kim
                   ` (3 preceding siblings ...)
  2016-06-13 11:50 ` [PATCH v1 0/3] " Chen Feng
@ 2016-06-13 13:29 ` Vinayak Menon
  2016-06-15  0:57   ` Minchan Kim
  4 siblings, 1 reply; 19+ messages in thread
From: Vinayak Menon @ 2016-06-13 13:29 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton
  Cc: linux-kernel, linux-mm, Rik van Riel, Redmond,
	ZhaoJunmin Zhao(Junmin),
	Juneho Choi, Sangwoo Park, Chan Gyun Jeong

On 6/13/2016 1:20 PM, Minchan Kim wrote:
> Hi all,
>
> http://thread.gmane.org/gmane.linux.kernel/1480728
>
> I sent per-process reclaim patchset three years ago. Then, last
> feedback from akpm was that he want to know real usecase scenario.
>
> Since then, I got question from several embedded people of various
> company "why it's not merged into mainline" and heard they have used
> the feature as in-house patch and recenlty, I noticed android from
> Qualcomm started to use it.
>
> Of course, our product have used it and released it in real procuct.
>
> Quote from Sangwoo Park <angwoo2.park@lge.com>
> Thanks for the data, Sangwoo!
> "
> - Test scenaro
>   - platform: android
>   - target: MSM8952, 2G DDR, 16G eMMC
>   - scenario
>     retry app launch and Back Home with 16 apps and 16 turns
>     (total app launch count is 256)
>   - result:
> 			  resume count   |  cold launching count
> -----------------------------------------------------------------
>  vanilla           |           85        |          171
>  perproc reclaim   |           184       |           72
> "
>
> Higher resume count is better because cold launching needs loading
> lots of resource data which takes above 15 ~ 20 seconds for some
> games while successful resume just takes 1~5 second.
>
> As perproc reclaim way with new management policy, we could reduce
> cold launching a lot(i.e., 171-72) so that it reduces app startup
> a lot.
>
Thanks, Minchan, for bringing this up. When we tried the earlier
patchset in its original form, the resume of the app that was reclaimed
took a lot of time. But from the data shown above, it looks like the
resume time improves. Is that the resume time of "other" apps, which
were able to retain their working set because low-priority apps were
swapped out more efficiently with per-process reclaim?

Because of the higher resume time, we had to modify the logic a bit and
devise a way to pick a "set" of low-priority (oom_score_adj) tasks and
reclaim a certain number of pages (anon only) from each of them, with
the number of pages reclaimed from each task proportional to the task
size. This deviates from the original intention of the patch, rescuing
a particular app of interest, but it still uses the working-set hints
provided by userspace while avoiding long resume stalls. The increased
swapping helped maintain a better memory state with less page cache
reclaim, resulting in better app resume times and fewer task kills.

So would it be better if a userspace knob were provided to tell the
kernel the maximum number of pages to reclaim from a task? That way,
userspace could make calculations based on priority, task size, etc.,
and reclaim the required number of pages from each task, avoiding the
resume stall caused by reclaiming an entire task.
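
A purely hypothetical illustration of such a knob (it is not part of
this patchset and no such interface exists; the second field would be
the per-write page cap):

    # hypothetical: reclaim at most 1024 anonymous pages from pid 1234
    echo "2 1024" > /proc/1234/reclaim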

Also, would it be possible to implement the same thing using a per-task
memcg, by setting the limits and swappiness in such a way that it
achieves what per-process reclaim does?

Thanks,
Vinayak


* Re: [PATCH v1 3/3] mm: per-process reclaim
  2016-06-13  7:50 ` [PATCH v1 3/3] mm: per-process reclaim Minchan Kim
@ 2016-06-13 15:06   ` Johannes Weiner
  2016-06-15  0:40     ` Minchan Kim
  2016-06-17  7:24     ` Balbir Singh
  2016-06-13 17:06   ` Rik van Riel
  1 sibling, 2 replies; 19+ messages in thread
From: Johannes Weiner @ 2016-06-13 15:06 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Rik van Riel, Sangwoo Park

Hi Minchan,

On Mon, Jun 13, 2016 at 04:50:58PM +0900, Minchan Kim wrote:
> These day, there are many platforms available in the embedded market
> and sometime, they has more hints about workingset than kernel so
> they want to involve memory management more heavily like android's
> lowmemory killer and ashmem or user-daemon with lowmemory notifier.
> 
> This patch adds add new method for userspace to manage memory
> efficiently via knob "/proc/<pid>/reclaim" so platform can reclaim
> any process anytime.

Cgroups are our canonical way to control system resources on a per
process or group-of-processes level. I don't like the idea of adding
ad-hoc interfaces for single-use cases like this.

For this particular case, you can already stick each app into its own
cgroup and use memory.force_empty to target-reclaim them.

Or better yet, set the soft limits / memory.low to guide physical
memory pressure, once it actually occurs, toward the least-important
apps? We usually prefer doing work on-demand rather than proactively.
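
For example, a minimal sketch of that setup (the cgroup names, mount
points, and sizes are assumptions, with the v1 memcg hierarchy at
/sys/fs/cgroup/memory and cgroup2 at /sys/fs/cgroup):

    # launch each app inside its own memcg (cgroup v1 shown)
    mkdir /sys/fs/cgroup/memory/app.game
    echo $$ > /sys/fs/cgroup/memory/app.game/cgroup.procs   # then exec the app
    # later, when the platform decides to target-reclaim that app:
    echo 0 > /sys/fs/cgroup/memory/app.game/memory.force_empty
    # or, with cgroup2, protect important apps and steer pressure elsewhere
    echo 64M > /sys/fs/cgroup/app.important/memory.low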

The one-cgroup-per-app model would give Android much more control and
would also remove a *lot* of overhead during task switches, see this:
https://lkml.org/lkml/2014/12/19/358


* Re: [PATCH v1 3/3] mm: per-process reclaim
  2016-06-13  7:50 ` [PATCH v1 3/3] mm: per-process reclaim Minchan Kim
  2016-06-13 15:06   ` Johannes Weiner
@ 2016-06-13 17:06   ` Rik van Riel
  2016-06-15  1:01     ` Minchan Kim
  1 sibling, 1 reply; 19+ messages in thread
From: Rik van Riel @ 2016-06-13 17:06 UTC (permalink / raw)
  To: Minchan Kim, Andrew Morton; +Cc: linux-kernel, linux-mm, Sangwoo Park


On Mon, 2016-06-13 at 16:50 +0900, Minchan Kim wrote:
> These day, there are many platforms available in the embedded market
> and sometime, they has more hints about workingset than kernel so
> they want to involve memory management more heavily like android's
> lowmemory killer and ashmem or user-daemon with lowmemory notifier.
> 
> This patch adds add new method for userspace to manage memory
> efficiently via knob "/proc/<pid>/reclaim" so platform can reclaim
> any process anytime.
> 

Could it make sense to invoke this automatically,
perhaps from the Android low memory killer code?

-- 
All Rights Reversed.




* Re: [PATCH v1 3/3] mm: per-process reclaim
  2016-06-13 15:06   ` Johannes Weiner
@ 2016-06-15  0:40     ` Minchan Kim
  2016-06-16 11:07       ` Michal Hocko
  2016-06-16 14:41       ` Johannes Weiner
  2016-06-17  7:24     ` Balbir Singh
  1 sibling, 2 replies; 19+ messages in thread
From: Minchan Kim @ 2016-06-15  0:40 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-kernel, linux-mm, Rik van Riel, Sangwoo Park

Hi Johannes,

On Mon, Jun 13, 2016 at 11:06:53AM -0400, Johannes Weiner wrote:
> Hi Minchan,
> 
> On Mon, Jun 13, 2016 at 04:50:58PM +0900, Minchan Kim wrote:
> > These day, there are many platforms available in the embedded market
> > and sometime, they has more hints about workingset than kernel so
> > they want to involve memory management more heavily like android's
> > lowmemory killer and ashmem or user-daemon with lowmemory notifier.
> > 
> > This patch adds add new method for userspace to manage memory
> > efficiently via knob "/proc/<pid>/reclaim" so platform can reclaim
> > any process anytime.
> 
> Cgroups are our canonical way to control system resources on a per
> process or group-of-processes level. I don't like the idea of adding
> ad-hoc interfaces for single-use cases like this.
> 
> For this particular case, you can already stick each app into its own
> cgroup and use memory.force_empty to target-reclaim them.
> 
> Or better yet, set the soft limits / memory.low to guide physical
> memory pressure, once it actually occurs, toward the least-important
> apps? We usually prefer doing work on-demand rather than proactively.
> 
> The one-cgroup-per-app model would give Android much more control and
> would also remove a *lot* of overhead during task switches, see this:
> https://lkml.org/lkml/2014/12/19/358

I hadn't noticed that. Thanks for the pointer.
I read the thread you pointed out and the memcg code.

At first, I thought the one-cgroup-per-app model was an abuse of memcg,
but now I feel your suggestion makes sense and is the right direction
for controlling memory from userspace. My only concern is that I am not
sure how smoothly we can map the memory management model from global
memory pressure onto a per-app pressure model.

One question: it seems cgroup2 doesn't have per-cgroup swappiness.
Why?

I think we need it in the one-cgroup-per-app model.
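
For reference, a one-line sketch of the v1 knob in question (the path
and value are assumptions; cgroup2 exposes no equivalent today):

    # cgroup v1 memcg: bias this app's reclaim away from swapping
    echo 10 > /sys/fs/cgroup/memory/app.fg/memory.swappiness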


* Re: [PATCH v1 0/3] per-process reclaim
  2016-06-13 11:50 ` [PATCH v1 0/3] " Chen Feng
  2016-06-13 12:22   ` ZhaoJunmin Zhao(Junmin)
@ 2016-06-15  0:43   ` Minchan Kim
  1 sibling, 0 replies; 19+ messages in thread
From: Minchan Kim @ 2016-06-15  0:43 UTC (permalink / raw)
  To: Chen Feng
  Cc: Andrew Morton, linux-kernel, linux-mm, Rik van Riel, Redmond,
	ZhaoJunmin Zhao(Junmin),
	Vinayak Menon, Juneho Choi, Sangwoo Park, Chan Gyun Jeong

Hi Chen,

On Mon, Jun 13, 2016 at 07:50:00PM +0800, Chen Feng wrote:
> Hi Minchan,
> 
> On 2016/6/13 15:50, Minchan Kim wrote:
> > Hi all,
> > 
> > http://thread.gmane.org/gmane.linux.kernel/1480728
> > 
> > I sent per-process reclaim patchset three years ago. Then, last
> > feedback from akpm was that he want to know real usecase scenario.
> > 
> > Since then, I got question from several embedded people of various
> > company "why it's not merged into mainline" and heard they have used
> > the feature as in-house patch and recenlty, I noticed android from
> > Qualcomm started to use it.
> > 
> > Of course, our product have used it and released it in real procuct.
> > 
> > Quote from Sangwoo Park <angwoo2.park@lge.com>
> > Thanks for the data, Sangwoo!
> > "
> > - Test scenaro
> >   - platform: android
> >   - target: MSM8952, 2G DDR, 16G eMMC
> >   - scenario
> >     retry app launch and Back Home with 16 apps and 16 turns
> >     (total app launch count is 256)
> >   - result:
> > 			  resume count   |  cold launching count
> > -----------------------------------------------------------------
> >  vanilla           |           85        |          171
> >  perproc reclaim   |           184       |           72
> > "
> > 
> > Higher resume count is better because cold launching needs loading
> > lots of resource data which takes above 15 ~ 20 seconds for some
> > games while successful resume just takes 1~5 second.
> > 
> > As perproc reclaim way with new management policy, we could reduce
> > cold launching a lot(i.e., 171-72) so that it reduces app startup
> > a lot.
> > 
> > Another useful function from this feature is to make swapout easily
> > which is useful for testing swapout stress and workloads.
> > 
> Thanks Minchan.
> 
> Yes, this is useful interface when there are memory pressure and let the userspace(Android)
> to pick process for reclaim. We also take there series into our platform.
> 
> But I have a question on the reduce app startup time. Can you also share your
> theory(management policy) on how can the app reduce it's startup time?

What I meant about start-up time is as follows.

If an app is killed, it has to launch from scratch, so if it is a game,
it has to load lots of resource files, which takes a long time.
However, if the game was not killed, we can get back into it without a
cold start, so startup is very fast.

Sorry for the confusion.


* Re: [PATCH v1 0/3] per-process reclaim
  2016-06-13 13:29 ` Vinayak Menon
@ 2016-06-15  0:57   ` Minchan Kim
  2016-06-16  4:21     ` Vinayak Menon
  0 siblings, 1 reply; 19+ messages in thread
From: Minchan Kim @ 2016-06-15  0:57 UTC (permalink / raw)
  To: Vinayak Menon
  Cc: Andrew Morton, linux-kernel, linux-mm, Rik van Riel, Redmond,
	ZhaoJunmin Zhao(Junmin),
	Juneho Choi, Sangwoo Park, Chan Gyun Jeong

On Mon, Jun 13, 2016 at 06:59:40PM +0530, Vinayak Menon wrote:
> On 6/13/2016 1:20 PM, Minchan Kim wrote:
> > Hi all,
> >
> > http://thread.gmane.org/gmane.linux.kernel/1480728
> >
> > I sent per-process reclaim patchset three years ago. Then, last
> > feedback from akpm was that he want to know real usecase scenario.
> >
> > Since then, I got question from several embedded people of various
> > company "why it's not merged into mainline" and heard they have used
> > the feature as in-house patch and recenlty, I noticed android from
> > Qualcomm started to use it.
> >
> > Of course, our product have used it and released it in real procuct.
> >
> > Quote from Sangwoo Park <angwoo2.park@lge.com>
> > Thanks for the data, Sangwoo!
> > "
> > - Test scenaro
> >   - platform: android
> >   - target: MSM8952, 2G DDR, 16G eMMC
> >   - scenario
> >     retry app launch and Back Home with 16 apps and 16 turns
> >     (total app launch count is 256)
> >   - result:
> > 			  resume count   |  cold launching count
> > -----------------------------------------------------------------
> >  vanilla           |           85        |          171
> >  perproc reclaim   |           184       |           72
> > "
> >
> > Higher resume count is better because cold launching needs loading
> > lots of resource data which takes above 15 ~ 20 seconds for some
> > games while successful resume just takes 1~5 second.
> >
> > As perproc reclaim way with new management policy, we could reduce
> > cold launching a lot(i.e., 171-72) so that it reduces app startup
> > a lot.
> >
> Thanks Minchan for bringing this up. When we had tried the earlier patchset in its original form,
> the resume of the app that was reclaimed, was taking a lot of time. But from the data shown above it looks
> to be improving the resume time. Is that the resume time of "other" apps which were able to retain their working set
> because of the more efficient swapping of low priority apps with per process reclaim ?

Sorry for the confusion. I meant that the app has to start from scratch
if it was killed, which might require loading hundreds of megabytes,
while a resume only needs to load the working set, which is much
smaller.

> Because of the higher resume time we had to modify the logic a bit and device a way to pick a "set" of low priority
> (oom_score_adj) tasks and reclaim certain number of pages (only anon) from each of them (the number of pages reclaimed
> from each task being proportional to task size). This deviates from the original intention of the patch to rescue a
> particular app of interest, but still using the hints on working set provided by userspace and avoiding high resume stalls.
> The increased swapping was helping in maintaining a better memory state and lesser page cache reclaim,
> resulting in better app resume time, and lesser task kills.

Fair enough.

> 
> So would it be better if a userspace knob is provided to tell the kernel, the max number of pages to be reclaimed from a task ?
> This way userspace can make calculations depending on priority, task size etc and reclaim the required number of pages from
> each task, and thus avoid the resume stall because of reclaiming an entire task.
> 
> And also, would it be possible to implement the same using per task memcg by setting the limits and swappiness in such a
> way that it results inthe same thing that per process reclaim does ?

Yep, I read Johannes's thread suggesting the one-cgroup-per-app model.
It does make sense to me and is worth trying, although I guess it's not
easy to control memory usage on demand rather than proactively.
If we can, maybe we don't need a per-process reclaim policy, which is a
rather coarse-grained reclaim model anyway.
However, a concern with the one-cgroup-per-app model is that each
cgroup's LRU lists are much smaller, so it is unclear how well LRU
aging will work, and the effect of LRU churning (e.g., from compaction)
would be more severe than before.

I guess codeaurora tried the memcg model for Android.
Could you share anything you know about it?

Thanks.


> 
> Thanks,
> Vinayak


* Re: [PATCH v1 3/3] mm: per-process reclaim
  2016-06-13 17:06   ` Rik van Riel
@ 2016-06-15  1:01     ` Minchan Kim
  0 siblings, 0 replies; 19+ messages in thread
From: Minchan Kim @ 2016-06-15  1:01 UTC (permalink / raw)
  To: Rik van Riel; +Cc: Andrew Morton, linux-kernel, linux-mm, Sangwoo Park

On Mon, Jun 13, 2016 at 01:06:35PM -0400, Rik van Riel wrote:
> On Mon, 2016-06-13 at 16:50 +0900, Minchan Kim wrote:
> > These day, there are many platforms available in the embedded market
> > and sometime, they has more hints about workingset than kernel so
> > they want to involve memory management more heavily like android's
> > lowmemory killer and ashmem or user-daemon with lowmemory notifier.
> > 
> > This patch adds add new method for userspace to manage memory
> > efficiently via knob "/proc/<pid>/reclaim" so platform can reclaim
> > any process anytime.
> > 
> 
> Could it make sense to invoke this automatically,
> perhaps from the Android low memory killer code?

It's doable. In fact, that was the first internal implementation in our
product. However, I wanted to be able to use it on platforms that don't
have the low memory killer. :)


* Re: [PATCH v1 0/3] per-process reclaim
  2016-06-15  0:57   ` Minchan Kim
@ 2016-06-16  4:21     ` Vinayak Menon
  0 siblings, 0 replies; 19+ messages in thread
From: Vinayak Menon @ 2016-06-16  4:21 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Rik van Riel, Redmond,
	ZhaoJunmin Zhao(Junmin),
	Juneho Choi, Sangwoo Park, Chan Gyun Jeong


On 6/15/2016 6:27 AM, Minchan Kim wrote:
>
> Yeb, I read Johannes's thread which suggests one-cgroup-per-app model.
> It does make sense to me. It is worth to try although I guess it's not
> easy to control memory usage on demand, not proactively.
> If we can do, maybe we don't need per-process reclaim policy which
> is rather coarse-grained model of reclaim POV.
> However, a concern with one-cgroup-per-app model is LRU list size
> of a cgroup is much smaller so how LRU aging works well and
> LRU churing(e.g., compaction) effect would be severe than old.
And I was wondering what vmpressure would mean, and how to use it, when
there is one cgroup per task.
>
> I guess codeaurora tried memcg model for android.
> Could you share if you know something?
>
We tried, but had issues with charge migration, and then Johannes
suggested per-task cgroups. But that hasn't been tried yet.

Thanks


* Re: [PATCH v1 3/3] mm: per-process reclaim
  2016-06-15  0:40     ` Minchan Kim
@ 2016-06-16 11:07       ` Michal Hocko
  2016-06-16 14:41       ` Johannes Weiner
  1 sibling, 0 replies; 19+ messages in thread
From: Michal Hocko @ 2016-06-16 11:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Johannes Weiner, Andrew Morton, linux-kernel, linux-mm,
	Rik van Riel, Sangwoo Park

On Wed 15-06-16 09:40:27, Minchan Kim wrote:
[...]
> A question is it seems cgroup2 doesn't have per-cgroup swappiness.
> Why?

There was no strong use case for it AFAICT.
 
> I think we need it in one-cgroup-per-app model.

I wouldn't be opposed if it is really needed.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH v1 3/3] mm: per-process reclaim
  2016-06-15  0:40     ` Minchan Kim
  2016-06-16 11:07       ` Michal Hocko
@ 2016-06-16 14:41       ` Johannes Weiner
  2016-06-17  6:43         ` Minchan Kim
  1 sibling, 1 reply; 19+ messages in thread
From: Johannes Weiner @ 2016-06-16 14:41 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Rik van Riel, Sangwoo Park

On Wed, Jun 15, 2016 at 09:40:27AM +0900, Minchan Kim wrote:
> A question is it seems cgroup2 doesn't have per-cgroup swappiness.
> Why?
> 
> I think we need it in one-cgroup-per-app model.

Can you explain why you think that?

As we have talked about this recently in the LRU balancing thread,
swappiness is the cost factor between file IO and swapping, so the
only situation I can imagine you'd need a memcg swappiness setting is
when you have different cgroups use different storage devices that do
not have comparable speeds.

So I'm not sure I understand the relationship to an app-group model.


* Re: [PATCH v1 3/3] mm: per-process reclaim
  2016-06-16 14:41       ` Johannes Weiner
@ 2016-06-17  6:43         ` Minchan Kim
  0 siblings, 0 replies; 19+ messages in thread
From: Minchan Kim @ 2016-06-17  6:43 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, linux-kernel, linux-mm, Rik van Riel, Sangwoo Park

Hi Hannes,

On Thu, Jun 16, 2016 at 10:41:02AM -0400, Johannes Weiner wrote:
> On Wed, Jun 15, 2016 at 09:40:27AM +0900, Minchan Kim wrote:
> > A question is it seems cgroup2 doesn't have per-cgroup swappiness.
> > Why?
> > 
> > I think we need it in one-cgroup-per-app model.
> 
> Can you explain why you think that?
> 
> As we have talked about this recently in the LRU balancing thread,
> swappiness is the cost factor between file IO and swapping, so the
> only situation I can imagine you'd need a memcg swappiness setting is
> when you have different cgroups use different storage devices that do
> not have comparable speeds.
> 
> So I'm not sure I understand the relationship to an app-group model.

Sorry for the lack of information. I should have written more clearly.
In fact, what we need is a *per-memcg swap device*.

What I want is to avoid killing background applications even when
memory overflows, because cold launching an app takes a very long time
compared to resuming it (i.e., just switching). I also want to keep an
amount of free pages in memory so that new application startup does not
get stuck behind reclaim activity.

To get free memory, I want to reclaim less important apps rather than
kill them. For that, we can support two swap devices.

One is zram; the other is slow storage, much bigger than the zram size.
Then we can use the storage swap to reclaim pages from unimportant apps
while using the zram swap for important apps (e.g., the foreground app,
system services, daemons, and so on).

In other words, we want to support multiple swap devices with
one-cgroup-per-app, where the storage speeds are totally different.
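
A rough sketch of the two-device swap setup described above (the device
names and sizes are assumptions, and binding a swap device to a memcg
is exactly the part that does not exist yet):

    modprobe zram
    # fast, small swap on zram, intended for important apps
    echo 512M > /sys/block/zram0/disksize
    mkswap /dev/zram0
    swapon -p 100 /dev/zram0
    # big but slow swap on storage, intended for background apps
    mkswap /dev/mmcblk0p20
    swapon -p 10 /dev/mmcblk0p20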


* Re: [PATCH v1 3/3] mm: per-process reclaim
  2016-06-13 15:06   ` Johannes Weiner
  2016-06-15  0:40     ` Minchan Kim
@ 2016-06-17  7:24     ` Balbir Singh
  2016-06-17  7:57       ` Vinayak Menon
  1 sibling, 1 reply; 19+ messages in thread
From: Balbir Singh @ 2016-06-17  7:24 UTC (permalink / raw)
  To: Johannes Weiner, Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Rik van Riel, Sangwoo Park



On 14/06/16 01:06, Johannes Weiner wrote:
> Hi Minchan,
> 
> On Mon, Jun 13, 2016 at 04:50:58PM +0900, Minchan Kim wrote:
>> These day, there are many platforms available in the embedded market
>> and sometime, they has more hints about workingset than kernel so
>> they want to involve memory management more heavily like android's
>> lowmemory killer and ashmem or user-daemon with lowmemory notifier.
>>
>> This patch adds add new method for userspace to manage memory
>> efficiently via knob "/proc/<pid>/reclaim" so platform can reclaim
>> any process anytime.
> 
> Cgroups are our canonical way to control system resources on a per
> process or group-of-processes level. I don't like the idea of adding
> ad-hoc interfaces for single-use cases like this.
> 
> For this particular case, you can already stick each app into its own
> cgroup and use memory.force_empty to target-reclaim them.
> 
> Or better yet, set the soft limits / memory.low to guide physical
> memory pressure, once it actually occurs, toward the least-important
> apps? We usually prefer doing work on-demand rather than proactively.
> 
> The one-cgroup-per-app model would give Android much more control and
> would also remove a *lot* of overhead during task switches, see this:
> https://lkml.org/lkml/2014/12/19/358

Yes, I'd agree. cgroups can group many tasks, but the group size can be
1 as well. Could you try the same test with the recommended approach and
see if it works as desired? 

Balbir Singh


* Re: [PATCH v1 3/3] mm: per-process reclaim
  2016-06-17  7:24     ` Balbir Singh
@ 2016-06-17  7:57       ` Vinayak Menon
  0 siblings, 0 replies; 19+ messages in thread
From: Vinayak Menon @ 2016-06-17  7:57 UTC (permalink / raw)
  To: Balbir Singh, Johannes Weiner, Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Rik van Riel, Sangwoo Park

On 6/17/2016 12:54 PM, Balbir Singh wrote:
>
> On 14/06/16 01:06, Johannes Weiner wrote:
>> Hi Minchan,
>>
>> On Mon, Jun 13, 2016 at 04:50:58PM +0900, Minchan Kim wrote:
>>> These day, there are many platforms available in the embedded market
>>> and sometime, they has more hints about workingset than kernel so
>>> they want to involve memory management more heavily like android's
>>> lowmemory killer and ashmem or user-daemon with lowmemory notifier.
>>>
>>> This patch adds add new method for userspace to manage memory
>>> efficiently via knob "/proc/<pid>/reclaim" so platform can reclaim
>>> any process anytime.
>> Cgroups are our canonical way to control system resources on a per
>> process or group-of-processes level. I don't like the idea of adding
>> ad-hoc interfaces for single-use cases like this.
>>
>> For this particular case, you can already stick each app into its own
>> cgroup and use memory.force_empty to target-reclaim them.
>>
>> Or better yet, set the soft limits / memory.low to guide physical
>> memory pressure, once it actually occurs, toward the least-important
>> apps? We usually prefer doing work on-demand rather than proactively.
>>
>> The one-cgroup-per-app model would give Android much more control and
>> would also remove a *lot* of overhead during task switches, see this:
>> https://lkml.org/lkml/2014/12/19/358
> Yes, I'd agree. cgroups can group many tasks, but the group size can be
> 1 as well. Could you try the same test with the recommended approach and
> see if it works as desired? 
>
With cgroup v2, IIUC, there can be only a single hierarchy where all
controllers exist, and a process can be part of only one cgroup. If
that is true, then with per-task cgroups a task can only be present in
its own cgroup. That being the case, would it be feasible to have other
controllers in parallel, like CPU, which would not be able to work
efficiently with a per-task cgroup?


Thread overview: 19+ messages
2016-06-13  7:50 [PATCH v1 0/3] per-process reclaim Minchan Kim
2016-06-13  7:50 ` [PATCH v1 1/3] mm: vmscan: refactoring force_reclaim Minchan Kim
2016-06-13  7:50 ` [PATCH v1 2/3] mm: vmscan: shrink_page_list with multiple zones Minchan Kim
2016-06-13  7:50 ` [PATCH v1 3/3] mm: per-process reclaim Minchan Kim
2016-06-13 15:06   ` Johannes Weiner
2016-06-15  0:40     ` Minchan Kim
2016-06-16 11:07       ` Michal Hocko
2016-06-16 14:41       ` Johannes Weiner
2016-06-17  6:43         ` Minchan Kim
2016-06-17  7:24     ` Balbir Singh
2016-06-17  7:57       ` Vinayak Menon
2016-06-13 17:06   ` Rik van Riel
2016-06-15  1:01     ` Minchan Kim
2016-06-13 11:50 ` [PATCH v1 0/3] " Chen Feng
2016-06-13 12:22   ` ZhaoJunmin Zhao(Junmin)
2016-06-15  0:43   ` Minchan Kim
2016-06-13 13:29 ` Vinayak Menon
2016-06-15  0:57   ` Minchan Kim
2016-06-16  4:21     ` Vinayak Menon
