linux-kernel.vger.kernel.org archive mirror
* [RFC 0/4] ZRAM: make it just store the high compression rate page
@ 2016-08-22  8:25 Hui Zhu
  2016-08-22  8:25 ` [RFC 1/4] vmscan.c: shrink_page_list: unmap anon pages after pageout Hui Zhu
                   ` (6 more replies)
  0 siblings, 7 replies; 16+ messages in thread
From: Hui Zhu @ 2016-08-22  8:25 UTC (permalink / raw)
  To: minchan, ngupta, sergey.senozhatsky.work, hughd, rostedt, mingo,
	peterz, acme, alexander.shishkin, akpm, mhocko, hannes, mgorman,
	vbabka, zhuhui, redkoi, luto, kirill.shutemov, geliangtang,
	baiyaowei, dan.j.williams, vdavydov, aarcange, dvlasenk,
	jmarchan, koct9i, yang.shi, dave.hansen, vkuznets, vitalywool,
	ross.zwisler, tglx, kwapulinski.piotr, axboe, mchristi, joe,
	namit, riel, linux-kernel, linux-mm
  Cc: teawater

Currently ZRAM stores every page sent to it, even when the compression rate
of a page is really low.  So the compression rate of ZRAM is out of
control while it is running.
In my own testing and measurement with ZRAM, the compression rate
is about 40%.

This series of patches makes ZRAM store only the pages whose compressed
size is smaller than a configurable value.
With these patches, I set the value to 2048 and repeated the same test as
before.  The compression rate is about 20%.  The number of lowmemorykiller
invocations also decreased.
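
For reference, the core check this series adds to zram's write path looks
roughly like the sketch below (simplified from patch 3 of this series;
locking and error handling omitted).  The limit itself is exposed through
the new "non_swap" sysfs attribute added in patch 3.

	/* Sketch of the check added to zram's write path (patch 3).
	 * "clen" is the compressed length of the page. */
	if (zram->non_swap && clen > zram->non_swap) {
		/* Compresses too poorly: do not store it.  Mark the page
		 * with the new flag from patch 2 so vmscan keeps it on
		 * the unevictable LRU until it is written to again. */
		SetPageNonSwap(page);
		ret = 0;
		goto out;
	}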

Hui Zhu (4):
vmscan.c: shrink_page_list: unmap anon pages after pageout
Add non-swap page flag to mark a page will not swap
ZRAM: do not swap the pages that compressed size bigger than non_swap
vmscan.c: zram: add non swap support for shmem file pages

 drivers/block/zram/Kconfig     |   11 +++
 drivers/block/zram/zram_drv.c  |   38 +++++++++++
 drivers/block/zram/zram_drv.h  |    4 +
 fs/proc/meminfo.c              |    6 +
 include/linux/mm_inline.h      |   20 +++++
 include/linux/mmzone.h         |    3 
 include/linux/page-flags.h     |    8 ++
 include/linux/rmap.h           |    5 +
 include/linux/shmem_fs.h       |    6 +
 include/trace/events/mmflags.h |    9 ++
 kernel/events/uprobes.c        |   16 ++++
 mm/Kconfig                     |    9 ++
 mm/memory.c                    |   34 ++++++++++
 mm/migrate.c                   |    4 +
 mm/mprotect.c                  |    8 ++
 mm/page_io.c                   |   11 ++-
 mm/rmap.c                      |   23 ++++++
 mm/shmem.c                     |   77 +++++++++++++++++-----
 mm/vmscan.c                    |  139 +++++++++++++++++++++++++++++++++++------
 19 files changed, 387 insertions(+), 44 deletions(-)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [RFC 1/4] vmscan.c: shrink_page_list: unmap anon pages after pageout
  2016-08-22  8:25 [RFC 0/4] ZRAM: make it just store the high compression rate page Hui Zhu
@ 2016-08-22  8:25 ` Hui Zhu
  2016-08-22  8:25 ` [RFC 2/4] Add non-swap page flag to mark a page will not swap Hui Zhu
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Hui Zhu @ 2016-08-22  8:25 UTC (permalink / raw)
  To: minchan, ngupta, sergey.senozhatsky.work, hughd, rostedt, mingo,
	peterz, acme, alexander.shishkin, akpm, mhocko, hannes, mgorman,
	vbabka, zhuhui, redkoi, luto, kirill.shutemov, geliangtang,
	baiyaowei, dan.j.williams, vdavydov, aarcange, dvlasenk,
	jmarchan, koct9i, yang.shi, dave.hansen, vkuznets, vitalywool,
	ross.zwisler, tglx, kwapulinski.piotr, axboe, mchristi, joe,
	namit, riel, linux-kernel, linux-mm
  Cc: teawater

Without this patch, the page is already unmapped by the time ZRAM sees its
compressed size, and it has already been added to the swap cache.
To back out at that point, each pte would have to be set back to point at
the page again, and there is no way to do that.

This patch sets each pte read-only before pageout.  If the page is written
to while its data is being saved to ZRAM, its pte becomes dirty again.
After pageout, shrink_page_list checks the ptes and re-dirties the page.
Only when pageout succeeded and the page is not dirty is the page unmapped.

This patch does not handle shmem file pages, which also use swap.
The reason is that I only found a hack to tell whether a page is a shmem
file page, so the shmem handling is separated out into the last patch of
this series.
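
In other words, the intended order of operations in shrink_page_list()
under CONFIG_LATE_UNMAP is roughly the sketch below; the real code drives
this through the TRY_TO_UNMAP() macro and handles all the failure cases:

	/* 1. Write-protect the ptes instead of unmapping the page. */
	try_to_unmap(page, ttu_flags | TTU_CHECK_DIRTY | TTU_READONLY);

	/* 2. Page out while the page is still mapped read-only; the swap
	 *    device (ZRAM) sees the compressed size at this point. */
	pageout(page, mapping, sc);

	/* 3. Transfer any pte dirty bit set in the meantime to the page. */
	try_to_unmap(page, ttu_flags | TTU_CHECK_DIRTY);

	/* 4. Only if pageout succeeded and the page is still clean,
	 *    do the real unmap. */
	if (!PageDirty(page) && page_mapped(page) && mapping)
		try_to_unmap(page, ttu_flags);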

Signed-off-by: Hui Zhu <zhuhui@xiaomi.com>
---
 include/linux/rmap.h |  5 ++++
 mm/Kconfig           |  4 +++
 mm/page_io.c         | 11 ++++---
 mm/rmap.c            | 28 ++++++++++++++++++
 mm/vmscan.c          | 81 +++++++++++++++++++++++++++++++++++++++++-----------
 5 files changed, 108 insertions(+), 21 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b46bb56..4259c46 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -88,6 +88,11 @@ enum ttu_flags {
 	TTU_LZFREE = 8,			/* lazy free mode */
 	TTU_SPLIT_HUGE_PMD = 16,	/* split huge PMD if any */
 
+#ifdef CONFIG_LATE_UNMAP
+	TTU_CHECK_DIRTY = (1 << 5),	/* Check dirty mode */
+	TTU_READONLY = (1 << 6),	/* Change readonly mode */
+#endif
+
 	TTU_IGNORE_MLOCK = (1 << 8),	/* ignore mlock */
 	TTU_IGNORE_ACCESS = (1 << 9),	/* don't age */
 	TTU_IGNORE_HWPOISON = (1 << 10),/* corrupted page is recoverable */
diff --git a/mm/Kconfig b/mm/Kconfig
index 78a23c5..57ecdb3 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -704,3 +704,7 @@ config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
 	bool
+
+config LATE_UNMAP
+	bool
+	depends on SWAP
diff --git a/mm/page_io.c b/mm/page_io.c
index 16bd82fa..adaf801 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -237,10 +237,13 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 {
 	int ret = 0;
 
-	if (try_to_free_swap(page)) {
-		unlock_page(page);
-		goto out;
-	}
+#ifdef CONFIG_LATE_UNMAP
+	if (!(PageAnon(page) && page_mapped(page)))
+#endif
+		if (try_to_free_swap(page)) {
+			unlock_page(page);
+			goto out;
+		}
 	if (frontswap_store(page) == 0) {
 		set_page_writeback(page);
 		unlock_page(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index 1ef3640..d484f95 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1488,6 +1488,29 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		}
   	}
 
+#ifdef CONFIG_LATE_UNMAP
+	if ((flags & TTU_CHECK_DIRTY) || (flags & TTU_READONLY)) {
+		BUG_ON(!PageAnon(page));
+
+		pteval = *pte;
+
+		BUG_ON(pte_write(pteval) &&
+		       page_mapcount(page) + page_swapcount(page) > 1);
+
+		if ((flags & TTU_CHECK_DIRTY) && pte_dirty(pteval)) {
+			set_page_dirty(page);
+			pteval = pte_mkclean(pteval);
+		}
+
+		if (flags & TTU_READONLY)
+			pteval = pte_wrprotect(pteval);
+
+		if (!pte_same(*pte, pteval))
+			set_pte_at(mm, address, pte, pteval);
+		goto out_unmap;
+	}
+#endif
+
 	/* Nuke the page table entry. */
 	flush_cache_page(vma, address, page_to_pfn(page));
 	if (should_defer_flush(mm, flags)) {
@@ -1657,6 +1680,11 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
 	else
 		ret = rmap_walk(page, &rwc);
 
+#ifdef CONFIG_LATE_UNMAP
+	if ((flags & (TTU_READONLY | TTU_CHECK_DIRTY)) &&
+	    ret == SWAP_AGAIN)
+		ret = SWAP_SUCCESS;
+#endif
 	if (ret != SWAP_MLOCK && !page_mapcount(page)) {
 		ret = SWAP_SUCCESS;
 		if (rp.lazyfreed && !PageDirty(page))
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 374d95d..32fef7d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -494,12 +494,19 @@ void drop_slab(void)
 
 static inline int is_page_cache_freeable(struct page *page)
 {
+	int count = page_count(page) - page_has_private(page);
+
+#ifdef CONFIG_LATE_UNMAP
+	if (PageAnon(page))
+		count -= page_mapcount(page);
+#endif
+
 	/*
 	 * A freeable page cache page is referenced only by the caller
 	 * that isolated the page, the page cache radix tree and
 	 * optional buffer heads at page->private.
 	 */
-	return page_count(page) - page_has_private(page) == 2;
+	return count == 2;
 }
 
 static int may_write_to_inode(struct inode *inode, struct scan_control *sc)
@@ -894,6 +901,22 @@ static void page_check_dirty_writeback(struct page *page,
 		mapping->a_ops->is_dirty_writeback(page, dirty, writeback);
 }
 
+#define TRY_TO_UNMAP(_page, _ttu_flags)				\
+	do {							\
+		switch (try_to_unmap(_page, _ttu_flags)) {	\
+		case SWAP_FAIL:					\
+			goto activate_locked;			\
+		case SWAP_AGAIN:				\
+			goto keep_locked;			\
+		case SWAP_MLOCK:				\
+			goto cull_mlocked;			\
+		case SWAP_LZFREE:				\
+			goto lazyfree;				\
+		case SWAP_SUCCESS:				\
+			; /* try to free the page below */	\
+		}						\
+	} while (0)
+
 /*
  * shrink_page_list() returns the number of reclaimed pages
  */
@@ -925,7 +948,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		struct page *page;
 		int may_enter_fs;
 		enum page_references references = PAGEREF_RECLAIM_CLEAN;
-		bool dirty, writeback;
+		bool dirty, writeback, anon;
 		bool lazyfree = false;
 		int ret = SWAP_SUCCESS;
 
@@ -1061,11 +1084,13 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			; /* try to reclaim the page below */
 		}
 
+		anon = PageAnon(page);
+
 		/*
 		 * Anonymous process memory has backing store?
 		 * Try to allocate it some swap space here.
 		 */
-		if (PageAnon(page) && !PageSwapCache(page)) {
+		if (anon && !PageSwapCache(page)) {
 			if (!(sc->gfp_mask & __GFP_IO))
 				goto keep_locked;
 			if (!add_to_swap(page, page_list))
@@ -1083,25 +1108,28 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 
 		VM_BUG_ON_PAGE(PageTransHuge(page), page);
 
+		ttu_flags = lazyfree ?
+				(ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
+				(ttu_flags | TTU_BATCH_FLUSH);
+
 		/*
 		 * The page is mapped into the page tables of one or more
 		 * processes. Try to unmap it here.
 		 */
 		if (page_mapped(page) && mapping) {
-			switch (ret = try_to_unmap(page, lazyfree ?
-				(ttu_flags | TTU_BATCH_FLUSH | TTU_LZFREE) :
-				(ttu_flags | TTU_BATCH_FLUSH))) {
-			case SWAP_FAIL:
-				goto activate_locked;
-			case SWAP_AGAIN:
-				goto keep_locked;
-			case SWAP_MLOCK:
-				goto cull_mlocked;
-			case SWAP_LZFREE:
-				goto lazyfree;
-			case SWAP_SUCCESS:
-				; /* try to free the page below */
-			}
+			enum ttu_flags l_ttu_flags = ttu_flags;
+
+#ifdef CONFIG_LATE_UNMAP
+			/* Handle pte_dirty
+			   and make the pte read-only.
+			   A write before the unmap makes the pte
+			   dirty again, so checking pte_dirty before
+			   the unmap tells whether the page was
+			   written to.  */
+			if (anon)
+				l_ttu_flags |= TTU_CHECK_DIRTY | TTU_READONLY;
+#endif
+			TRY_TO_UNMAP(page, l_ttu_flags);
 		}
 
 		if (PageDirty(page)) {
@@ -1157,6 +1185,25 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 					goto keep;
 				if (PageDirty(page) || PageWriteback(page))
 					goto keep_locked;
+
+#ifdef CONFIG_LATE_UNMAP
+				if (anon) {
+					if (!PageSwapCache(page))
+						goto keep_locked;
+
+					/* Check whether the pte was dirtied
+					   by do_swap_page or do_wp_page.  */
+					TRY_TO_UNMAP(page,
+						     ttu_flags |
+						     TTU_CHECK_DIRTY);
+					if (PageDirty(page))
+						goto keep_locked;
+
+					if (page_mapped(page) && mapping)
+						TRY_TO_UNMAP(page, ttu_flags);
+				}
+#endif
+
 				mapping = page_mapping(page);
 			case PAGE_CLEAN:
 				; /* try to free the page below */
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC 2/4] Add non-swap page flag to mark a page will not swap
  2016-08-22  8:25 [RFC 0/4] ZRAM: make it just store the high compression rate page Hui Zhu
  2016-08-22  8:25 ` [RFC 1/4] vmscan.c: shrink_page_list: unmap anon pages after pageout Hui Zhu
@ 2016-08-22  8:25 ` Hui Zhu
  2016-09-06 15:35   ` Steven Rostedt
  2016-08-22  8:25 ` [RFC 3/4] ZRAM: do not swap the page that compressed size bigger than non_swap Hui Zhu
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 16+ messages in thread
From: Hui Zhu @ 2016-08-22  8:25 UTC (permalink / raw)
  To: minchan, ngupta, sergey.senozhatsky.work, hughd, rostedt, mingo,
	peterz, acme, alexander.shishkin, akpm, mhocko, hannes, mgorman,
	vbabka, zhuhui, redkoi, luto, kirill.shutemov, geliangtang,
	baiyaowei, dan.j.williams, vdavydov, aarcange, dvlasenk,
	jmarchan, koct9i, yang.shi, dave.hansen, vkuznets, vitalywool,
	ross.zwisler, tglx, kwapulinski.piotr, axboe, mchristi, joe,
	namit, riel, linux-kernel, linux-mm
  Cc: teawater

After the swap driver marks a page with the non-swap flag, the page is
moved to the unevictable lru list.
The page stays in this state until its data is changed.
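
The expected lifecycle of the flag, as a rough sketch (it assumes the
caller marking the page is a swap driver such as zram from patch 3):

	/* Swap driver: the page compresses poorly, refuse to store it. */
	SetPageNonSwap(page);

	/* vmscan: putback_lru_page()/add_page_to_lru_list() see the flag,
	 * put the page on LRU_UNEVICTABLE and account it as NR_NON_SWAP
	 * (shown as "NonSwap:" in /proc/meminfo). */

	/* Write fault: do_wp_page()/do_swap_page() clear the flag and move
	 * the page back to a normal LRU list, so it can be swapped again. */
	if (TestClearPageNonSwap(page))
		clear_page_non_swap(page);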

Signed-off-by: Hui Zhu <zhuhui@xiaomi.com>
---
 fs/proc/meminfo.c              |  6 ++++++
 include/linux/mm_inline.h      | 20 ++++++++++++++++++--
 include/linux/mmzone.h         |  3 +++
 include/linux/page-flags.h     |  8 ++++++++
 include/trace/events/mmflags.h |  9 ++++++++-
 kernel/events/uprobes.c        | 16 +++++++++++++++-
 mm/Kconfig                     |  5 +++++
 mm/memory.c                    | 34 ++++++++++++++++++++++++++++++++++
 mm/migrate.c                   |  4 ++++
 mm/mprotect.c                  |  8 ++++++++
 mm/vmscan.c                    | 41 ++++++++++++++++++++++++++++++++++++++++-
 11 files changed, 149 insertions(+), 5 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index b9a8c81..5c79b2e 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -79,6 +79,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #endif
 		"SwapTotal:      %8lu kB\n"
 		"SwapFree:       %8lu kB\n"
+#ifdef CONFIG_NON_SWAP
+		"NonSwap:        %8lu kB\n"
+#endif
 		"Dirty:          %8lu kB\n"
 		"Writeback:      %8lu kB\n"
 		"AnonPages:      %8lu kB\n"
@@ -138,6 +141,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #endif
 		K(i.totalswap),
 		K(i.freeswap),
+#ifdef CONFIG_NON_SWAP
+		K(global_page_state(NR_NON_SWAP)),
+#endif
 		K(global_node_page_state(NR_FILE_DIRTY)),
 		K(global_node_page_state(NR_WRITEBACK)),
 		K(global_node_page_state(NR_ANON_MAPPED)),
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 71613e8..92298ce 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -46,15 +46,31 @@ static __always_inline void update_lru_size(struct lruvec *lruvec,
 static __always_inline void add_page_to_lru_list(struct page *page,
 				struct lruvec *lruvec, enum lru_list lru)
 {
-	update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page));
+	int nr_pages = hpage_nr_pages(page);
+	enum zone_type zid = page_zonenum(page);
+#ifdef CONFIG_NON_SWAP
+	if (PageNonSwap(page)) {
+		lru = LRU_UNEVICTABLE;
+		update_lru_size(lruvec, NR_NON_SWAP, zid, nr_pages);
+	}
+#endif
+	update_lru_size(lruvec, lru, zid, nr_pages);
 	list_add(&page->lru, &lruvec->lists[lru]);
 }
 
 static __always_inline void del_page_from_lru_list(struct page *page,
 				struct lruvec *lruvec, enum lru_list lru)
 {
+	int nr_pages = hpage_nr_pages(page);
+	enum zone_type zid = page_zonenum(page);
+#ifdef CONFIG_NON_SWAP
+	if (PageNonSwap(page)) {
+		lru = LRU_UNEVICTABLE;
+		update_lru_size(lruvec, NR_NON_SWAP, zid, -nr_pages);
+	}
+#endif
 	list_del(&page->lru);
-	update_lru_size(lruvec, lru, page_zonenum(page), -hpage_nr_pages(page));
+	update_lru_size(lruvec, lru, zid, -nr_pages);
 }
 
 /**
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d572b78..da08d20 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -138,6 +138,9 @@ enum zone_stat_item {
 	NUMA_OTHER,		/* allocation from other node */
 #endif
 	NR_FREE_CMA_PAGES,
+#ifdef CONFIG_NON_SWAP
+	NR_NON_SWAP,
+#endif
 	NR_VM_ZONE_STAT_ITEMS };
 
 enum node_stat_item {
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 74e4dda..0cd80db9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -105,6 +105,9 @@ enum pageflags {
 	PG_young,
 	PG_idle,
 #endif
+#ifdef CONFIG_NON_SWAP
+	PG_non_swap,
+#endif
 	__NR_PAGEFLAGS,
 
 	/* Filesystems */
@@ -303,6 +306,11 @@ PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
 PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
 	TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
 
+#ifdef CONFIG_NON_SWAP
+PAGEFLAG(NonSwap, non_swap, PF_NO_TAIL)
+	TESTSCFLAG(NonSwap, non_swap, PF_NO_TAIL)
+#endif
+
 #ifdef CONFIG_HIGHMEM
 /*
  * Must use a macro here due to header dependency issues. page_zone() is not
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 5a81ab4..1c0ccc9 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -79,6 +79,12 @@
 #define IF_HAVE_PG_IDLE(flag,string)
 #endif
 
+#ifdef CONFIG_NON_SWAP
+#define IF_HAVE_PG_NON_SWAP(flag,string) ,{1UL << flag, string}
+#else
+#define IF_HAVE_PG_NON_SWAP(flag,string)
+#endif
+
 #define __def_pageflag_names						\
 	{1UL << PG_locked,		"locked"	},		\
 	{1UL << PG_error,		"error"		},		\
@@ -104,7 +110,8 @@ IF_HAVE_PG_MLOCK(PG_mlocked,		"mlocked"	)		\
 IF_HAVE_PG_UNCACHED(PG_uncached,	"uncached"	)		\
 IF_HAVE_PG_HWPOISON(PG_hwpoison,	"hwpoison"	)		\
 IF_HAVE_PG_IDLE(PG_young,		"young"		)		\
-IF_HAVE_PG_IDLE(PG_idle,		"idle"		)
+IF_HAVE_PG_IDLE(PG_idle,		"idle"		)		\
+IF_HAVE_PG_NON_SWAP(PG_non_swap,	"non_swap"	)
 
 #define show_page_flags(flags)						\
 	(flags) ? __print_flags(flags, "|",				\
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index b7a525a..a7e4153 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -160,6 +160,10 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	const unsigned long mmun_start = addr;
 	const unsigned long mmun_end   = addr + PAGE_SIZE;
 	struct mem_cgroup *memcg;
+	pte_t pte;
+#ifdef CONFIG_NON_SWAP
+	bool non_swap;
+#endif
 
 	err = mem_cgroup_try_charge(kpage, vma->vm_mm, GFP_KERNEL, &memcg,
 			false);
@@ -176,6 +180,11 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 		goto unlock;
 
 	get_page(kpage);
+#ifdef CONFIG_NON_SWAP
+	non_swap = TestClearPageNonSwap(page);
+	if (non_swap)
+		SetPageNonSwap(kpage);
+#endif
 	page_add_new_anon_rmap(kpage, vma, addr, false);
 	mem_cgroup_commit_charge(kpage, memcg, false, false);
 	lru_cache_add_active_or_unevictable(kpage, vma);
@@ -187,7 +196,12 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 
 	flush_cache_page(vma, addr, pte_pfn(*ptep));
 	ptep_clear_flush_notify(vma, addr, ptep);
-	set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
+	pte = mk_pte(kpage, vma->vm_page_prot);
+#ifdef CONFIG_NON_SWAP
+	if (non_swap)
+		pte = pte_wrprotect(pte);
+#endif
+	set_pte_at_notify(mm, addr, ptep, pte);
 
 	page_remove_rmap(page, false);
 	if (!page_mapped(page))
diff --git a/mm/Kconfig b/mm/Kconfig
index 57ecdb3..d8d4b41 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -708,3 +708,8 @@ config ARCH_HAS_PKEYS
 config LATE_UNMAP
 	bool
 	depends on SWAP
+
+config NON_SWAP
+	bool
+	depends on SWAP
+	select LATE_UNMAP
diff --git a/mm/memory.c b/mm/memory.c
index 83be99d..2448004 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -64,6 +64,7 @@
 #include <linux/debugfs.h>
 #include <linux/userfaultfd_k.h>
 #include <linux/dax.h>
+#include <linux/mm_inline.h>
 
 #include <asm/io.h>
 #include <asm/mmu_context.h>
@@ -2338,6 +2339,26 @@ static int wp_page_shared(struct fault_env *fe, pte_t orig_pte,
 	return wp_page_reuse(fe, orig_pte, old_page, page_mkwrite, 1);
 }
 
+#ifdef CONFIG_NON_SWAP
+static void
+clear_page_non_swap(struct page *page)
+{
+	struct zone *zone;
+	struct lruvec *lruvec;
+
+	if (!PageLRU(page) || !page_evictable(page))
+		return;
+
+	zone = page_zone(page);
+	spin_lock_irq(zone_lru_lock(zone));
+	__dec_zone_page_state(page, NR_NON_SWAP);
+	lruvec = mem_cgroup_page_lruvec(page, zone->zone_pgdat);
+	del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
+	add_page_to_lru_list(page, lruvec, page_lru(page));
+	spin_unlock_irq(zone_lru_lock(zone));
+}
+#endif
+
 /*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
@@ -2400,6 +2421,10 @@ static int do_wp_page(struct fault_env *fe, pte_t orig_pte)
 			put_page(old_page);
 		}
 		if (reuse_swap_page(old_page, &total_mapcount)) {
+#ifdef CONFIG_NON_SWAP
+			if (unlikely(TestClearPageNonSwap(old_page)))
+				clear_page_non_swap(old_page);
+#endif
 			if (total_mapcount == 1) {
 				/*
 				 * The page is all ours. Move it to
@@ -2581,6 +2606,11 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
 		goto out_release;
 	}
 
+#ifdef CONFIG_NON_SWAP
+	if ((fe->flags & FAULT_FLAG_WRITE) && unlikely(TestClearPageNonSwap(page)))
+		clear_page_non_swap(page);
+#endif
+
 	/*
 	 * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
 	 * release the swapcache from under us.  The page pin, and pte_same
@@ -2638,6 +2668,10 @@ int do_swap_page(struct fault_env *fe, pte_t orig_pte)
 	flush_icache_page(vma, page);
 	if (pte_swp_soft_dirty(orig_pte))
 		pte = pte_mksoft_dirty(pte);
+#ifdef CONFIG_NON_SWAP
+	if (!(fe->flags & FAULT_FLAG_WRITE) && PageNonSwap(page))
+		pte = pte_wrprotect(pte);
+#endif
 	set_pte_at(vma->vm_mm, fe->address, fe->pte, pte);
 	if (page == swapcache) {
 		do_page_add_anon_rmap(page, vma, fe->address, exclusive);
diff --git a/mm/migrate.c b/mm/migrate.c
index f7ee04a..46ac926 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -640,6 +640,10 @@ void migrate_page_copy(struct page *newpage, struct page *page)
 		SetPageChecked(newpage);
 	if (PageMappedToDisk(page))
 		SetPageMappedToDisk(newpage);
+#ifdef CONFIG_NON_SWAP
+	if (TestClearPageNonSwap(page))
+		SetPageNonSwap(newpage);
+#endif
 
 	/* Move dirty on pages not done by migrate_page_move_mapping() */
 	if (PageDirty(page))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index a4830f0..6539c6e 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -79,6 +79,9 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		if (pte_present(oldpte)) {
 			pte_t ptent;
 			bool preserve_write = prot_numa && pte_write(oldpte);
+#ifdef CONFIG_NON_SWAP
+			struct page *page;
+#endif
 
 			/*
 			 * Avoid trapping faults against the zero or KSM
@@ -107,6 +110,11 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 					 !(vma->vm_flags & VM_SOFTDIRTY))) {
 				ptent = pte_mkwrite(ptent);
 			}
+#ifdef CONFIG_NON_SWAP
+			page = vm_normal_page(vma, addr, oldpte);
+			if (page && PageNonSwap(page))
+				ptent = pte_wrprotect(ptent);
+#endif
 			ptep_modify_prot_commit(mm, addr, pte, ptent);
 			pages++;
 		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 32fef7d..14d49cd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -758,14 +758,38 @@ redo:
 	ClearPageUnevictable(page);
 
 	if (page_evictable(page)) {
+#ifdef CONFIG_NON_SWAP
+		bool added = false;
+
+		if (unlikely(PageNonSwap(page))) {
+			struct zone *zone = page_zone(page);
+
+			BUG_ON(irqs_disabled());
+
+			spin_lock_irq(zone_lru_lock(zone));
+			if (likely(PageNonSwap(page))) {
+				struct lruvec *lruvec;
+
+				lruvec = mem_cgroup_page_lruvec(page,
+							zone->zone_pgdat);
+				SetPageLRU(page);
+				add_page_to_lru_list(page, lruvec,
+						     LRU_UNEVICTABLE);
+				added = true;
+			}
+			spin_unlock_irq(zone_lru_lock(zone));
+		}
+
 		/*
 		 * For evictable pages, we can use the cache.
 		 * In event of a race, worst case is we end up with an
 		 * unevictable page on [in]active list.
 		 * We know how to handle that.
 		 */
+		if (!added)
+#endif
+			lru_cache_add(page);
 		is_unevictable = false;
-		lru_cache_add(page);
 	} else {
 		/*
 		 * Put unevictable pages directly on zone's unevictable
@@ -1199,6 +1223,14 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 					if (PageDirty(page))
 						goto keep_locked;
 
+#ifdef CONFIG_NON_SWAP
+					if (PageNonSwap(page)) {
+						try_to_free_swap(page);
+						unlock_page(page);
+						goto non_swap_keep;
+					}
+#endif
+
 					if (page_mapped(page) && mapping)
 						TRY_TO_UNMAP(page, ttu_flags);
 				}
@@ -1281,6 +1313,9 @@ cull_mlocked:
 		if (PageSwapCache(page))
 			try_to_free_swap(page);
 		unlock_page(page);
+#ifdef CONFIG_NON_SWAP
+		ClearPageNonSwap(page);
+#endif
 		list_add(&page->lru, &ret_pages);
 		continue;
 
@@ -1294,6 +1329,10 @@ activate_locked:
 keep_locked:
 		unlock_page(page);
 keep:
+#ifdef CONFIG_NON_SWAP
+		ClearPageNonSwap(page);
+non_swap_keep:
+#endif
 		list_add(&page->lru, &ret_pages);
 		VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page);
 	}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC 3/4] ZRAM: do not swap the page that compressed size bigger than non_swap
  2016-08-22  8:25 [RFC 0/4] ZRAM: make it just store the high compression rate page Hui Zhu
  2016-08-22  8:25 ` [RFC 1/4] vmscan.c: shrink_page_list: unmap anon pages after pageout Hui Zhu
  2016-08-22  8:25 ` [RFC 2/4] Add non-swap page flag to mark a page will not swap Hui Zhu
@ 2016-08-22  8:25 ` Hui Zhu
  2016-08-22  8:25 ` [RFC 4/4] vmscan.c: zram: add non swap support for shmem file pages Hui Zhu
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Hui Zhu @ 2016-08-22  8:25 UTC (permalink / raw)
  To: minchan, ngupta, sergey.senozhatsky.work, hughd, rostedt, mingo,
	peterz, acme, alexander.shishkin, akpm, mhocko, hannes, mgorman,
	vbabka, zhuhui, redkoi, luto, kirill.shutemov, geliangtang,
	baiyaowei, dan.j.williams, vdavydov, aarcange, dvlasenk,
	jmarchan, koct9i, yang.shi, dave.hansen, vkuznets, vitalywool,
	ross.zwisler, tglx, kwapulinski.piotr, axboe, mchristi, joe,
	namit, riel, linux-kernel, linux-mm
  Cc: teawater

The new option ZRAM_NON_SWAP adds an interface "non_swap" to zram.
The user can set an unsigned int value through it.
If a page's compressed size is bigger than this limit, zram marks it as
non-swap, and the page is then moved to the unevictable lru list.

This patch doesn't handle shmem file pages.
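
Assuming the device is zram0, the limit can then be read and set at run
time through the new attribute, for example:

	cat /sys/block/zram0/non_swap
	echo 2048 > /sys/block/zram0/non_swap

The value is parsed with memparse(), so suffixes such as "1k" also work.
A value of 0 (the default, since the zram struct is zeroed on allocation)
disables the check and keeps the old behaviour.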

Signed-off-by: Hui Zhu <zhuhui@xiaomi.com>
---
 drivers/block/zram/Kconfig    | 11 +++++++++++
 drivers/block/zram/zram_drv.c | 39 +++++++++++++++++++++++++++++++++++++++
 drivers/block/zram/zram_drv.h |  4 ++++
 3 files changed, 54 insertions(+)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index b8ecba6..525caaa 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -13,3 +13,14 @@ config ZRAM
 	  disks and maybe many more.
 
 	  See zram.txt for more information.
+
+config ZRAM_NON_SWAP
+	bool "Enable zram non-swap support"
+	depends on ZRAM
+	select NON_SWAP
+	default n
+	help
+	  This option adds an interface "non_swap" to zram.  The user can
+	  set an unsigned int value through it.
+	  If a page's compressed size is bigger than this limit, zram marks
+	  it as non-swap.  The page is then moved to the unevictable lru list.
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 04365b1..8f7f1ec 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -714,6 +714,14 @@ compress_again:
 		goto out;
 	}
 
+#ifdef CONFIG_ZRAM_NON_SWAP
+	if (!is_partial_io(bvec) && PageAnon(page) &&
+	    zram->non_swap && clen > zram->non_swap) {
+		ret = 0;
+		SetPageNonSwap(page);
+		goto out;
+	}
+#endif
 	src = zstrm->buffer;
 	if (unlikely(clen > max_zpage_size)) {
 		clen = PAGE_SIZE;
@@ -1180,6 +1188,31 @@ static const struct block_device_operations zram_devops = {
 	.owner = THIS_MODULE
 };
 
+#ifdef CONFIG_ZRAM_NON_SWAP
+static ssize_t non_swap_show(struct device *dev,
+			     struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%u\n", zram->non_swap);
+}
+
+static ssize_t non_swap_store(struct device *dev,
+			      struct device_attribute *attr, const char *buf,
+			      size_t len)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	zram->non_swap = (unsigned int)memparse(buf, NULL);
+
+	if (zram->non_swap > max_zpage_size)
+		pr_warn("non_swap should be smaller than max_zpage_size %zu\n",
+			max_zpage_size);
+
+	return len;
+}
+#endif
+
 static DEVICE_ATTR_WO(compact);
 static DEVICE_ATTR_RW(disksize);
 static DEVICE_ATTR_RO(initstate);
@@ -1190,6 +1223,9 @@ static DEVICE_ATTR_RW(mem_limit);
 static DEVICE_ATTR_RW(mem_used_max);
 static DEVICE_ATTR_RW(max_comp_streams);
 static DEVICE_ATTR_RW(comp_algorithm);
+#ifdef CONFIG_ZRAM_NON_SWAP
+static DEVICE_ATTR_RW(non_swap);
+#endif
 
 static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_disksize.attr,
@@ -1210,6 +1246,9 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_mem_used_max.attr,
 	&dev_attr_max_comp_streams.attr,
 	&dev_attr_comp_algorithm.attr,
+#ifdef CONFIG_ZRAM_NON_SWAP
+	&dev_attr_non_swap.attr,
+#endif
 	&dev_attr_io_stat.attr,
 	&dev_attr_mm_stat.attr,
 	&dev_attr_debug_stat.attr,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 74fcf10..bd5f38a 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -119,5 +119,9 @@ struct zram {
 	 * zram is claimed so open request will be failed
 	 */
 	bool claim; /* Protected by bdev->bd_mutex */
+
+#ifdef CONFIG_ZRAM_NON_SWAP
+	unsigned int non_swap;
+#endif
 };
 #endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [RFC 4/4] vmscan.c: zram: add non swap support for shmem file pages
  2016-08-22  8:25 [RFC 0/4] ZRAM: make it just store the high compression rate page Hui Zhu
                   ` (2 preceding siblings ...)
  2016-08-22  8:25 ` [RFC 3/4] ZRAM: do not swap the page that compressed size bigger than non_swap Hui Zhu
@ 2016-08-22  8:25 ` Hui Zhu
  2016-08-24  1:04 ` [RFC 0/4] ZRAM: make it just store the high compression rate page Minchan Kim
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 16+ messages in thread
From: Hui Zhu @ 2016-08-22  8:25 UTC (permalink / raw)
  To: minchan, ngupta, sergey.senozhatsky.work, hughd, rostedt, mingo,
	peterz, acme, alexander.shishkin, akpm, mhocko, hannes, mgorman,
	vbabka, zhuhui, redkoi, luto, kirill.shutemov, geliangtang,
	baiyaowei, dan.j.williams, vdavydov, aarcange, dvlasenk,
	jmarchan, koct9i, yang.shi, dave.hansen, vkuznets, vitalywool,
	ross.zwisler, tglx, kwapulinski.piotr, axboe, mchristi, joe,
	namit, riel, linux-kernel, linux-mm
  Cc: teawater

This patch adds full non-swap support for shmem file pages.
To tell whether a page is a shmem file page, it checks
mapping->a_ops == &shmem_aops.  I think that is really a hack.

Not many shmem file pages get swapped out, though.
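
A minimal sketch of that check, which the diff below adds to
is_page_cache_freeable() and shrink_page_list():

	/* Hack: identify a shmem/tmpfs page by its address_space ops. */
	if (mapping && mapping->a_ops == &shmem_aops)
		late_unmap = true;	/* treat it like an anon page */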

Signed-off-by: Hui Zhu <zhuhui@xiaomi.com>
---
 drivers/block/zram/zram_drv.c |  3 +-
 include/linux/shmem_fs.h      |  6 ++++
 mm/page_io.c                  |  2 +-
 mm/rmap.c                     |  5 ---
 mm/shmem.c                    | 77 ++++++++++++++++++++++++++++++++++---------
 mm/vmscan.c                   | 27 +++++++++++----
 6 files changed, 89 insertions(+), 31 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 8f7f1ec..914c096 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -715,8 +715,7 @@ compress_again:
 	}
 
 #ifdef CONFIG_ZRAM_NON_SWAP
-	if (!is_partial_io(bvec) && PageAnon(page) &&
-	    zram->non_swap && clen > zram->non_swap) {
+	if (!is_partial_io(bvec) && zram->non_swap && clen > zram->non_swap) {
 		ret = 0;
 		SetPageNonSwap(page);
 		goto out;
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index ff078e7..fd44473 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -124,4 +124,10 @@ static inline bool shmem_huge_enabled(struct vm_area_struct *vma)
 }
 #endif
 
+extern const struct address_space_operations shmem_aops;
+
+#ifdef CONFIG_LATE_UNMAP
+extern void shmem_page_unmap(struct page *page);
+#endif
+
 #endif
diff --git a/mm/page_io.c b/mm/page_io.c
index adaf801..5fd3069 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -238,7 +238,7 @@ int swap_writepage(struct page *page, struct writeback_control *wbc)
 	int ret = 0;
 
 #ifdef CONFIG_LATE_UNMAP
-	if (!(PageAnon(page) && page_mapped(page)))
+	if (!page_mapped(page))
 #endif
 		if (try_to_free_swap(page)) {
 			unlock_page(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index d484f95..418f731 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1490,13 +1490,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 
 #ifdef CONFIG_LATE_UNMAP
 	if ((flags & TTU_CHECK_DIRTY) || (flags & TTU_READONLY)) {
-		BUG_ON(!PageAnon(page));
-
 		pteval = *pte;
 
-		BUG_ON(pte_write(pteval) &&
-		       page_mapcount(page) + page_swapcount(page) > 1);
-
 		if ((flags & TTU_CHECK_DIRTY) && pte_dirty(pteval)) {
 			set_page_dirty(page);
 			pteval = pte_mkclean(pteval);
diff --git a/mm/shmem.c b/mm/shmem.c
index fd8b2b5..556d853 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -182,7 +182,6 @@ static inline void shmem_unacct_blocks(unsigned long flags, long pages)
 }
 
 static const struct super_operations shmem_ops;
-static const struct address_space_operations shmem_aops;
 static const struct file_operations shmem_file_operations;
 static const struct inode_operations shmem_inode_operations;
 static const struct inode_operations shmem_dir_inode_operations;
@@ -1178,6 +1177,55 @@ out:
 	return error;
 }
 
+#define SHMEM_WRITEPAGE_LOCK				\
+	do {						\
+		mutex_lock(&shmem_swaplist_mutex);	\
+		if (list_empty(&info->swaplist))	\
+			list_add_tail(&info->swaplist,	\
+				      &shmem_swaplist);	\
+	} while (0)
+
+#define SHMEM_WRITEPAGE_SWAP						\
+	do {								\
+		spin_lock_irq(&info->lock);				\
+		shmem_recalc_inode(inode);				\
+		info->swapped++;					\
+		spin_unlock_irq(&info->lock);				\
+		swap_shmem_alloc(swap);					\
+		shmem_delete_from_page_cache(page,			\
+					     swp_to_radix_entry(swap));	\
+	} while (0)
+
+#define SHMEM_WRITEPAGE_UNLOCK				\
+	do {						\
+		mutex_unlock(&shmem_swaplist_mutex);	\
+	} while (0)
+
+#define SHMEM_WRITEPAGE_BUG_ON				\
+	do {						\
+		BUG_ON(page_mapped(page));		\
+	} while (0)
+
+#ifdef CONFIG_LATE_UNMAP
+void
+shmem_page_unmap(struct page *page)
+{
+	struct shmem_inode_info *info;
+	struct address_space *mapping;
+	struct inode *inode;
+	swp_entry_t swap = { .val = page_private(page) };
+
+	mapping = page->mapping;
+	inode = mapping->host;
+	info = SHMEM_I(inode);
+
+	SHMEM_WRITEPAGE_LOCK;
+	SHMEM_WRITEPAGE_SWAP;
+	SHMEM_WRITEPAGE_UNLOCK;
+	SHMEM_WRITEPAGE_BUG_ON;
+}
+#endif
+
 /*
  * Move the page from the page cache to the swap cache.
  */
@@ -1259,26 +1307,23 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	 * we've incremented swapped, because shmem_unuse_inode() will
 	 * prune a !swapped inode from the swaplist under this mutex.
 	 */
-	mutex_lock(&shmem_swaplist_mutex);
-	if (list_empty(&info->swaplist))
-		list_add_tail(&info->swaplist, &shmem_swaplist);
+#ifndef CONFIG_LATE_UNMAP
+	SHMEM_WRITEPAGE_LOCK;
+#endif
 
 	if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
-		spin_lock_irq(&info->lock);
-		shmem_recalc_inode(inode);
-		info->swapped++;
-		spin_unlock_irq(&info->lock);
-
-		swap_shmem_alloc(swap);
-		shmem_delete_from_page_cache(page, swp_to_radix_entry(swap));
-
-		mutex_unlock(&shmem_swaplist_mutex);
-		BUG_ON(page_mapped(page));
+#ifndef CONFIG_LATE_UNMAP
+		SHMEM_WRITEPAGE_SWAP;
+		SHMEM_WRITEPAGE_UNLOCK;
+		SHMEM_WRITEPAGE_BUG_ON;
+#endif
 		swap_writepage(page, wbc);
 		return 0;
 	}
 
-	mutex_unlock(&shmem_swaplist_mutex);
+#ifndef CONFIG_LATE_UNMAP
+	SHMEM_WRITEPAGE_UNLOCK;
+#endif
 free_swap:
 	swapcache_free(swap);
 redirty:
@@ -3764,7 +3809,7 @@ static void shmem_destroy_inodecache(void)
 	kmem_cache_destroy(shmem_inode_cachep);
 }
 
-static const struct address_space_operations shmem_aops = {
+const struct address_space_operations shmem_aops = {
 	.writepage	= shmem_writepage,
 	.set_page_dirty	= __set_page_dirty_no_writeback,
 #ifdef CONFIG_TMPFS
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 14d49cd..effb6c4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -54,6 +54,8 @@
 #include <linux/swapops.h>
 #include <linux/balloon_compaction.h>
 
+#include <linux/shmem_fs.h>
+
 #include "internal.h"
 
 #define CREATE_TRACE_POINTS
@@ -492,12 +494,13 @@ void drop_slab(void)
 		drop_slab_node(nid);
 }
 
-static inline int is_page_cache_freeable(struct page *page)
+static inline int is_page_cache_freeable(struct page *page,
+					 struct address_space *mapping)
 {
 	int count = page_count(page) - page_has_private(page);
 
 #ifdef CONFIG_LATE_UNMAP
-	if (PageAnon(page))
+	if (PageAnon(page) || (mapping && mapping->a_ops == &shmem_aops))
 		count -= page_mapcount(page);
 #endif
 
@@ -576,7 +579,7 @@ static pageout_t pageout(struct page *page, struct address_space *mapping,
 	 * swap_backing_dev_info is bust: it doesn't reflect the
 	 * congestion state of the swapdevs.  Easy to fix, if needed.
 	 */
-	if (!is_page_cache_freeable(page))
+	if (!is_page_cache_freeable(page, mapping))
 		return PAGE_KEEP;
 	if (!mapping) {
 		/*
@@ -972,7 +975,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		struct page *page;
 		int may_enter_fs;
 		enum page_references references = PAGEREF_RECLAIM_CLEAN;
-		bool dirty, writeback, anon;
+		bool dirty, writeback, anon, late_unmap;
 		bool lazyfree = false;
 		int ret = SWAP_SUCCESS;
 
@@ -1109,6 +1112,10 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		}
 
 		anon = PageAnon(page);
+		if (anon)
+			late_unmap = true;
+		else
+			late_unmap = false;
 
 		/*
 		 * Anonymous process memory has backing store?
@@ -1144,13 +1151,16 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			enum ttu_flags l_ttu_flags = ttu_flags;
 
 #ifdef CONFIG_LATE_UNMAP
+			if (mapping->a_ops == &shmem_aops)
+				late_unmap = true;
+
 			/* Handle pte_dirty
 			   and make the pte read-only.
 			   A write before the unmap makes the pte
 			   dirty again, so checking pte_dirty before
 			   the unmap tells whether the page was
 			   written to.  */
-			if (anon)
+			if (late_unmap)
 				l_ttu_flags |= TTU_CHECK_DIRTY | TTU_READONLY;
 #endif
 			TRY_TO_UNMAP(page, l_ttu_flags);
@@ -1211,7 +1221,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 					goto keep_locked;
 
 #ifdef CONFIG_LATE_UNMAP
-				if (anon) {
+				if (late_unmap) {
 					if (!PageSwapCache(page))
 						goto keep_locked;
 
@@ -1231,8 +1241,11 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 					}
 #endif
 
-					if (page_mapped(page) && mapping)
+					if (page_mapped(page) && mapping) {
 						TRY_TO_UNMAP(page, ttu_flags);
+						if (!anon)
+							shmem_page_unmap(page);
+					}
 				}
 #endif
 
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [RFC 0/4] ZRAM: make it just store the high compression rate page
  2016-08-22  8:25 [RFC 0/4] ZRAM: make it just store the high compression rate page Hui Zhu
                   ` (3 preceding siblings ...)
  2016-08-22  8:25 ` [RFC 4/4] vmscan.c: zram: add non swap support for shmem file pages Hui Zhu
@ 2016-08-24  1:04 ` Minchan Kim
  2016-08-24  1:29   ` Hui Zhu
  2016-08-25  6:09 ` Sergey Senozhatsky
  2016-09-05  2:12 ` Minchan Kim
  6 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2016-08-24  1:04 UTC (permalink / raw)
  To: Hui Zhu
  Cc: ngupta, sergey.senozhatsky.work, hughd, rostedt, mingo, peterz,
	acme, alexander.shishkin, akpm, mhocko, hannes, mgorman, vbabka,
	redkoi, luto, kirill.shutemov, geliangtang, baiyaowei,
	dan.j.williams, vdavydov, aarcange, dvlasenk, jmarchan, koct9i,
	yang.shi, dave.hansen, vkuznets, vitalywool, ross.zwisler, tglx,
	kwapulinski.piotr, axboe, mchristi, joe, namit, riel,
	linux-kernel, linux-mm, teawater

Hi Hui,

On Mon, Aug 22, 2016 at 04:25:05PM +0800, Hui Zhu wrote:
> Current ZRAM just can store all pages even if the compression rate
> of a page is really low.  So the compression rate of ZRAM is out of
> control when it is running.
> In my part, I did some test and record with ZRAM.  The compression rate
> is about 40%.
> 
> This series of patches make ZRAM can just store the page that the
> compressed size is smaller than a value.
> With these patches, I set the value to 2048 and did the same test with
> before.  The compression rate is about 20%.  The times of lowmemorykiller
> also decreased.

I have been interested in this feature for a long time but didn't work on it
because I didn't have a good idea of how to implement it in a generic way
without a layering violation.  I will look into this after handling some
urgent work.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/4] ZRAM: make it just store the high compression rate page
  2016-08-24  1:04 ` [RFC 0/4] ZRAM: make it just store the high compression rate page Minchan Kim
@ 2016-08-24  1:29   ` Hui Zhu
  0 siblings, 0 replies; 16+ messages in thread
From: Hui Zhu @ 2016-08-24  1:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hui Zhu, ngupta, sergey.senozhatsky.work, Hugh Dickins,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, acme,
	alexander.shishkin, Andrew Morton, mhocko, hannes, mgorman,
	vbabka, redkoi, luto, kirill.shutemov, geliangtang, baiyaowei,
	dan.j.williams, vdavydov, aarcange, dvlasenk, jmarchan, koct9i,
	yang.shi, dave.hansen, vkuznets, vitalywool, ross.zwisler,
	Thomas Gleixner, kwapulinski.piotr, axboe, mchristi, Joe Perches,
	namit, Rik van Riel, linux-kernel, Linux Memory Management List

Hi Minchan,

On Wed, Aug 24, 2016 at 9:04 AM, Minchan Kim <minchan@kernel.org> wrote:
> Hi Hui,
>
> On Mon, Aug 22, 2016 at 04:25:05PM +0800, Hui Zhu wrote:
>> Current ZRAM just can store all pages even if the compression rate
>> of a page is really low.  So the compression rate of ZRAM is out of
>> control when it is running.
>> In my part, I did some test and record with ZRAM.  The compression rate
>> is about 40%.
>>
>> This series of patches make ZRAM can just store the page that the
>> compressed size is smaller than a value.
>> With these patches, I set the value to 2048 and did the same test with
>> before.  The compression rate is about 20%.  The times of lowmemorykiller
>> also decreased.
>
> I have an interest about the feature for a long time but didn't work on it
> because I didn't have a good idea to implment it with generic approach
> without layer violation. I will look into this after handling urgent works.
>
> Thanks.

That will be great.  Thanks.

Best,
Hui

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/4] ZRAM: make it just store the high compression rate page
  2016-08-22  8:25 [RFC 0/4] ZRAM: make it just store the high compression rate page Hui Zhu
                   ` (4 preceding siblings ...)
  2016-08-24  1:04 ` [RFC 0/4] ZRAM: make it just store the high compression rate page Minchan Kim
@ 2016-08-25  6:09 ` Sergey Senozhatsky
  2016-08-25  8:25   ` Hui Zhu
  2016-09-05  2:12 ` Minchan Kim
  6 siblings, 1 reply; 16+ messages in thread
From: Sergey Senozhatsky @ 2016-08-25  6:09 UTC (permalink / raw)
  To: Hui Zhu
  Cc: minchan, ngupta, sergey.senozhatsky.work, hughd, rostedt, mingo,
	peterz, acme, alexander.shishkin, akpm, mhocko, hannes, mgorman,
	vbabka, redkoi, luto, kirill.shutemov, geliangtang, baiyaowei,
	dan.j.williams, vdavydov, aarcange, dvlasenk, jmarchan, koct9i,
	yang.shi, dave.hansen, vkuznets, vitalywool, ross.zwisler, tglx,
	kwapulinski.piotr, axboe, mchristi, joe, namit, riel,
	linux-kernel, linux-mm, teawater

Hello,

On (08/22/16 16:25), Hui Zhu wrote:
> 
> Current ZRAM just can store all pages even if the compression rate
> of a page is really low.  So the compression rate of ZRAM is out of
> control when it is running.
> In my part, I did some test and record with ZRAM.  The compression rate
> is about 40%.
> 
> This series of patches make ZRAM can just store the page that the
> compressed size is smaller than a value.
> With these patches, I set the value to 2048 and did the same test with
> before.  The compression rate is about 20%.  The times of lowmemorykiller
> also decreased.

I haven't looked at the patches in detail yet. can you educate me a bit?
is your test stable? why has the number of lowmemorykill-s decreased?
... or am I reading "The times of lowmemorykiller also decreased" wrong?

suppose you have X pages that result in bad compression size (from zram
point of view). zram stores such pages uncompressed, IOW we have no memory
savings - swapped out page lands in zsmalloc PAGE_SIZE class. now you
don't try to store those pages in zsmalloc, but keep them as unevictable.
so the page still occupies PAGE_SIZE; no memory saving again. why did it
improve LMK?

	-ss

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/4] ZRAM: make it just store the high compression rate page
  2016-08-25  6:09 ` Sergey Senozhatsky
@ 2016-08-25  8:25   ` Hui Zhu
  2016-09-05  2:18     ` Minchan Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Hui Zhu @ 2016-08-25  8:25 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Hui Zhu, minchan, ngupta, Hugh Dickins, Steven Rostedt,
	Ingo Molnar, Peter Zijlstra, acme, alexander.shishkin,
	Andrew Morton, mhocko, hannes, mgorman, vbabka, redkoi, luto,
	kirill.shutemov, geliangtang, baiyaowei, dan.j.williams,
	vdavydov, aarcange, dvlasenk, jmarchan, koct9i, yang.shi,
	dave.hansen, vkuznets, vitalywool, ross.zwisler, Thomas Gleixner,
	kwapulinski.piotr, axboe, mchristi, Joe Perches, namit,
	Rik van Riel, linux-kernel, Linux Memory Management List

On Thu, Aug 25, 2016 at 2:09 PM, Sergey Senozhatsky
<sergey.senozhatsky.work@gmail.com> wrote:
> Hello,
>
> On (08/22/16 16:25), Hui Zhu wrote:
>>
>> Current ZRAM just can store all pages even if the compression rate
>> of a page is really low.  So the compression rate of ZRAM is out of
>> control when it is running.
>> In my part, I did some test and record with ZRAM.  The compression rate
>> is about 40%.
>>
>> This series of patches make ZRAM can just store the page that the
>> compressed size is smaller than a value.
>> With these patches, I set the value to 2048 and did the same test with
>> before.  The compression rate is about 20%.  The times of lowmemorykiller
>> also decreased.
>
> I haven't looked at the patches in details yet. can you educate me a bit?
> is your test stable? why the number of lowmemorykill-s has decreased?
> ... or am reading "The times of lowmemorykiller also decreased" wrong?
>
> suppose you have X pages that result in bad compression size (from zram
> point of view). zram stores such pages uncompressed, IOW we have no memory
> savings - swapped out page lands in zsmalloc PAGE_SIZE class. now you
> don't try to store those pages in zsmalloc, but keep them as unevictable.
> so the page still occupies PAGE_SIZE; no memory saving again. why did it
> improve LMK?

No, with these patches zram will not store that page uncompressed.  It
will mark the page as non-swap and hand it back to shrink_page_list.
shrink_page_list will then remove the page from the swap cache and move
it to the unevictable list.
After that the page will not be swapped out again until it is written to.
That is why most of the code is in and around vmscan.c.

Thanks,
Hui

>
>         -ss

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/4] ZRAM: make it just store the high compression rate page
  2016-08-22  8:25 [RFC 0/4] ZRAM: make it just store the high compression rate page Hui Zhu
                   ` (5 preceding siblings ...)
  2016-08-25  6:09 ` Sergey Senozhatsky
@ 2016-09-05  2:12 ` Minchan Kim
  6 siblings, 0 replies; 16+ messages in thread
From: Minchan Kim @ 2016-09-05  2:12 UTC (permalink / raw)
  To: Hui Zhu
  Cc: ngupta, sergey.senozhatsky.work, hughd, rostedt, mingo, peterz,
	acme, alexander.shishkin, akpm, mhocko, hannes, mgorman, vbabka,
	redkoi, luto, kirill.shutemov, geliangtang, baiyaowei,
	dan.j.williams, vdavydov, aarcange, dvlasenk, jmarchan, koct9i,
	yang.shi, dave.hansen, vkuznets, vitalywool, ross.zwisler, tglx,
	kwapulinski.piotr, axboe, mchristi, joe, namit, riel,
	linux-kernel, linux-mm, teawater

Hello Hui,

On Mon, Aug 22, 2016 at 04:25:05PM +0800, Hui Zhu wrote:
> Current ZRAM just can store all pages even if the compression rate
> of a page is really low.  So the compression rate of ZRAM is out of
> control when it is running.
> In my part, I did some test and record with ZRAM.  The compression rate
> is about 40%.
> 
> This series of patches make ZRAM can just store the page that the
> compressed size is smaller than a value.
> With these patches, I set the value to 2048 and did the same test with
> before.  The compression rate is about 20%.  The times of lowmemorykiller
> also decreased.
> 
> Hui Zhu (4):
> vmscan.c: shrink_page_list: unmap anon pages after pageout
> Add non-swap page flag to mark a page will not swap
> ZRAM: do not swap the pages that compressed size bigger than non_swap
> vmscan.c: zram: add non swap support for shmem file pages
> 
>  drivers/block/zram/Kconfig     |   11 +++
>  drivers/block/zram/zram_drv.c  |   38 +++++++++++
>  drivers/block/zram/zram_drv.h  |    4 +
>  fs/proc/meminfo.c              |    6 +
>  include/linux/mm_inline.h      |   20 +++++
>  include/linux/mmzone.h         |    3 
>  include/linux/page-flags.h     |    8 ++
>  include/linux/rmap.h           |    5 +
>  include/linux/shmem_fs.h       |    6 +
>  include/trace/events/mmflags.h |    9 ++
>  kernel/events/uprobes.c        |   16 ++++
>  mm/Kconfig                     |    9 ++
>  mm/memory.c                    |   34 ++++++++++
>  mm/migrate.c                   |    4 +
>  mm/mprotect.c                  |    8 ++
>  mm/page_io.c                   |   11 ++-
>  mm/rmap.c                      |   23 ++++++
>  mm/shmem.c                     |   77 +++++++++++++++++-----
>  mm/vmscan.c                    |  139 +++++++++++++++++++++++++++++++++++------
>  19 files changed, 387 insertions(+), 44 deletions(-)

I have looked over the patchset now and I feel it's really hard to accept
in mainline, unfortunately.  Sorry.
It spreads a lot of tricky code across MM for a special usecase,
so it's hard to justify, I think.

One way I can think of to avoid storing pages with a poor compression ratio
in zram is for zram to return AOP_WRITEPAGE_ACTIVATE from zram_rw_page when
it finds that a page is incompressible, so that the VM promotes the page to
the active LRU.  With that, the incompressible page has more time to be
redirtied, with the hope that it contains compressible data next time.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/4] ZRAM: make it just store the high compression rate page
  2016-08-25  8:25   ` Hui Zhu
@ 2016-09-05  2:18     ` Minchan Kim
  2016-09-05  3:59       ` Sergey Senozhatsky
  2016-09-05  5:12       ` Hui Zhu
  0 siblings, 2 replies; 16+ messages in thread
From: Minchan Kim @ 2016-09-05  2:18 UTC (permalink / raw)
  To: Hui Zhu
  Cc: Sergey Senozhatsky, Hui Zhu, ngupta, Hugh Dickins,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, acme,
	alexander.shishkin, Andrew Morton, mhocko, hannes, mgorman,
	vbabka, redkoi, luto, kirill.shutemov, geliangtang, baiyaowei,
	dan.j.williams, vdavydov, aarcange, dvlasenk, jmarchan, koct9i,
	yang.shi, dave.hansen, vkuznets, vitalywool, ross.zwisler,
	Thomas Gleixner, kwapulinski.piotr, axboe, mchristi, Joe Perches,
	namit, Rik van Riel, linux-kernel, Linux Memory Management List

On Thu, Aug 25, 2016 at 04:25:30PM +0800, Hui Zhu wrote:
> On Thu, Aug 25, 2016 at 2:09 PM, Sergey Senozhatsky
> <sergey.senozhatsky.work@gmail.com> wrote:
> > Hello,
> >
> > On (08/22/16 16:25), Hui Zhu wrote:
> >>
> >> Current ZRAM just can store all pages even if the compression rate
> >> of a page is really low.  So the compression rate of ZRAM is out of
> >> control when it is running.
> >> In my part, I did some test and record with ZRAM.  The compression rate
> >> is about 40%.
> >>
> >> This series of patches make ZRAM can just store the page that the
> >> compressed size is smaller than a value.
> >> With these patches, I set the value to 2048 and did the same test with
> >> before.  The compression rate is about 20%.  The times of lowmemorykiller
> >> also decreased.
> >
> > I haven't looked at the patches in details yet. can you educate me a bit?
> > is your test stable? why the number of lowmemorykill-s has decreased?
> > ... or am reading "The times of lowmemorykiller also decreased" wrong?
> >
> > suppose you have X pages that result in bad compression size (from zram
> > point of view). zram stores such pages uncompressed, IOW we have no memory
> > savings - swapped out page lands in zsmalloc PAGE_SIZE class. now you
> > don't try to store those pages in zsmalloc, but keep them as unevictable.
> > so the page still occupies PAGE_SIZE; no memory saving again. why did it
> > improve LMK?
> 
> No, zram will not save this page uncompressed with these patches.  It
> will set it as non-swap and kick back to shrink_page_list.
> Shrink_page_list will remove this page from swapcache and kick it to
> unevictable list.
> Then this page will not be swaped before it get write.
> That is why most of code are around vmscan.c.

If I understand Sergey's point correctly, he means there is no memory-saving
gain between before and after.

With your approach you can prevent unnecessary pageout (i.e., swapping out
incompressible pages), but it doesn't mean you save memory compared to the
old behaviour, so why does your patch decrease the number of lowmemory kills?

One thing I can imagine is that without this feature zram could fill up with
incompressible pages, so well-compressible pages cannot be swapped out.
Hui, is this the scenario in your case?

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/4] ZRAM: make it just store the high compression rate page
  2016-09-05  2:18     ` Minchan Kim
@ 2016-09-05  3:59       ` Sergey Senozhatsky
  2016-09-05  5:12       ` Hui Zhu
  1 sibling, 0 replies; 16+ messages in thread
From: Sergey Senozhatsky @ 2016-09-05  3:59 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hui Zhu, Sergey Senozhatsky, Hui Zhu, ngupta, Hugh Dickins,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, acme,
	alexander.shishkin, Andrew Morton, mhocko, hannes, mgorman,
	vbabka, redkoi, luto, kirill.shutemov, geliangtang, baiyaowei,
	dan.j.williams, vdavydov, aarcange, dvlasenk, jmarchan, koct9i,
	yang.shi, dave.hansen, vkuznets, vitalywool, ross.zwisler,
	Thomas Gleixner, kwapulinski.piotr, axboe, mchristi, Joe Perches,
	namit, Rik van Riel, linux-kernel, Linux Memory Management List

Hello,

On (09/05/16 11:18), Minchan Kim wrote:
[..]
> If I understand Sergey's point right, he means there is no gain
> to save memory between before and after.
> 
> With your approach, you can prevent unnecessary pageout(i.e.,
> uncompressible page swap out) but it doesn't mean you save the
> memory compared to old so why does your patch decrease the number of
> lowmemory killing?

you are right Minchan, that was exactly my point. every compressed page
that does not end up in huge_object zspage should result in some memory
saving (somewhere in the range from bytes to kilobytes).

> A thing I can imagine is without this feature, zram could be full of
> uncompressible pages so good-compressible page cannot be swapped out.

a good theory.

in general, a selective compression of N first pages that fall under the
given compression limit is not the same as a selective compression of N
"best" compressible pages. so I'm a bit uncertain about the guarantees
that the patch can provide.

let's assume the following case.
- zram compression size limit set to 2400 bytes (only pages smaller than
  that will be stored in zsmalloc)
- first K pages to swapout have compression size of 2350 +/- 10%
- next L pages have compression size of 2500 +/- 10%
- last M pages are un-compressible - PAGE_SIZE.
- zram disksize can fit N pages
- N > K + L

so instead of compressing and swapping out K + L pages, you would compress
only K pages, leaving (L + M) * PAGE_SIZE untouched. thus I'd say that we
might have bigger chances of LMK/OOM/etc. in some cases.

	-ss

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/4] ZRAM: make it just store the high compression rate page
  2016-09-05  2:18     ` Minchan Kim
  2016-09-05  3:59       ` Sergey Senozhatsky
@ 2016-09-05  5:12       ` Hui Zhu
  2016-09-05  5:51         ` Minchan Kim
  1 sibling, 1 reply; 16+ messages in thread
From: Hui Zhu @ 2016-09-05  5:12 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Hui Zhu, ngupta, Hugh Dickins,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, acme,
	alexander.shishkin, Andrew Morton, mhocko, hannes, mgorman,
	vbabka, redkoi, luto, kirill.shutemov, geliangtang, baiyaowei,
	dan.j.williams, vdavydov, aarcange, dvlasenk, jmarchan, koct9i,
	yang.shi, dave.hansen, vkuznets, vitalywool, ross.zwisler,
	Thomas Gleixner, kwapulinski.piotr, axboe, mchristi, Joe Perches,
	namit, Rik van Riel, linux-kernel, Linux Memory Management List

On Mon, Sep 5, 2016 at 10:18 AM, Minchan Kim <minchan@kernel.org> wrote:
> On Thu, Aug 25, 2016 at 04:25:30PM +0800, Hui Zhu wrote:
>> On Thu, Aug 25, 2016 at 2:09 PM, Sergey Senozhatsky
>> <sergey.senozhatsky.work@gmail.com> wrote:
>> > Hello,
>> >
>> > On (08/22/16 16:25), Hui Zhu wrote:
>> >>
>> >> Current ZRAM just can store all pages even if the compression rate
>> >> of a page is really low.  So the compression rate of ZRAM is out of
>> >> control when it is running.
>> >> In my part, I did some test and record with ZRAM.  The compression rate
>> >> is about 40%.
>> >>
>> >> This series of patches make ZRAM can just store the page that the
>> >> compressed size is smaller than a value.
>> >> With these patches, I set the value to 2048 and did the same test with
>> >> before.  The compression rate is about 20%.  The times of lowmemorykiller
>> >> also decreased.
>> >
>> > I haven't looked at the patches in details yet. can you educate me a bit?
>> > is your test stable? why the number of lowmemorykill-s has decreased?
>> > ... or am reading "The times of lowmemorykiller also decreased" wrong?
>> >
>> > suppose you have X pages that result in bad compression size (from zram
>> > point of view). zram stores such pages uncompressed, IOW we have no memory
>> > savings - swapped out page lands in zsmalloc PAGE_SIZE class. now you
>> > don't try to store those pages in zsmalloc, but keep them as unevictable.
>> > so the page still occupies PAGE_SIZE; no memory saving again. why did it
>> > improve LMK?
>>
>> No, zram will not save this page uncompressed with these patches.  It
>> will set it as non-swap and kick back to shrink_page_list.
>> Shrink_page_list will remove this page from swapcache and kick it to
>> unevictable list.
>> Then this page will not be swaped before it get write.
>> That is why most of code are around vmscan.c.
>
> If I understand Sergey's point right, he means there is no gain
> to save memory between before and after.
>
> With your approach, you can prevent unnecessary pageout(i.e.,
> uncompressible page swap out) but it doesn't mean you save the
> memory compared to old so why does your patch decrease the number of
> lowmemory killing?
>
> A thing I can imagine is without this feature, zram could be full of
> uncompressible pages so good-compressible page cannot be swapped out.
> Hui, is this scenario right for your case?
>

That is one reason.  But it is not the principal one.

Another reason is that when swap is pushing pages to zram, what the
system really wants is to free memory.
The deal is that the system spends cpu time and memory in order to get
memory back.  If zram only accepts the high compression rate pages, the
system frees more memory for the same amount of memory spent, so it is
pulled out of the low-memory state earlier.  (Maybe at the cost of more
cpu time, because of the compressed-size checks.  But maybe less,
because fewer pages need to be processed.  That is the interesting
part. :)
I think that is why the lmk count decreases.

And yes, all of this depends on the number of high compression rate
pages.  So you cannot just set a non_swap limit on the system and get
everything for free.  You need to do a lot of testing around it to make
sure the non_swap limit is good for your system.

And I think using AOP_WRITEPAGE_ACTIVATE without kicking the page to a
special list will sometimes make the cpu too busy.
I did some tests before I started kicking pages to a special list.  The
shrink task gets moved around, around and around because the low
compression rate pages just move from one list to another a lot of
times, again, again and again.
And all these low compression rate pages always stay together.
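
Just to make that concrete, here is a rough user-space C sketch of the
decision I mean (simplified, with made-up names; it is not the code from
this series): the page is compressed once, and if the result is not
under the non_swap limit it is rejected, so the caller can mark it
non-swap and park it on a list that reclaim stops scanning, instead of
re-activating it and hitting it again on the next pass.

#include <stddef.h>
#include <string.h>

#define PAGE_SIZE	4096

/* placeholder for the real compressor (lzo/lz4 in zram); this one just
 * pretends the page did not compress at all */
static size_t compress_page(const void *page, void *out)
{
	memcpy(out, page, PAGE_SIZE);
	return PAGE_SIZE;
}

enum swapout_result {
	SWAPOUT_STORED,		/* compressed copy kept, page can be freed */
	SWAPOUT_REJECTED,	/* too big: caller marks the page non-swap */
};

static enum swapout_result try_swapout(const void *page, void *buf,
				       size_t non_swap_limit)
{
	size_t clen = compress_page(page, buf);

	if (clen >= non_swap_limit)
		return SWAPOUT_REJECTED;	/* not worth storing */

	/* ... here the clen-byte copy would go into the compressed store ... */
	return SWAPOUT_STORED;
}

/* caller side: instead of AOP_WRITEPAGE_ACTIVATE, which puts the page back
 * on the normal LRU where reclaim will find it again and again, a rejected
 * page is marked non-swap and moved to the unevictable list */
static void shrink_one_page(const void *page, void *buf, size_t limit)
{
	if (try_swapout(page, buf, limit) == SWAPOUT_REJECTED) {
		/* conceptually: SetPageNonSwap(page);        */
		/* then move the page to the unevictable list */
	}
}

int main(void)
{
	static char page[PAGE_SIZE], buf[PAGE_SIZE];

	shrink_one_page(page, buf, 2048);	/* 2048: the example limit */
	return 0;
}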

Thanks,
Hui


> Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/4] ZRAM: make it just store the high compression rate page
  2016-09-05  5:12       ` Hui Zhu
@ 2016-09-05  5:51         ` Minchan Kim
  2016-09-05  6:02           ` Hui Zhu
  0 siblings, 1 reply; 16+ messages in thread
From: Minchan Kim @ 2016-09-05  5:51 UTC (permalink / raw)
  To: Hui Zhu
  Cc: Sergey Senozhatsky, Hui Zhu, ngupta, Hugh Dickins,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, acme,
	alexander.shishkin, Andrew Morton, mhocko, hannes, mgorman,
	vbabka, redkoi, luto, kirill.shutemov, geliangtang, baiyaowei,
	dan.j.williams, vdavydov, aarcange, dvlasenk, jmarchan, koct9i,
	yang.shi, dave.hansen, vkuznets, vitalywool, ross.zwisler,
	Thomas Gleixner, kwapulinski.piotr, axboe, mchristi, Joe Perches,
	namit, Rik van Riel, linux-kernel, Linux Memory Management List

On Mon, Sep 05, 2016 at 01:12:05PM +0800, Hui Zhu wrote:
> On Mon, Sep 5, 2016 at 10:18 AM, Minchan Kim <minchan@kernel.org> wrote:
> > On Thu, Aug 25, 2016 at 04:25:30PM +0800, Hui Zhu wrote:
> >> On Thu, Aug 25, 2016 at 2:09 PM, Sergey Senozhatsky
> >> <sergey.senozhatsky.work@gmail.com> wrote:
> >> > Hello,
> >> >
> >> > On (08/22/16 16:25), Hui Zhu wrote:
> >> >>
> >> >> Current ZRAM just can store all pages even if the compression rate
> >> >> of a page is really low.  So the compression rate of ZRAM is out of
> >> >> control when it is running.
> >> >> In my part, I did some test and record with ZRAM.  The compression rate
> >> >> is about 40%.
> >> >>
> >> >> This series of patches make ZRAM can just store the page that the
> >> >> compressed size is smaller than a value.
> >> >> With these patches, I set the value to 2048 and did the same test with
> >> >> before.  The compression rate is about 20%.  The times of lowmemorykiller
> >> >> also decreased.
> >> >
> >> > I haven't looked at the patches in details yet. can you educate me a bit?
> >> > is your test stable? why the number of lowmemorykill-s has decreased?
> >> > ... or am reading "The times of lowmemorykiller also decreased" wrong?
> >> >
> >> > suppose you have X pages that result in bad compression size (from zram
> >> > point of view). zram stores such pages uncompressed, IOW we have no memory
> >> > savings - swapped out page lands in zsmalloc PAGE_SIZE class. now you
> >> > don't try to store those pages in zsmalloc, but keep them as unevictable.
> >> > so the page still occupies PAGE_SIZE; no memory saving again. why did it
> >> > improve LMK?
> >>
> >> No, zram will not save this page uncompressed with these patches.  It
> >> will set it as non-swap and kick back to shrink_page_list.
> >> Shrink_page_list will remove this page from swapcache and kick it to
> >> unevictable list.
> >> Then this page will not be swaped before it get write.
> >> That is why most of code are around vmscan.c.
> >
> > If I understand Sergey's point right, he means there is no gain
> > to save memory between before and after.
> >
> > With your approach, you can prevent unnecessary pageout(i.e.,
> > uncompressible page swap out) but it doesn't mean you save the
> > memory compared to old so why does your patch decrease the number of
> > lowmemory killing?
> >
> > A thing I can imagine is without this feature, zram could be full of
> > uncompressible pages so good-compressible page cannot be swapped out.
> > Hui, is this scenario right for your case?
> >
> 
> That is one reason.  But it is not the principal one.
> 
> Another reason is when swap is running to put page to zram, what the
> system wants is to get memory.
> Then the deal is system spends cpu time and memory to get memory. If
> the zram just access the high compression rate pages, system can get
> more memory with the same amount of memory. It will pull system from
> low memory status earlier. (Maybe more cpu time, because the
> compression rate checks. But maybe less, because fewer pages need to
> digress. That is the interesting part. :)
> I think that is why lmk times decrease.
> 
> And yes, all of this depends on the number of high compression rate
> pages. So you cannot just set a non_swap limit to the system and get
> everything. You need to do a lot of test around it to make sure the
> non_swap limit is good for your system.
> 
> And I think use AOP_WRITEPAGE_ACTIVATE without kicking page to a
> special list will make cpu too busy sometimes.

Yes, and it would be the same with your patch if a newly arriving write
on a CoWed page is uncompressible data.

> I did some tests before I kick page to a special list. The shrink task

What kind of tests? Could you elaborate a bit more?
"shrink task" - what does it mean?

> will be moved around, around and around because low compression rate
> pages just moved from one list to another a lot of times, again, again
> and again.
> And all this low compression rate pages always stay together.

I cannot understand it without a more detailed description. :(
Could you explain more?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 0/4] ZRAM: make it just store the high compression rate page
  2016-09-05  5:51         ` Minchan Kim
@ 2016-09-05  6:02           ` Hui Zhu
  0 siblings, 0 replies; 16+ messages in thread
From: Hui Zhu @ 2016-09-05  6:02 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Hui Zhu, ngupta, Hugh Dickins,
	Steven Rostedt, Ingo Molnar, Peter Zijlstra, acme,
	alexander.shishkin, Andrew Morton, mhocko, hannes, mgorman,
	vbabka, redkoi, luto, kirill.shutemov, geliangtang, baiyaowei,
	dan.j.williams, vdavydov, aarcange, dvlasenk, jmarchan, koct9i,
	yang.shi, dave.hansen, vkuznets, vitalywool, ross.zwisler,
	Thomas Gleixner, kwapulinski.piotr, axboe, mchristi, Joe Perches,
	namit, Rik van Riel, linux-kernel, Linux Memory Management List

On Mon, Sep 5, 2016 at 1:51 PM, Minchan Kim <minchan@kernel.org> wrote:
> On Mon, Sep 05, 2016 at 01:12:05PM +0800, Hui Zhu wrote:
>> On Mon, Sep 5, 2016 at 10:18 AM, Minchan Kim <minchan@kernel.org> wrote:
>> > On Thu, Aug 25, 2016 at 04:25:30PM +0800, Hui Zhu wrote:
>> >> On Thu, Aug 25, 2016 at 2:09 PM, Sergey Senozhatsky
>> >> <sergey.senozhatsky.work@gmail.com> wrote:
>> >> > Hello,
>> >> >
>> >> > On (08/22/16 16:25), Hui Zhu wrote:
>> >> >>
>> >> >> Current ZRAM just can store all pages even if the compression rate
>> >> >> of a page is really low.  So the compression rate of ZRAM is out of
>> >> >> control when it is running.
>> >> >> In my part, I did some test and record with ZRAM.  The compression rate
>> >> >> is about 40%.
>> >> >>
>> >> >> This series of patches make ZRAM can just store the page that the
>> >> >> compressed size is smaller than a value.
>> >> >> With these patches, I set the value to 2048 and did the same test with
>> >> >> before.  The compression rate is about 20%.  The times of lowmemorykiller
>> >> >> also decreased.
>> >> >
>> >> > I haven't looked at the patches in details yet. can you educate me a bit?
>> >> > is your test stable? why the number of lowmemorykill-s has decreased?
>> >> > ... or am reading "The times of lowmemorykiller also decreased" wrong?
>> >> >
>> >> > suppose you have X pages that result in bad compression size (from zram
>> >> > point of view). zram stores such pages uncompressed, IOW we have no memory
>> >> > savings - swapped out page lands in zsmalloc PAGE_SIZE class. now you
>> >> > don't try to store those pages in zsmalloc, but keep them as unevictable.
>> >> > so the page still occupies PAGE_SIZE; no memory saving again. why did it
>> >> > improve LMK?
>> >>
>> >> No, zram will not save this page uncompressed with these patches.  It
>> >> will set it as non-swap and kick back to shrink_page_list.
>> >> Shrink_page_list will remove this page from swapcache and kick it to
>> >> unevictable list.
>> >> Then this page will not be swaped before it get write.
>> >> That is why most of code are around vmscan.c.
>> >
>> > If I understand Sergey's point right, he means there is no gain
>> > to save memory between before and after.
>> >
>> > With your approach, you can prevent unnecessary pageout(i.e.,
>> > uncompressible page swap out) but it doesn't mean you save the
>> > memory compared to old so why does your patch decrease the number of
>> > lowmemory killing?
>> >
>> > A thing I can imagine is without this feature, zram could be full of
>> > uncompressible pages so good-compressible page cannot be swapped out.
>> > Hui, is this scenario right for your case?
>> >
>>
>> That is one reason.  But it is not the principal one.
>>
>> Another reason is when swap is running to put page to zram, what the
>> system wants is to get memory.
>> Then the deal is system spends cpu time and memory to get memory. If
>> the zram just access the high compression rate pages, system can get
>> more memory with the same amount of memory. It will pull system from
>> low memory status earlier. (Maybe more cpu time, because the
>> compression rate checks. But maybe less, because fewer pages need to
>> digress. That is the interesting part. :)
>> I think that is why lmk times decrease.
>>
>> And yes, all of this depends on the number of high compression rate
>> pages. So you cannot just set a non_swap limit to the system and get
>> everything. You need to do a lot of test around it to make sure the
>> non_swap limit is good for your system.
>>
>> And I think use AOP_WRITEPAGE_ACTIVATE without kicking page to a
>> special list will make cpu too busy sometimes.
>
> Yes, and it would same with your patch if new arraival write on CoWed
> page is uncompressible data.
>
>> I did some tests before I kick page to a special list. The shrink task
>
> What kinds of test? Could you elaborate a bit more?
> shrink task. What does it mean?
>



Sorry for the confusion with that part.  I meant the function
shrink_page_list.

I will do more tests for that and post the patch later.

Thanks,
Hui


>> will be moved around, around and around because low compression rate
>> pages just moved from one list to another a lot of times, again, again
>> and again.
>> And all this low compression rate pages always stay together.
>
> I cannot understand with detail description. :(
> Could you explain more?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [RFC 2/4] Add non-swap page flag to mark a page will not swap
  2016-08-22  8:25 ` [RFC 2/4] Add non-swap page flag to mark a page will not swap Hui Zhu
@ 2016-09-06 15:35   ` Steven Rostedt
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2016-09-06 15:35 UTC (permalink / raw)
  To: Hui Zhu
  Cc: minchan, ngupta, sergey.senozhatsky.work, hughd, mingo, peterz,
	acme, alexander.shishkin, akpm, mhocko, hannes, mgorman, vbabka,
	redkoi, luto, kirill.shutemov, geliangtang, baiyaowei,
	dan.j.williams, vdavydov, aarcange, dvlasenk, jmarchan, koct9i,
	yang.shi, dave.hansen, vkuznets, vitalywool, ross.zwisler, tglx,
	kwapulinski.piotr, axboe, mchristi, joe, namit, riel,
	linux-kernel, linux-mm, teawater

On Mon, 22 Aug 2016 16:25:07 +0800
Hui Zhu <zhuhui@xiaomi.com> wrote:

>
> --- a/include/linux/mm_inline.h
> +++ b/include/linux/mm_inline.h
> @@ -46,15 +46,31 @@ static __always_inline void update_lru_size(struct lruvec *lruvec,
>  static __always_inline void add_page_to_lru_list(struct page *page,
>  				struct lruvec *lruvec, enum lru_list lru)
>  {
> -	update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page));
> +	int nr_pages = hpage_nr_pages(page);
> +	enum zone_type zid = page_zonenum(page);
> +#ifdef CONFIG_NON_SWAP
> +	if (PageNonSwap(page)) {

Can't we just have PageNonSwap() return false when CONFIG_NON_SWAP is
not defined, and lose the ugly #ifdef? It will make this much cleaner.

> +		lru = LRU_UNEVICTABLE;
> +		update_lru_size(lruvec, NR_NON_SWAP, zid, nr_pages);
> +	}
> +#endif
> +	update_lru_size(lruvec, lru, zid, nr_pages);
>  	list_add(&page->lru, &lruvec->lists[lru]);
>  }
>  
>  static __always_inline void del_page_from_lru_list(struct page *page,
>  				struct lruvec *lruvec, enum lru_list lru)
>  {
> +	int nr_pages = hpage_nr_pages(page);
> +	enum zone_type zid = page_zonenum(page);
> +#ifdef CONFIG_NON_SWAP
> +	if (PageNonSwap(page)) {
> +		lru = LRU_UNEVICTABLE;
> +		update_lru_size(lruvec, NR_NON_SWAP, zid, -nr_pages);
> +	}
> +#endif
>  	list_del(&page->lru);
> -	update_lru_size(lruvec, lru, page_zonenum(page), -hpage_nr_pages(page));
> +	update_lru_size(lruvec, lru, zid, -nr_pages);
>  }
>  
>  /**
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index d572b78..da08d20 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -138,6 +138,9 @@ enum zone_stat_item {
>  	NUMA_OTHER,		/* allocation from other node */
>  #endif
>  	NR_FREE_CMA_PAGES,
> +#ifdef CONFIG_NON_SWAP
> +	NR_NON_SWAP,
> +#endif

Is it bad to have NR_NON_SWAP defined as an enum if CONFIG_NON_SWAP is
not defined?

>  	NR_VM_ZONE_STAT_ITEMS };
>  
>  enum node_stat_item {
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 74e4dda..0cd80db9 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -105,6 +105,9 @@ enum pageflags {
>  	PG_young,
>  	PG_idle,
>  #endif
> +#ifdef CONFIG_NON_SWAP
> +	PG_non_swap,
> +#endif

Here too.

>  	__NR_PAGEFLAGS,
>  
>  	/* Filesystems */
> @@ -303,6 +306,11 @@ PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
>  PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
>  	TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
>  
> +#ifdef CONFIG_NON_SWAP
> +PAGEFLAG(NonSwap, non_swap, PF_NO_TAIL)
> +	TESTSCFLAG(NonSwap, non_swap, PF_NO_TAIL)
> +#endif
> +
>  #ifdef CONFIG_HIGHMEM
>  /*
>   * Must use a macro here due to header dependency issues. page_zone() is not
> diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
> index 5a81ab4..1c0ccc9 100644
> --- a/include/trace/events/mmflags.h
> +++ b/include/trace/events/mmflags.h
> @@ -79,6 +79,12 @@
>  #define IF_HAVE_PG_IDLE(flag,string)
>  #endif
>  
> +#ifdef CONFIG_NON_SWAP
> +#define IF_HAVE_PG_NON_SWAP(flag,string) ,{1UL << flag, string}
> +#else
> +#define IF_HAVE_PG_NON_SWAP(flag,string)
> +#endif
> +
>  #define __def_pageflag_names						\
>  	{1UL << PG_locked,		"locked"	},		\
>  	{1UL << PG_error,		"error"		},		\
> @@ -104,7 +110,8 @@ IF_HAVE_PG_MLOCK(PG_mlocked,		"mlocked"	)		\
>  IF_HAVE_PG_UNCACHED(PG_uncached,	"uncached"	)		\
>  IF_HAVE_PG_HWPOISON(PG_hwpoison,	"hwpoison"	)		\
>  IF_HAVE_PG_IDLE(PG_young,		"young"		)		\
> -IF_HAVE_PG_IDLE(PG_idle,		"idle"		)
> +IF_HAVE_PG_IDLE(PG_idle,		"idle"		)		\
> +IF_HAVE_PG_NON_SWAP(PG_non_swap,	"non_swap"	)
>  
>  #define show_page_flags(flags)						\
>  	(flags) ? __print_flags(flags, "|",				\
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index b7a525a..a7e4153 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -160,6 +160,10 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  	const unsigned long mmun_start = addr;
>  	const unsigned long mmun_end   = addr + PAGE_SIZE;
>  	struct mem_cgroup *memcg;
> +	pte_t pte;
> +#ifdef CONFIG_NON_SWAP
> +	bool non_swap;
> +#endif
>  
>  	err = mem_cgroup_try_charge(kpage, vma->vm_mm, GFP_KERNEL, &memcg,
>  			false);
> @@ -176,6 +180,11 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  		goto unlock;
>  
>  	get_page(kpage);
> +#ifdef CONFIG_NON_SWAP
> +	non_swap = TestClearPageNonSwap(page);

Can't we have TestClearPageNonSwap() return false when CONFIG_NON_SWAP
is not defined, and lose the ugly #ifdefs here in the code?

> +	if (non_swap)
> +		SetPageNonSwap(kpage);

Make SetPageNonSwap() a nop (or warning) if CONFIG_NON_SWAP is not
defined.

> +#endif
>  	page_add_new_anon_rmap(kpage, vma, addr, false);
>  	mem_cgroup_commit_charge(kpage, memcg, false, false);
>  	lru_cache_add_active_or_unevictable(kpage, vma);
> @@ -187,7 +196,12 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  
>  	flush_cache_page(vma, addr, pte_pfn(*ptep));
>  	ptep_clear_flush_notify(vma, addr, ptep);
> -	set_pte_at_notify(mm, addr, ptep, mk_pte(kpage, vma->vm_page_prot));
> +	pte = mk_pte(kpage, vma->vm_page_prot);
> +#ifdef CONFIG_NON_SWAP
> +	if (non_swap)
> +		pte = pte_wrprotect(pte);
> +#endif

Again, I hate the added #ifdefs in the code when we can have stub
functions make non_swap false.

A lot of the #ifdef's can be nuked with proper stub functions, which
makes maintaining and reviewing the code much easier.
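
Something along these lines, roughly (untested, just to show the shape,
reusing the names from this patch):

/* include/linux/page-flags.h */
#ifdef CONFIG_NON_SWAP
PAGEFLAG(NonSwap, non_swap, PF_NO_TAIL)
	TESTSCFLAG(NonSwap, non_swap, PF_NO_TAIL)
#else
PAGEFLAG_FALSE(NonSwap)
	TESTSCFLAG_FALSE(NonSwap)
#endif

With that, __replace_page() and friends can use the flag unconditionally,
and the compiler throws the dead branches away when CONFIG_NON_SWAP is off:

	bool non_swap = TestClearPageNonSwap(page);

	if (non_swap)
		SetPageNonSwap(kpage);
	...
	pte = mk_pte(kpage, vma->vm_page_prot);
	if (non_swap)
		pte = pte_wrprotect(pte);
	set_pte_at_notify(mm, addr, ptep, pte);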

-- Steve

> +	set_pte_at_notify(mm, addr, ptep, pte);
>  
>  	page_remove_rmap(page, false);
>  	if (!page_mapped(page))
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 57ecdb3..d8d4b41 100644
> --- a/mm/Kconfig

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2016-09-06 15:35 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-22  8:25 [RFC 0/4] ZRAM: make it just store the high compression rate page Hui Zhu
2016-08-22  8:25 ` [RFC 1/4] vmscan.c: shrink_page_list: unmap anon pages after pageout Hui Zhu
2016-08-22  8:25 ` [RFC 2/4] Add non-swap page flag to mark a page will not swap Hui Zhu
2016-09-06 15:35   ` Steven Rostedt
2016-08-22  8:25 ` [RFC 3/4] ZRAM: do not swap the page that compressed size bigger than non_swap Hui Zhu
2016-08-22  8:25 ` [RFC 4/4] vmscan.c: zram: add non swap support for shmem file pages Hui Zhu
2016-08-24  1:04 ` [RFC 0/4] ZRAM: make it just store the high compression rate page Minchan Kim
2016-08-24  1:29   ` Hui Zhu
2016-08-25  6:09 ` Sergey Senozhatsky
2016-08-25  8:25   ` Hui Zhu
2016-09-05  2:18     ` Minchan Kim
2016-09-05  3:59       ` Sergey Senozhatsky
2016-09-05  5:12       ` Hui Zhu
2016-09-05  5:51         ` Minchan Kim
2016-09-05  6:02           ` Hui Zhu
2016-09-05  2:12 ` Minchan Kim
