* [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low
@ 2018-11-07 19:16 Yang Shi
  2018-11-07 19:16 ` [PATCH 2/2] mm: ksm: do not block on page lock when searching stable tree Yang Shi
  2018-12-20 22:45 ` [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low Andrew Morton
  0 siblings, 2 replies; 9+ messages in thread
From: Yang Shi @ 2018-11-07 19:16 UTC (permalink / raw)
  To: mhocko, vbabka, hannes, hughd, akpm; +Cc: yang.shi, linux-mm, linux-kernel

When running a stress test, we occasionally run into the below hang
issue:

INFO: task ksmd:205 blocked for more than 360 seconds.
      Tainted: G            E 4.9.128-001.ali3000_nightly_20180925_264.alios7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ksmd            D    0   205      2 0x00000000
 ffff882fa00418c0 0000000000000000 ffff882fa4b10000 ffff882fbf059d00
 ffff882fa5bc1800 ffffc900190c7c28 ffffffff81725e58 ffffffff810777c0
 00ffc900190c7c88 ffff882fbf059d00 ffffffff8138cc09 ffff882fa4b10000
Call Trace:
 [<ffffffff81725e58>] ? __schedule+0x258/0x720
 [<ffffffff810777c0>] ? do_flush_tlb_all+0x30/0x30
 [<ffffffff8138cc09>] ? free_cpumask_var+0x9/0x10
 [<ffffffff81726356>] schedule+0x36/0x80
 [<ffffffff81729916>] schedule_timeout+0x206/0x4b0
 [<ffffffff81077d0f>] ? native_flush_tlb_others+0x11f/0x180
 [<ffffffff8110ca40>] ? ktime_get+0x40/0xb0
 [<ffffffff81725b6a>] io_schedule_timeout+0xda/0x170
 [<ffffffff81726c50>] ? bit_wait+0x60/0x60
 [<ffffffff81726c6b>] bit_wait_io+0x1b/0x60
 [<ffffffff81726759>] __wait_on_bit_lock+0x59/0xc0
 [<ffffffff811aff76>] __lock_page+0x86/0xa0
 [<ffffffff810d53e0>] ? wake_atomic_t_function+0x60/0x60
 [<ffffffff8121a269>] ksm_scan_thread+0xeb9/0x1430
 [<ffffffff810d5340>] ? prepare_to_wait_event+0x100/0x100
 [<ffffffff812193b0>] ? try_to_merge_with_ksm_page+0x850/0x850
 [<ffffffff810ac226>] kthread+0xe6/0x100
 [<ffffffff810ac140>] ? kthread_park+0x60/0x60
 [<ffffffff8172b196>] ret_from_fork+0x46/0x60

ksmd found a suitable KSM page on the stable tree and is trying to
lock it. But it is locked by the direct reclaim path, which is walking
the page's rmap to get the number of referenced PTEs.

The KSM page rmap walk needs to iterate all rmap_items of the page and
all rmap anon_vmas of each rmap_item. So it may take (# rmap_item * #
children processes) loops. This number of loops might be very large in
the worst case, and may take a long time.
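For reference, this nesting is visible in the structure of
rmap_walk_ksm() in mm/ksm.c; a simplified sketch of that function
(locking subtleties and return-value handling elided, not the exact
upstream code):

static void rmap_walk_ksm_sketch(struct page *page,
				 struct rmap_walk_control *rwc)
{
	struct stable_node *stable_node = page_stable_node(page);
	struct rmap_item *rmap_item;

	/* Outer loop: one rmap_item per mm that merged into this page */
	hlist_for_each_entry(rmap_item, &stable_node->hlist, hlist) {
		struct anon_vma *anon_vma = rmap_item->anon_vma;
		struct anon_vma_chain *vmac;

		anon_vma_lock_read(anon_vma);
		/* Inner loop: the merged vma plus every forked child's copy */
		anon_vma_interval_tree_foreach(vmac, &anon_vma->rb_root,
					       0, ULONG_MAX) {
			struct vm_area_struct *vma = vmac->vma;

			rwc->rmap_one(page, vma, rmap_item->address,
				      rwc->arg);
		}
		anon_vma_unlock_read(anon_vma);
	}
}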

Typically, direct reclaim will not intend to reclaim too many pages, and
it is latency sensitive. So it is not worth doing the long ksm page rmap
walk to reclaim just one page.

Skip KSM pages in direct reclaim if the reclaim priority is low, but
still try to reclaim KSM pages with high priority.
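To make the check concrete: sc->priority starts at DEF_PRIORITY and
counts down toward 0 as reclaim passes grow more aggressive, so the
condition "sc->priority > (DEF_PRIORITY - 2)" below is true only for
the two gentlest passes. A standalone illustration (assuming
DEF_PRIORITY == 12, its value in v4.9 include/linux/mmzone.h):

#include <stdio.h>

#define DEF_PRIORITY 12	/* as in v4.9 include/linux/mmzone.h */

int main(void)
{
	/* priority counts down as reclaim becomes more aggressive */
	for (int priority = DEF_PRIORITY; priority >= 0; priority--)
		printf("priority %2d: %s KSM pages\n", priority,
		       priority > (DEF_PRIORITY - 2) ? "skip" : "consider");
	return 0;
}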

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 mm/vmscan.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 62ac0c48..e821ad3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1260,8 +1260,17 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-		if (!force_reclaim)
-			references = page_check_references(page, sc);
+		if (!force_reclaim) {
+			/*
+			 * Don't try to reclaim KSM page in direct reclaim if
+			 * the priority is not high enough.
+			 */
+			if (PageKsm(page) && !current_is_kswapd() &&
+			    sc->priority > (DEF_PRIORITY - 2))
+				references = PAGEREF_KEEP;
+			else
+				references = page_check_references(page, sc);
+		}
 
 		switch (references) {
 		case PAGEREF_ACTIVATE:
@@ -2136,6 +2145,16 @@ static void shrink_active_list(unsigned long nr_to_scan,
 			}
 		}
 
+		/*
+		 * Skip KSM page in direct reclaim if priority is not
+		 * high enough.
+		 */
+		if (PageKsm(page) && !current_is_kswapd() &&
+		    sc->priority > (DEF_PRIORITY - 2)) {
+			putback_lru_page(page);
+			continue;
+		}
+
 		if (page_referenced(page, 0, sc->target_mem_cgroup,
 				    &vm_flags)) {
 			nr_rotated += hpage_nr_pages(page);
-- 
1.8.3.1



* [PATCH 2/2] mm: ksm: do not block on page lock when searching stable tree
  2018-11-07 19:16 [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low Yang Shi
@ 2018-11-07 19:16 ` Yang Shi
  2018-11-23  7:03   ` Kirill Tkhai
  2018-12-20 22:45 ` [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low Andrew Morton
  1 sibling, 1 reply; 9+ messages in thread
From: Yang Shi @ 2018-11-07 19:16 UTC (permalink / raw)
  To: mhocko, vbabka, hannes, hughd, akpm; +Cc: yang.shi, linux-mm, linux-kernel

ksmd needs to search the stable tree to look for a suitable KSM page,
but the KSM page might be locked for a long time due to, e.g., a KSM
page rmap walk.

It is not worth waiting for the lock; the page can be skipped, and then
merged in the next scan if its content is still intact, avoiding a long
stall.

Introduce an async mode to get_ksm_page() so that it does not block on
the page lock, like what try_to_merge_one_page() does.

Return -EBUSY if trylock fails, since NULL already means no suitable
KSM page was found, which is a valid case.
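Since the error is returned via ERR_PTR(), callers can tell it apart
from the pre-existing NULL "no KSM page" result. A minimal userspace
sketch of the kernel's ERR_PTR encoding (constants mirror
include/linux/err.h; illustration only, not the kernel implementation):

#include <errno.h>
#include <stdio.h>

#define MAX_ERRNO	4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* errors occupy the top MAX_ERRNO values of the address range */
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

int main(void)
{
	void *busy = ERR_PTR(-EBUSY);	/* page locked elsewhere */
	void *not_found = NULL;		/* stale stable node */

	printf("busy:      IS_ERR=%d PTR_ERR=%ld\n",
	       IS_ERR(busy), PTR_ERR(busy));
	printf("not_found: IS_ERR=%d (NULL stays a valid result)\n",
	       IS_ERR(not_found));
	return 0;
}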

Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 mm/ksm.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 5b0894b..576803d 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -667,7 +667,7 @@ static void remove_node_from_stable_tree(struct stable_node *stable_node)
 }
 
 /*
- * get_ksm_page: checks if the page indicated by the stable node
+ * __get_ksm_page: checks if the page indicated by the stable node
  * is still its ksm page, despite having held no reference to it.
  * In which case we can trust the content of the page, and it
  * returns the gotten page; but if the page has now been zapped,
@@ -685,7 +685,8 @@ static void remove_node_from_stable_tree(struct stable_node *stable_node)
  * a page to put something that might look like our key in page->mapping.
  * is on its way to being freed; but it is an anomaly to bear in mind.
  */
-static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
+static struct page *__get_ksm_page(struct stable_node *stable_node,
+				   bool lock_it, bool async)
 {
 	struct page *page;
 	void *expected_mapping;
@@ -728,7 +729,14 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
 	}
 
 	if (lock_it) {
-		lock_page(page);
+		if (async) {
+			if (!trylock_page(page)) {
+				put_page(page);
+				return ERR_PTR(-EBUSY);
+			}
+		} else
+			lock_page(page);
+
 		if (READ_ONCE(page->mapping) != expected_mapping) {
 			unlock_page(page);
 			put_page(page);
@@ -751,6 +759,11 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
 	return NULL;
 }
 
+static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
+{
+	return __get_ksm_page(stable_node, lock_it, false);
+}
+
 /*
  * Removing rmap_item from stable or unstable tree.
  * This function will clean the information from the stable/unstable tree.
@@ -1675,7 +1688,11 @@ static struct page *stable_tree_search(struct page *page)
 			 * It would be more elegant to return stable_node
 			 * than kpage, but that involves more changes.
 			 */
-			tree_page = get_ksm_page(stable_node_dup, true);
+			tree_page = __get_ksm_page(stable_node_dup, true, true);
+
+			if (PTR_ERR(tree_page) == -EBUSY)
+				return ERR_PTR(-EBUSY);
+
 			if (unlikely(!tree_page))
 				/*
 				 * The tree may have been rebalanced,
@@ -2062,6 +2079,10 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
 
 	/* We first start with searching the page inside the stable tree */
 	kpage = stable_tree_search(page);
+
+	if (PTR_ERR(kpage) == -EBUSY)
+		return;
+
 	if (kpage == page && rmap_item->head == stable_node) {
 		put_page(kpage);
 		return;
-- 
1.8.3.1



* Re: [PATCH 2/2] mm: ksm: do not block on page lock when searching stable tree
  2018-11-07 19:16 ` [PATCH 2/2] mm: ksm: do not block on page lock when searching stable tree Yang Shi
@ 2018-11-23  7:03   ` Kirill Tkhai
  0 siblings, 0 replies; 9+ messages in thread
From: Kirill Tkhai @ 2018-11-23  7:03 UTC (permalink / raw)
  To: Yang Shi, mhocko, vbabka, hannes, hughd, akpm; +Cc: linux-mm, linux-kernel

On 07.11.2018 22:16, Yang Shi wrote:
> ksmd needs to search the stable tree to look for a suitable KSM page,
> but the KSM page might be locked for a long time due to, e.g., a KSM
> page rmap walk.
> 
> It is not worth waiting for the lock; the page can be skipped, and then
> merged in the next scan if its content is still intact, avoiding a long
> stall.
> 
> Introduce an async mode to get_ksm_page() so that it does not block on
> the page lock, like what try_to_merge_one_page() does.
> 
> Return -EBUSY if trylock fails, since NULL already means no suitable
> KSM page was found, which is a valid case.
> 
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>

Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com>

> ---
>  mm/ksm.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/ksm.c b/mm/ksm.c
> index 5b0894b..576803d 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -667,7 +667,7 @@ static void remove_node_from_stable_tree(struct stable_node *stable_node)
>  }
>  
>  /*
> - * get_ksm_page: checks if the page indicated by the stable node
> + * __get_ksm_page: checks if the page indicated by the stable node
>   * is still its ksm page, despite having held no reference to it.
>   * In which case we can trust the content of the page, and it
>   * returns the gotten page; but if the page has now been zapped,
> @@ -685,7 +685,8 @@ static void remove_node_from_stable_tree(struct stable_node *stable_node)
>   * a page to put something that might look like our key in page->mapping.
>   * is on its way to being freed; but it is an anomaly to bear in mind.
>   */
> -static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
> +static struct page *__get_ksm_page(struct stable_node *stable_node,
> +				   bool lock_it, bool async)
>  {
>  	struct page *page;
>  	void *expected_mapping;
> @@ -728,7 +729,14 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
>  	}
>  
>  	if (lock_it) {
> -		lock_page(page);
> +		if (async) {
> +			if (!trylock_page(page)) {
> +				put_page(page);
> +				return ERR_PTR(-EBUSY);
> +			}
> +		} else
> +			lock_page(page);
> +
>  		if (READ_ONCE(page->mapping) != expected_mapping) {
>  			unlock_page(page);
>  			put_page(page);
> @@ -751,6 +759,11 @@ static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
>  	return NULL;
>  }
>  
> +static struct page *get_ksm_page(struct stable_node *stable_node, bool lock_it)
> +{
> +	return __get_ksm_page(stable_node, lock_it, false);
> +}
> +
>  /*
>   * Removing rmap_item from stable or unstable tree.
>   * This function will clean the information from the stable/unstable tree.
> @@ -1675,7 +1688,11 @@ static struct page *stable_tree_search(struct page *page)
>  			 * It would be more elegant to return stable_node
>  			 * than kpage, but that involves more changes.
>  			 */
> -			tree_page = get_ksm_page(stable_node_dup, true);
> +			tree_page = __get_ksm_page(stable_node_dup, true, true);
> +
> +			if (PTR_ERR(tree_page) == -EBUSY)
> +				return ERR_PTR(-EBUSY);
> +
>  			if (unlikely(!tree_page))
>  				/*
>  				 * The tree may have been rebalanced,
> @@ -2062,6 +2079,10 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
>  
>  	/* We first start with searching the page inside the stable tree */
>  	kpage = stable_tree_search(page);
> +
> +	if (PTR_ERR(kpage) == -EBUSY)
> +		return;
> +
>  	if (kpage == page && rmap_item->head == stable_node) {
>  		put_page(kpage);
>  		return;
> 


* Re: [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low
  2018-11-07 19:16 [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low Yang Shi
  2018-11-07 19:16 ` [PATCH 2/2] mm: ksm: do not block on page lock when searching stable tree Yang Shi
@ 2018-12-20 22:45 ` Andrew Morton
  2018-12-21  6:04     ` Hugh Dickins
  1 sibling, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2018-12-20 22:45 UTC (permalink / raw)
  To: Yang Shi
  Cc: mhocko, vbabka, hannes, hughd, linux-mm, linux-kernel, Kirill Tkhai


Is anyone interested in reviewing this?  Seems somewhat serious. 
Thanks.

From: Yang Shi <yang.shi@linux.alibaba.com>
Subject: mm: vmscan: skip KSM page in direct reclaim if priority is low

When running a stress test, we occasionally run into the below hang issue:

INFO: task ksmd:205 blocked for more than 360 seconds.
      Tainted: G            E 4.9.128-001.ali3000_nightly_20180925_264.alios7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ksmd            D    0   205      2 0x00000000
 ffff882fa00418c0 0000000000000000 ffff882fa4b10000 ffff882fbf059d00
 ffff882fa5bc1800 ffffc900190c7c28 ffffffff81725e58 ffffffff810777c0
 00ffc900190c7c88 ffff882fbf059d00 ffffffff8138cc09 ffff882fa4b10000
Call Trace:
 [<ffffffff81725e58>] ? __schedule+0x258/0x720
 [<ffffffff810777c0>] ? do_flush_tlb_all+0x30/0x30
 [<ffffffff8138cc09>] ? free_cpumask_var+0x9/0x10
 [<ffffffff81726356>] schedule+0x36/0x80
 [<ffffffff81729916>] schedule_timeout+0x206/0x4b0
 [<ffffffff81077d0f>] ? native_flush_tlb_others+0x11f/0x180
 [<ffffffff8110ca40>] ? ktime_get+0x40/0xb0
 [<ffffffff81725b6a>] io_schedule_timeout+0xda/0x170
 [<ffffffff81726c50>] ? bit_wait+0x60/0x60
 [<ffffffff81726c6b>] bit_wait_io+0x1b/0x60
 [<ffffffff81726759>] __wait_on_bit_lock+0x59/0xc0
 [<ffffffff811aff76>] __lock_page+0x86/0xa0
 [<ffffffff810d53e0>] ? wake_atomic_t_function+0x60/0x60
 [<ffffffff8121a269>] ksm_scan_thread+0xeb9/0x1430
 [<ffffffff810d5340>] ? prepare_to_wait_event+0x100/0x100
 [<ffffffff812193b0>] ? try_to_merge_with_ksm_page+0x850/0x850
 [<ffffffff810ac226>] kthread+0xe6/0x100
 [<ffffffff810ac140>] ? kthread_park+0x60/0x60
 [<ffffffff8172b196>] ret_from_fork+0x46/0x60

ksmd found a suitable KSM page on the stable tree and is trying to lock
it.  But it is locked by the direct reclaim path which is walking the
page's rmap to get the number of referenced PTEs.

The KSM page rmap walk needs to iterate all rmap_items of the page and all
rmap anon_vmas of each rmap_item.  So it may take (# rmap_item * #
children processes) loops.  This number of loops might be very large in
the worst case, and may take a long time.

Typically, direct reclaim will not intend to reclaim too many pages, and
it is latency sensitive.  So it is not worth doing the long ksm page rmap
walk to reclaim just one page.

Skip KSM pages in direct reclaim if the reclaim priority is low, but still
try to reclaim KSM pages with high priority.

Link: http://lkml.kernel.org/r/1541618201-120667-1-git-send-email-yang.shi@linux.alibaba.com
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/vmscan.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

--- a/mm/vmscan.c~mm-vmscan-skip-ksm-page-in-direct-reclaim-if-priority-is-low
+++ a/mm/vmscan.c
@@ -1260,8 +1260,17 @@ static unsigned long shrink_page_list(st
 			}
 		}
 
-		if (!force_reclaim)
-			references = page_check_references(page, sc);
+		if (!force_reclaim) {
+			/*
+			 * Don't try to reclaim KSM page in direct reclaim if
+			 * the priority is not high enough.
+			 */
+			if (PageKsm(page) && !current_is_kswapd() &&
+			    sc->priority > (DEF_PRIORITY - 2))
+				references = PAGEREF_KEEP;
+			else
+				references = page_check_references(page, sc);
+		}
 
 		switch (references) {
 		case PAGEREF_ACTIVATE:
@@ -2136,6 +2145,16 @@ static void shrink_active_list(unsigned
 			}
 		}
 
+		/*
+		 * Skip KSM page in direct reclaim if priority is not
+		 * high enough.
+		 */
+		if (PageKsm(page) && !current_is_kswapd() &&
+		    sc->priority > (DEF_PRIORITY - 2)) {
+			putback_lru_page(page);
+			continue;
+		}
+
 		if (page_referenced(page, 0, sc->target_mem_cgroup,
 				    &vm_flags)) {
 			nr_rotated += hpage_nr_pages(page);
_



* Re: [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low
@ 2018-12-21  6:04     ` Hugh Dickins
  0 siblings, 0 replies; 9+ messages in thread
From: Hugh Dickins @ 2018-12-21  6:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Yang Shi, mhocko, vbabka, hannes, hughd, linux-mm, linux-kernel,
	Kirill Tkhai, Andrea Arcangeli

On Thu, 20 Dec 2018, Andrew Morton wrote:
> 
> Is anyone interested in reviewing this?  Seems somewhat serious. 
> Thanks.

Somewhat serious, but no need to rush.

> 
> From: Yang Shi <yang.shi@linux.alibaba.com>
> Subject: mm: vmscan: skip KSM page in direct reclaim if priority is low
> 
> When running a stress test, we occasionally run into the below hang issue:

Artificial load presumably.

> 
> INFO: task ksmd:205 blocked for more than 360 seconds.
>       Tainted: G            E 4.9.128-001.ali3000_nightly_20180925_264.alios7.x86_64 #1

4.9-stable does not contain Andrea's 4.13 commit 2c653d0ee2ae
("ksm: introduce ksm_max_page_sharing per page deduplication limit").

The patch below is more economical than Andrea's, but I don't think
a second workaround should be added, unless Andrea's is shown to be
insufficient, even with its ksm_max_page_sharing tuned down to suit.

Yang, please try to reproduce on upstream, or backport Andrea's to
4.9-stable - thanks.

Hugh

> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> ksmd            D    0   205      2 0x00000000
>  ffff882fa00418c0 0000000000000000 ffff882fa4b10000 ffff882fbf059d00
>  ffff882fa5bc1800 ffffc900190c7c28 ffffffff81725e58 ffffffff810777c0
>  00ffc900190c7c88 ffff882fbf059d00 ffffffff8138cc09 ffff882fa4b10000
> Call Trace:
>  [<ffffffff81725e58>] ? __schedule+0x258/0x720
>  [<ffffffff810777c0>] ? do_flush_tlb_all+0x30/0x30
>  [<ffffffff8138cc09>] ? free_cpumask_var+0x9/0x10
>  [<ffffffff81726356>] schedule+0x36/0x80
>  [<ffffffff81729916>] schedule_timeout+0x206/0x4b0
>  [<ffffffff81077d0f>] ? native_flush_tlb_others+0x11f/0x180
>  [<ffffffff8110ca40>] ? ktime_get+0x40/0xb0
>  [<ffffffff81725b6a>] io_schedule_timeout+0xda/0x170
>  [<ffffffff81726c50>] ? bit_wait+0x60/0x60
>  [<ffffffff81726c6b>] bit_wait_io+0x1b/0x60
>  [<ffffffff81726759>] __wait_on_bit_lock+0x59/0xc0
>  [<ffffffff811aff76>] __lock_page+0x86/0xa0
>  [<ffffffff810d53e0>] ? wake_atomic_t_function+0x60/0x60
>  [<ffffffff8121a269>] ksm_scan_thread+0xeb9/0x1430
>  [<ffffffff810d5340>] ? prepare_to_wait_event+0x100/0x100
>  [<ffffffff812193b0>] ? try_to_merge_with_ksm_page+0x850/0x850
>  [<ffffffff810ac226>] kthread+0xe6/0x100
>  [<ffffffff810ac140>] ? kthread_park+0x60/0x60
>  [<ffffffff8172b196>] ret_from_fork+0x46/0x60
> 
> ksmd found a suitable KSM page on the stable tree and is trying to lock
> it.  But it is locked by the direct reclaim path which is walking the
> page's rmap to get the number of referenced PTEs.
> 
> The KSM page rmap walk needs to iterate all rmap_items of the page and all
> rmap anon_vmas of each rmap_item.  So it may take (# rmap_item * #
> children processes) loops.  This number of loops might be very large in
> the worst case, and may take a long time.
> 
> Typically, direct reclaim will not intend to reclaim too many pages, and
> it is latency sensitive.  So it is not worth doing the long ksm page rmap
> walk to reclaim just one page.
> 
> Skip KSM pages in direct reclaim if the reclaim priority is low, but still
> try to reclaim KSM pages with high priority.
> 
> Link: http://lkml.kernel.org/r/1541618201-120667-1-git-send-email-yang.shi@linux.alibaba.com
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/vmscan.c |   23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> --- a/mm/vmscan.c~mm-vmscan-skip-ksm-page-in-direct-reclaim-if-priority-is-low
> +++ a/mm/vmscan.c
> @@ -1260,8 +1260,17 @@ static unsigned long shrink_page_list(st
>  			}
>  		}
>  
> -		if (!force_reclaim)
> -			references = page_check_references(page, sc);
> +		if (!force_reclaim) {
> +			/*
> +			 * Don't try to reclaim KSM page in direct reclaim if
> +			 * the priority is not high enough.
> +			 */
> +			if (PageKsm(page) && !current_is_kswapd() &&
> +			    sc->priority > (DEF_PRIORITY - 2))
> +				references = PAGEREF_KEEP;
> +			else
> +				references = page_check_references(page, sc);
> +		}
>  
>  		switch (references) {
>  		case PAGEREF_ACTIVATE:
> @@ -2136,6 +2145,16 @@ static void shrink_active_list(unsigned
>  			}
>  		}
>  
> +		/*
> +		 * Skip KSM page in direct reclaim if priority is not
> +		 * high enough.
> +		 */
> +		if (PageKsm(page) && !current_is_kswapd() &&
> +		    sc->priority > (DEF_PRIORITY - 2)) {
> +			putback_lru_page(page);
> +			continue;
> +		}
> +
>  		if (page_referenced(page, 0, sc->target_mem_cgroup,
>  				    &vm_flags)) {
>  			nr_rotated += hpage_nr_pages(page);
> _


* Re: [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low
  2018-12-21  6:04     ` Hugh Dickins
@ 2018-12-21  6:33     ` Yang Shi
  2018-12-21 14:01       ` Andrea Arcangeli
  -1 siblings, 1 reply; 9+ messages in thread
From: Yang Shi @ 2018-12-21  6:33 UTC (permalink / raw)
  To: Hugh Dickins, Andrew Morton
  Cc: mhocko, vbabka, hannes, linux-mm, linux-kernel, Kirill Tkhai,
	Andrea Arcangeli



On 12/20/18 10:04 PM, Hugh Dickins wrote:
> On Thu, 20 Dec 2018, Andrew Morton wrote:
>> Is anyone interested in reviewing this?  Seems somewhat serious.
>> Thanks.
> Somewhat serious, but no need to rush.
>
>> From: Yang Shi <yang.shi@linux.alibaba.com>
>> Subject: mm: vmscan: skip KSM page in direct reclaim if priority is low
>>
>> When running a stress test, we occasionally run into the below hang issue:
> Artificial load presumably.
>
>> INFO: task ksmd:205 blocked for more than 360 seconds.
>>        Tainted: G            E 4.9.128-001.ali3000_nightly_20180925_264.alios7.x86_64 #1
> 4.9-stable does not contain Andrea's 4.13 commit 2c653d0ee2ae
> ("ksm: introduce ksm_max_page_sharing per page deduplication limit").
>
> The patch below is more economical than Andrea's, but I don't think
> a second workaround should be added, unless Andrea's is shown to be
> insufficient, even with its ksm_max_page_sharing tuned down to suit.
>
> Yang, please try to reproduce on upstream, or backport Andrea's to
> 4.9-stable - thanks.

I believe Andrea's commit could work around this problem too, by
limiting the number of sharing pages.

However, IMHO, even if we just have a few hundred pages sharing one
KSM page, it still does not sound worth reclaiming it in direct
reclaim at low priority. According to Andrea's commit log, it still
takes a few msec to walk the rmap for 256 shared pages.

Thanks,
Yang

>
> Hugh
>
>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> ksmd            D    0   205      2 0x00000000
>>   ffff882fa00418c0 0000000000000000 ffff882fa4b10000 ffff882fbf059d00
>>   ffff882fa5bc1800 ffffc900190c7c28 ffffffff81725e58 ffffffff810777c0
>>   00ffc900190c7c88 ffff882fbf059d00 ffffffff8138cc09 ffff882fa4b10000
>> Call Trace:
>>   [<ffffffff81725e58>] ? __schedule+0x258/0x720
>>   [<ffffffff810777c0>] ? do_flush_tlb_all+0x30/0x30
>>   [<ffffffff8138cc09>] ? free_cpumask_var+0x9/0x10
>>   [<ffffffff81726356>] schedule+0x36/0x80
>>   [<ffffffff81729916>] schedule_timeout+0x206/0x4b0
>>   [<ffffffff81077d0f>] ? native_flush_tlb_others+0x11f/0x180
>>   [<ffffffff8110ca40>] ? ktime_get+0x40/0xb0
>>   [<ffffffff81725b6a>] io_schedule_timeout+0xda/0x170
>>   [<ffffffff81726c50>] ? bit_wait+0x60/0x60
>>   [<ffffffff81726c6b>] bit_wait_io+0x1b/0x60
>>   [<ffffffff81726759>] __wait_on_bit_lock+0x59/0xc0
>>   [<ffffffff811aff76>] __lock_page+0x86/0xa0
>>   [<ffffffff810d53e0>] ? wake_atomic_t_function+0x60/0x60
>>   [<ffffffff8121a269>] ksm_scan_thread+0xeb9/0x1430
>>   [<ffffffff810d5340>] ? prepare_to_wait_event+0x100/0x100
>>   [<ffffffff812193b0>] ? try_to_merge_with_ksm_page+0x850/0x850
>>   [<ffffffff810ac226>] kthread+0xe6/0x100
>>   [<ffffffff810ac140>] ? kthread_park+0x60/0x60
>>   [<ffffffff8172b196>] ret_from_fork+0x46/0x60
>>
>> ksmd found a suitable KSM page on the stable tree and is trying to lock
>> it.  But it is locked by the direct reclaim path which is walking the
>> page's rmap to get the number of referenced PTEs.
>>
>> The KSM page rmap walk needs to iterate all rmap_items of the page and all
>> rmap anon_vmas of each rmap_item.  So it may take (# rmap_item * #
>> children processes) loops.  This number of loops might be very large in
>> the worst case, and may take a long time.
>>
>> Typically, direct reclaim will not intend to reclaim too many pages, and
>> it is latency sensitive.  So it is not worth doing the long ksm page rmap
>> walk to reclaim just one page.
>>
>> Skip KSM pages in direct reclaim if the reclaim priority is low, but still
>> try to reclaim KSM pages with high priority.
>>
>> Link: http://lkml.kernel.org/r/1541618201-120667-1-git-send-email-yang.shi@linux.alibaba.com
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>> Cc: Hugh Dickins <hughd@google.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>> ---
>>
>>   mm/vmscan.c |   23 +++++++++++++++++++++--
>>   1 file changed, 21 insertions(+), 2 deletions(-)
>>
>> --- a/mm/vmscan.c~mm-vmscan-skip-ksm-page-in-direct-reclaim-if-priority-is-low
>> +++ a/mm/vmscan.c
>> @@ -1260,8 +1260,17 @@ static unsigned long shrink_page_list(st
>>   			}
>>   		}
>>   
>> -		if (!force_reclaim)
>> -			references = page_check_references(page, sc);
>> +		if (!force_reclaim) {
>> +			/*
>> +			 * Don't try to reclaim KSM page in direct reclaim if
>> +			 * the priority is not high enough.
>> +			 */
>> +			if (PageKsm(page) && !current_is_kswapd() &&
>> +			    sc->priority > (DEF_PRIORITY - 2))
>> +				references = PAGEREF_KEEP;
>> +			else
>> +				references = page_check_references(page, sc);
>> +		}
>>   
>>   		switch (references) {
>>   		case PAGEREF_ACTIVATE:
>> @@ -2136,6 +2145,16 @@ static void shrink_active_list(unsigned
>>   			}
>>   		}
>>   
>> +		/*
>> +		 * Skip KSM page in direct reclaim if priority is not
>> +		 * high enough.
>> +		 */
>> +		if (PageKsm(page) && !current_is_kswapd() &&
>> +		    sc->priority > (DEF_PRIORITY - 2)) {
>> +			putback_lru_page(page);
>> +			continue;
>> +		}
>> +
>>   		if (page_referenced(page, 0, sc->target_mem_cgroup,
>>   				    &vm_flags)) {
>>   			nr_rotated += hpage_nr_pages(page);
>> _



* Re: [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low
  2018-12-21  6:33     ` Yang Shi
@ 2018-12-21 14:01       ` Andrea Arcangeli
  2018-12-22  5:36         ` Yang Shi
  0 siblings, 1 reply; 9+ messages in thread
From: Andrea Arcangeli @ 2018-12-21 14:01 UTC (permalink / raw)
  To: Yang Shi
  Cc: Hugh Dickins, Andrew Morton, mhocko, vbabka, hannes, linux-mm,
	linux-kernel, Kirill Tkhai

Hello Yang,

On Thu, Dec 20, 2018 at 10:33:26PM -0800, Yang Shi wrote:
> 
> 
> On 12/20/18 10:04 PM, Hugh Dickins wrote:
> > On Thu, 20 Dec 2018, Andrew Morton wrote:
> >> Is anyone interested in reviewing this?  Seems somewhat serious.
> >> Thanks.
> > Somewhat serious, but no need to rush.
> >
> >> From: Yang Shi <yang.shi@linux.alibaba.com>
> >> Subject: mm: vmscan: skip KSM page in direct reclaim if priority is low
> >>
> >> When running a stress test, we occasionally run into the below hang issue:
> > Artificial load presumably.
> >
> >> INFO: task ksmd:205 blocked for more than 360 seconds.
> >>        Tainted: G            E 4.9.128-001.ali3000_nightly_20180925_264.alios7.x86_64 #1
> > 4.9-stable does not contain Andrea's 4.13 commit 2c653d0ee2ae
> > ("ksm: introduce ksm_max_page_sharing per page deduplication limit").
> >
> > The patch below is more economical than Andrea's, but I don't think
> > a second workaround should be added, unless Andrea's is shown to be
> > insufficient, even with its ksm_max_page_sharing tuned down to suit.
> >
> > Yang, please try to reproduce on upstream, or backport Andrea's to
> > 4.9-stable - thanks.

I think it's reasonable to backport it, and it should be an easy
backport. Just make sure to also backport
b4fecc67cc569b14301f5a1111363d5818b8da5e, which fixes the only bug
there was in the initial patch; it happened with
"merge_across_nodes = 0" (not the default).

We shipped it in production years ago, and it was pretty urgent for
those workloads that initially ran into this issue.

> 
> I believe Andrea's commit could work around this problem too, by
> limiting the number of sharing pages.
> 
> However, IMHO, even if we just have a few hundred pages sharing one
> KSM page, it still does not sound worth reclaiming it in direct
> reclaim at low priority. According to Andrea's commit log, it still
> takes a few

You still have to walk the entire chain for compaction and memory
hotplug, otherwise the KSM page becomes practically unmovable.
Allowing the rmap chain to grow infinitely is still not ok.

Whether the page should be reclaimed in direct reclaim is already told
by page_referenced(): the more mappings there are, the more likely at
least one was touched and has the young bit set in the pte.
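(Roughly: if each of the N mappings were independently referenced with
probability p over the scan interval, the chance of finding at least
one young pte would be 1 - (1 - p)^N, which approaches 1 quickly as N
grows; a simplified model for intuition only.)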

> msec to walk the rmap for 256 shared pages.

Those ~2.5msec were in the context of page migration: in the previous
sentence I specified that it takes 10usec for the IPI and all the other
stuff page migration has to do (which also largely depends on multiple
factors, like the total number of CPUs).

page_referenced() doesn't flush the TLB during the rmap walk when it
clears the accessed bit, so it's orders of magnitude faster than real
page migration at walking the KSM rmap chain.

If the page migration latency of 256 max mappings is a concern, the max
sharing can be configured at runtime, or the default max sharing can be
reduced to 10 to give a max latency of ~100usec while still giving a
fairly decent 10x compression ratio. That's a minor detail to change if
that's a concern.

The only difference compared to all other page types is that KSM pages
can occasionally merge very aggressively, and the apps have no way to
limit the merging or even avoid it. We simply can't ask the app to
create fewer equal pages.

This is why the max sharing has to be limited inside KSM; then we don't
need anything special in the VM anymore to treat KSM pages.

By contrast, the max sharing of COW anon memory post fork is limited by
the number of fork invocations, and for MAP_SHARED the sharing is
limited by the number of mmaps; those don't tend to escalate into the
millions, or they would run into other limits first. It's reasonable to
expect the developer to optimize the app to create fewer mmaps, or to
use threads instead of processes, to reduce the VM overhead in general
(which will improve the rmap walks too).

Note that MAP_SHARED/PRIVATE/anon-COW sharing can exceed 256 mappings
too; you just have to fork 257 times in a row or, much more
realistically, mmap the same glibc library 257 times in a row. So if
anything, KSM is now less of a concern for occasional page_referenced
worst-case latencies than all the rest of the page types.

KSM, by enforcing the max sharing, is now the friendliest of all the
page types in terms of rmap walk computational complexity. So there's
no need to treat it specially in low priority reclaim scans.

Thanks,
Andrea


* Re: [PATCH 1/2] mm: vmscan: skip KSM page in direct reclaim if priority is low
  2018-12-21 14:01       ` Andrea Arcangeli
@ 2018-12-22  5:36         ` Yang Shi
  0 siblings, 0 replies; 9+ messages in thread
From: Yang Shi @ 2018-12-22  5:36 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Andrew Morton, mhocko, vbabka, hannes, linux-mm,
	linux-kernel, Kirill Tkhai



On 12/21/18 6:01 AM, Andrea Arcangeli wrote:
> Hello Yang,
>
> On Thu, Dec 20, 2018 at 10:33:26PM -0800, Yang Shi wrote:
>>
>> On 12/20/18 10:04 PM, Hugh Dickins wrote:
>>> On Thu, 20 Dec 2018, Andrew Morton wrote:
>>>> Is anyone interested in reviewing this?  Seems somewhat serious.
>>>> Thanks.
>>> Somewhat serious, but no need to rush.
>>>
>>>> From: Yang Shi <yang.shi@linux.alibaba.com>
>>>> Subject: mm: vmscan: skip KSM page in direct reclaim if priority is low
>>>>
>>>> When running a stress test, we occasionally run into the below hang issue:
>>> Artificial load presumably.
>>>
>>>> INFO: task ksmd:205 blocked for more than 360 seconds.
>>>>         Tainted: G            E 4.9.128-001.ali3000_nightly_20180925_264.alios7.x86_64 #1
>>> 4.9-stable does not contain Andrea's 4.13 commit 2c653d0ee2ae
>>> ("ksm: introduce ksm_max_page_sharing per page deduplication limit").
>>>
>>> The patch below is more economical than Andrea's, but I don't think
>>> a second workaround should be added, unless Andrea's is shown to be
>>> insufficient, even with its ksm_max_page_sharing tuned down to suit.
>>>
>>> Yang, please try to reproduce on upstream, or backport Andrea's to
>>> 4.9-stable - thanks.
> I think it's reasonable to backport it and it should be an easy
> backport. Just make sure to backport
> b4fecc67cc569b14301f5a1111363d5818b8da5e too which was the only bug
> there was in the initial patch and it happened with
> "merge_across_nodes = 0" (not the default).
>
> We shipped it in production years ago and it was pretty urgent for
> those workloads that initially run into this issue.

Hi Andrea,

Thank you and Hugh for pointing out these commits. I will backport them
to our kernel. Not sure whether 4.9-stable needs this or not.

>
>> I believe Andrea's commit could work around this problem too, by
>> limiting the number of sharing pages.
>>
>> However, IMHO, even if we just have a few hundred pages sharing one
>> KSM page, it still does not sound worth reclaiming it in direct
>> reclaim at low priority. According to Andrea's commit log, it still
>> takes a few
> You still have to walk the entire chain for compaction and memory
> hotplug, otherwise the KSM page becomes practically unmovable.
> Allowing the rmap chain to grow infinitely is still not ok.

Yes, definitely agree.

>
> Whether the page should be reclaimed in direct reclaim is already told
> by page_referenced(): the more mappings there are, the more likely at
> least one was touched and has the young bit set in the pte.
>
>> msec to walk the rmap for 256 shared pages.
> Those ~2.5msec were in the context of page migration: in the previous
> sentence I specified that it takes 10usec for the IPI and all the other
> stuff page migration has to do (which also largely depends on multiple
> factors, like the total number of CPUs).
>
> page_referenced() doesn't flush the TLB during the rmap walk when it
> clears the accessed bit, so it's orders of magnitude faster than real
> page migration at walking the KSM rmap chain.
>
> If the page migration latency of 256 max mappings is a concern, the max
> sharing can be configured at runtime, or the default max sharing can be
> reduced to 10 to give a max latency of ~100usec while still giving a
> fairly decent 10x compression ratio. That's a minor detail to change if
> that's a concern.
>
> The only difference compared to all other page types is that KSM pages
> can occasionally merge very aggressively, and the apps have no way to
> limit the merging or even avoid it. We simply can't ask the app to
> create fewer equal pages.
>
> This is why the max sharing has to be limited inside KSM; then we don't
> need anything special in the VM anymore to treat KSM pages.
>
> By contrast, the max sharing of COW anon memory post fork is limited by
> the number of fork invocations, and for MAP_SHARED the sharing is
> limited by the number of mmaps; those don't tend to escalate into the
> millions, or they would run into other limits first. It's reasonable to
> expect the developer to optimize the app to create fewer mmaps, or to
> use threads instead of processes, to reduce the VM overhead in general
> (which will improve the rmap walks too).
>
> Note that MAP_SHARED/PRIVATE/anon-COW sharing can exceed 256 mappings
> too; you just have to fork 257 times in a row or, much more
> realistically, mmap the same glibc library 257 times in a row. So if
> anything, KSM is now less of a concern for occasional page_referenced
> worst-case latencies than all the rest of the page types.
>
> KSM, by enforcing the max sharing, is now the friendliest of all the
> page types in terms of rmap walk computational complexity. So there's
> no need to treat it specially in low priority reclaim scans.

Thanks a lot. The above is very informative and helpful. I agree that
KSM page sharing can't be allowed to grow insanely, and the max sharing
limit makes it less of a concern in the reclaim path. I don't insist on
keeping my patch, although we can still think of some artificial
scenarios which may go insane; that should be very unlikely in a
real-world workload with a sane max page sharing limit.

BTW, happy holiday guys.

Regards,
Yang

>
> Thanks,
> Andrea


