* [RFC v2 PATCH] mm: vmscan: do not iterate all mem cgroups for global direct reclaim
From: Yang Shi @ 2019-01-29 22:11 UTC
  To: mhocko, hannes, akpm; +Cc: yang.shi, linux-mm, linux-kernel

In the current implementation, both kswapd and direct reclaim have to
iterate all mem cgroups.  This was not a problem before offline mem
cgroups could be iterated, but now, with offline mem cgroups included
in the walk, the iteration can be very time consuming.  In our
workloads, we saw over 400K mem cgroups accumulated in some cases,
while only a few hundred of them were online.  Although kswapd could
help reduce the number of memcgs, direct reclaim still gets hit
iterating a large number of offline memcgs in some cases.  We
occasionally experienced responsiveness problems because of this.

A simple test with perf shows it may take around 220ms to iterate 8K
memcgs in direct reclaim:
             dd 13873 [011]   578.542919: vmscan:mm_vmscan_direct_reclaim_begin
             dd 13873 [011]   578.758689: vmscan:mm_vmscan_direct_reclaim_end
Assuming the cost scales linearly, iterating 400K memcgs would take
around 11 seconds (0.22s * 400K / 8K).

Break the iteration once enough pages have been reclaimed, just as
memcg direct reclaim already does.  This may hurt fairness among
memcgs, but the cached iterator cookie helps to preserve fairness more
or less.
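
For illustration, the memcg walk in shrink_node() then looks roughly
like the sketch below (a simplified sketch of the loop this patch
modifies; locking, scan accounting and other details are omitted):

	memcg = mem_cgroup_iter(root, NULL, &reclaim);
	do {
		/* reclaim pages from this memcg's LRUs on this node */
		shrink_node_memcg(pgdat, memcg, sc, &lru_pages);

		/*
		 * Only kswapd must visit every memcg to satisfy the
		 * node-wide scan target; any other reclaimer may stop
		 * once nr_to_reclaim pages have been reclaimed.
		 */
		if (!current_is_kswapd() &&
		    sc->nr_reclaimed >= sc->nr_to_reclaim) {
			mem_cgroup_iter_break(root, memcg);
			break;
		}
	} while ((memcg = mem_cgroup_iter(root, memcg, &reclaim)));

The fairness comes from the reclaim cookie: mem_cgroup_iter() caches
its position per node and priority, so the next reclaim walk resumes
after the last memcg visited instead of restarting from the root.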

Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
v2: Added some test data in the commit log
    Updated commit log to note that the iterator cookie helps maintain fairness
    Dropped !global_reclaim() since !current_is_kswapd() is good enough
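
For context, the two predicates were defined at the time roughly as
follows (CONFIG_MEMCG=n stubs omitted):

	/* mm/vmscan.c: no particular memcg is being targeted */
	static bool global_reclaim(struct scan_control *sc)
	{
		return !sc->target_mem_cgroup;
	}

	/* include/linux/swap.h: kswapd marks itself with a task flag */
	static inline int current_is_kswapd(void)
	{
		return current->flags & PF_KSWAPD;
	}

With !current_is_kswapd(), the early break applies to both memcg limit
reclaim and global direct reclaim, leaving kswapd as the only reclaimer
that must walk the full memcg hierarchy.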

 mm/vmscan.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a714c4f..5e35796 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2764,16 +2764,15 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 				   sc->nr_reclaimed - reclaimed);
 
 			/*
-			 * Direct reclaim and kswapd have to scan all memory
-			 * cgroups to fulfill the overall scan target for the
-			 * node.
+			 * Kswapd has to scan all memory cgroups to fulfill
+			 * the overall scan target for the node.
 			 *
 			 * Limit reclaim, on the other hand, only cares about
 			 * nr_to_reclaim pages to be reclaimed and it will
 			 * retry with decreasing priority if one round over the
 			 * whole hierarchy is not sufficient.
 			 */
-			if (!global_reclaim(sc) &&
+			if (!current_is_kswapd() &&
 					sc->nr_reclaimed >= sc->nr_to_reclaim) {
 				mem_cgroup_iter_break(root, memcg);
 				break;
-- 
1.8.3.1



* Re: [RFC v2 PATCH] mm: vmscan: do not iterate all mem cgroups for global direct reclaim
From: Johannes Weiner @ 2019-01-29 22:53 UTC
  To: Yang Shi; +Cc: mhocko, akpm, linux-mm, linux-kernel

On Wed, Jan 30, 2019 at 06:11:17AM +0800, Yang Shi wrote:
> In the current implementation, both kswapd and direct reclaim have to
> iterate all mem cgroups.  This was not a problem before offline mem
> cgroups could be iterated, but now, with offline mem cgroups included
> in the walk, the iteration can be very time consuming.  In our
> workloads, we saw over 400K mem cgroups accumulated in some cases,
> while only a few hundred of them were online.  Although kswapd could
> help reduce the number of memcgs, direct reclaim still gets hit
> iterating a large number of offline memcgs in some cases.  We
> occasionally experienced responsiveness problems because of this.
> 
> A simple test with perf shows it may take around 220ms to iterate 8K
> memcgs in direct reclaim:
>              dd 13873 [011]   578.542919: vmscan:mm_vmscan_direct_reclaim_begin
>              dd 13873 [011]   578.758689: vmscan:mm_vmscan_direct_reclaim_end
> Assuming the cost scales linearly, iterating 400K memcgs would take
> around 11 seconds (0.22s * 400K / 8K).
> 
> Break the iteration once enough pages have been reclaimed, just as
> memcg direct reclaim already does.  This may hurt fairness among
> memcgs, but the cached iterator cookie helps to preserve fairness more
> or less.
> 
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>

Looks sane to me, thanks Yang.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>


* Re: [RFC v2 PATCH] mm: vmscan: do not iterate all mem cgroups for global direct reclaim
From: Michal Hocko @ 2019-01-31  6:47 UTC
  To: Yang Shi; +Cc: hannes, akpm, linux-mm, linux-kernel

On Wed 30-01-19 06:11:17, Yang Shi wrote:
> In the current implementation, both kswapd and direct reclaim have to
> iterate all mem cgroups.  This was not a problem before offline mem
> cgroups could be iterated, but now, with offline mem cgroups included
> in the walk, the iteration can be very time consuming.  In our
> workloads, we saw over 400K mem cgroups accumulated in some cases,
> while only a few hundred of them were online.  Although kswapd could
> help reduce the number of memcgs, direct reclaim still gets hit
> iterating a large number of offline memcgs in some cases.  We
> occasionally experienced responsiveness problems because of this.
> 
> A simple test with perf shows it may take around 220ms to iterate 8K
> memcgs in direct reclaim:
>              dd 13873 [011]   578.542919: vmscan:mm_vmscan_direct_reclaim_begin
>              dd 13873 [011]   578.758689: vmscan:mm_vmscan_direct_reclaim_end
> Assuming the cost scales linearly, iterating 400K memcgs would take
> around 11 seconds (0.22s * 400K / 8K).
> 
> Break the iteration once enough pages have been reclaimed, just as
> memcg direct reclaim already does.  This may hurt fairness among
> memcgs, but the cached iterator cookie helps to preserve fairness more
> or less.
> 
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@suse.com>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>

Acked-by: Michal Hocko <mhocko@suse.com>

> ---
> v2: Added some test data in the commit log
>     Updated commit log to note that the iterator cookie helps maintain fairness
>     Dropped !global_reclaim() since !current_is_kswapd() is good enough
> 
>  mm/vmscan.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a714c4f..5e35796 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2764,16 +2764,15 @@ static bool shrink_node(pg_data_t *pgdat, struct scan_control *sc)
>  				   sc->nr_reclaimed - reclaimed);
>  
>  			/*
> -			 * Direct reclaim and kswapd have to scan all memory
> -			 * cgroups to fulfill the overall scan target for the
> -			 * node.
> +			 * Kswapd has to scan all memory cgroups to fulfill
> +			 * the overall scan target for the node.
>  			 *
>  			 * Limit reclaim, on the other hand, only cares about
>  			 * nr_to_reclaim pages to be reclaimed and it will
>  			 * retry with decreasing priority if one round over the
>  			 * whole hierarchy is not sufficient.
>  			 */
> -			if (!global_reclaim(sc) &&
> +			if (!current_is_kswapd() &&
>  					sc->nr_reclaimed >= sc->nr_to_reclaim) {
>  				mem_cgroup_iter_break(root, memcg);
>  				break;
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs

