linux-mm.kvack.org archive mirror
* [PATCH v3 1/1] mm: count time in drain_all_pages during direct reclaim as memory pressure
@ 2022-02-23 19:48 Suren Baghdasaryan
  2022-02-24  7:10 ` Shakeel Butt
  2022-02-24  8:53 ` Michal Hocko
  0 siblings, 2 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2022-02-23 19:48 UTC (permalink / raw)
  To: akpm
  Cc: hannes, mhocko, pmladek, peterz, guro, shakeelb, minchan,
	timmurray, linux-mm, linux-kernel, kernel-team, surenb

When page allocation in direct reclaim path fails, the system will
make one attempt to shrink per-cpu page lists and free pages from
high alloc reserves. Draining per-cpu pages into buddy allocator can
be a very slow operation because it's done using workqueues and the
task in direct reclaim waits for all of them to finish before
proceeding. Currently this time is not accounted as psi memory stall.

While testing mobile devices under extreme memory pressure, when
allocations are failing during direct reclaim, we noticed that psi
events which would be expected in such conditions were not triggered.
After profiling these cases it was determined that the reason for
missing psi events was that a big chunk of time spent in direct
reclaim is not accounted as memory stall, therefore psi would not
reach the levels at which an event is generated. Further investigation
revealed that the bulk of that unaccounted time was spent inside
drain_all_pages call.

A typical captured case when drain_all_pages path gets activated:

__alloc_pages_slowpath  took 44.644.613ns
    __perform_reclaim   took    751.668ns (1.7%)
    drain_all_pages     took 43.887.167ns (98.3%)

PSI in this case records the time spent in __perform_reclaim but
ignores drain_all_pages, IOW it misses 98.3% of the time spent in
__alloc_pages_slowpath.

Annotate __alloc_pages_direct_reclaim in its entirety so that delays
from handling page allocation failure in the direct reclaim path are
accounted as memory stall.

Reported-by: Tim Murray <timmurray@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
---
changes in v3:
- Moved psi_memstall_leave after the "out" label

 mm/page_alloc.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3589febc6d31..029bceb79861 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4595,13 +4595,12 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 					const struct alloc_context *ac)
 {
 	unsigned int noreclaim_flag;
-	unsigned long pflags, progress;
+	unsigned long progress;
 
 	cond_resched();
 
 	/* We now go into synchronous reclaim */
 	cpuset_memory_pressure_bump();
-	psi_memstall_enter(&pflags);
 	fs_reclaim_acquire(gfp_mask);
 	noreclaim_flag = memalloc_noreclaim_save();
 
@@ -4610,7 +4609,6 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
 
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(gfp_mask);
-	psi_memstall_leave(&pflags);
 
 	cond_resched();
 
@@ -4624,11 +4622,13 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 		unsigned long *did_some_progress)
 {
 	struct page *page = NULL;
+	unsigned long pflags;
 	bool drained = false;
 
+	psi_memstall_enter(&pflags);
 	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
 	if (unlikely(!(*did_some_progress)))
-		return NULL;
+		goto out;
 
 retry:
 	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
@@ -4644,6 +4644,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 		drained = true;
 		goto retry;
 	}
+out:
+	psi_memstall_leave(&pflags);
 
 	return page;
 }
-- 
2.35.1.473.g83b2b277ed-goog
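
For readers who want to see the user-visible effect being fixed here,
below is a minimal userspace sketch (not part of the patch) of how psi
memory events are typically consumed, following the trigger interface
described in Documentation/accounting/psi.rst. The "some 150000 1000000"
threshold (150ms of stall per 1s window) is just an example value; the
point of the patch is that time spent stuck in drain_all_pages now counts
toward such thresholds, so a trigger like this one can actually fire.

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* notify when "some" memory stall exceeds 150ms in any 1s window */
	const char trig[] = "some 150000 1000000";
	struct pollfd fds;

	fds.fd = open("/proc/pressure/memory", O_RDWR | O_NONBLOCK);
	if (fds.fd < 0) {
		perror("open /proc/pressure/memory");
		return 1;
	}
	if (write(fds.fd, trig, strlen(trig) + 1) < 0) {
		perror("write trigger");
		return 1;
	}
	fds.events = POLLPRI;

	while (1) {
		if (poll(&fds, 1, -1) < 0) {
			perror("poll");
			return 1;
		}
		if (fds.revents & POLLERR) {
			fprintf(stderr, "trigger file went away\n");
			return 1;
		}
		if (fds.revents & POLLPRI)
			printf("memory pressure event\n");
	}
	return 0;
}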




* Re: [PATCH v3 1/1] mm: count time in drain_all_pages during direct reclaim as memory pressure
  2022-02-23 19:48 [PATCH v3 1/1] mm: count time in drain_all_pages during direct reclaim as memory pressure Suren Baghdasaryan
@ 2022-02-24  7:10 ` Shakeel Butt
  2022-02-24  8:53 ` Michal Hocko
  1 sibling, 0 replies; 5+ messages in thread
From: Shakeel Butt @ 2022-02-24  7:10 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, hannes, mhocko, pmladek, peterz, guro, minchan, timmurray,
	linux-mm, linux-kernel, kernel-team

On Wed, Feb 23, 2022 at 11:48:12AM -0800, Suren Baghdasaryan wrote:
> When page allocation in direct reclaim path fails, the system will
> make one attempt to shrink per-cpu page lists and free pages from
> high alloc reserves. Draining per-cpu pages into buddy allocator can
> be a very slow operation because it's done using workqueues and the
> task in direct reclaim waits for all of them to finish before
> proceeding. Currently this time is not accounted as psi memory stall.

> While testing mobile devices under extreme memory pressure, when
> allocations are failing during direct reclaim, we noticed that psi
> events which would be expected in such conditions were not triggered.
> After profiling these cases it was determined that the reason for
> missing psi events was that a big chunk of time spent in direct
> reclaim is not accounted as memory stall, therefore psi would not
> reach the levels at which an event is generated. Further investigation
> revealed that the bulk of that unaccounted time was spent inside
> drain_all_pages call.

> A typical captured case when drain_all_pages path gets activated:

> __alloc_pages_slowpath  took 44.644.613ns
>      __perform_reclaim   took    751.668ns (1.7%)
>      drain_all_pages     took 43.887.167ns (98.3%)

> PSI in this case records the time spent in __perform_reclaim but
> ignores drain_all_pages, IOW it misses 98.3% of the time spent in
> __alloc_pages_slowpath.

> Annotate __alloc_pages_direct_reclaim in its entirety so that delays
> from handling page allocation failure in the direct reclaim path are
> accounted as memory stall.

> Reported-by: Tim Murray <timmurray@google.com>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Reviewed-by: Shakeel Butt <shakeelb@google.com>



* Re: [PATCH v3 1/1] mm: count time in drain_all_pages during direct reclaim as memory pressure
  2022-02-23 19:48 [PATCH v3 1/1] mm: count time in drain_all_pages during direct reclaim as memory pressure Suren Baghdasaryan
  2022-02-24  7:10 ` Shakeel Butt
@ 2022-02-24  8:53 ` Michal Hocko
  2022-02-24 16:28   ` Suren Baghdasaryan
  1 sibling, 1 reply; 5+ messages in thread
From: Michal Hocko @ 2022-02-24  8:53 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: akpm, hannes, pmladek, peterz, guro, shakeelb, minchan,
	timmurray, linux-mm, linux-kernel, kernel-team

On Wed 23-02-22 11:48:12, Suren Baghdasaryan wrote:
> When page allocation in direct reclaim path fails, the system will
> make one attempt to shrink per-cpu page lists and free pages from
> high alloc reserves. Draining per-cpu pages into buddy allocator can
> be a very slow operation because it's done using workqueues and the
> task in direct reclaim waits for all of them to finish before
> proceeding. Currently this time is not accounted as psi memory stall.
> 
> While testing mobile devices under extreme memory pressure, when
> allocations are failing during direct reclaim, we noticed that psi
> events which would be expected in such conditions were not triggered.
> After profiling these cases it was determined that the reason for
> missing psi events was that a big chunk of time spent in direct
> reclaim is not accounted as memory stall, therefore psi would not
> reach the levels at which an event is generated. Further investigation
> revealed that the bulk of that unaccounted time was spent inside
> drain_all_pages call.
> 
> A typical captured case when drain_all_pages path gets activated:
> 
> __alloc_pages_slowpath  took 44.644.613ns
>     __perform_reclaim   took    751.668ns (1.7%)
>     drain_all_pages     took 43.887.167ns (98.3%)

Although the draining is done in the slow path these numbers suggest
that we should really reconsider the use of WQ both for draining and
other purposes (like vmstats).

> PSI in this case records the time spent in __perform_reclaim but
> ignores drain_all_pages, IOW it misses 98.3% of the time spent in
> __alloc_pages_slowpath.
> 
> Annotate __alloc_pages_direct_reclaim in its entirety so that delays
> from handling page allocation failure in the direct reclaim path are
> accounted as memory stall.
> 
> Reported-by: Tim Murray <timmurray@google.com>
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> Acked-by: Johannes Weiner <hannes@cmpxchg.org>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!

> ---
> changes in v3:
> - Moved psi_memstall_leave after the "out" label
> 
>  mm/page_alloc.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 3589febc6d31..029bceb79861 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4595,13 +4595,12 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
>  					const struct alloc_context *ac)
>  {
>  	unsigned int noreclaim_flag;
> -	unsigned long pflags, progress;
> +	unsigned long progress;
>  
>  	cond_resched();
>  
>  	/* We now go into synchronous reclaim */
>  	cpuset_memory_pressure_bump();
> -	psi_memstall_enter(&pflags);
>  	fs_reclaim_acquire(gfp_mask);
>  	noreclaim_flag = memalloc_noreclaim_save();
>  
> @@ -4610,7 +4609,6 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
>  
>  	memalloc_noreclaim_restore(noreclaim_flag);
>  	fs_reclaim_release(gfp_mask);
> -	psi_memstall_leave(&pflags);
>  
>  	cond_resched();
>  
> @@ -4624,11 +4622,13 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>  		unsigned long *did_some_progress)
>  {
>  	struct page *page = NULL;
> +	unsigned long pflags;
>  	bool drained = false;
>  
> +	psi_memstall_enter(&pflags);
>  	*did_some_progress = __perform_reclaim(gfp_mask, order, ac);
>  	if (unlikely(!(*did_some_progress)))
> -		return NULL;
> +		goto out;
>  
>  retry:
>  	page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
> @@ -4644,6 +4644,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
>  		drained = true;
>  		goto retry;
>  	}
> +out:
> +	psi_memstall_leave(&pflags);
>  
>  	return page;
>  }
> -- 
> 2.35.1.473.g83b2b277ed-goog

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v3 1/1] mm: count time in drain_all_pages during direct reclaim as memory pressure
  2022-02-24  8:53 ` Michal Hocko
@ 2022-02-24 16:28   ` Suren Baghdasaryan
  2022-02-25  1:31     ` Suren Baghdasaryan
  0 siblings, 1 reply; 5+ messages in thread
From: Suren Baghdasaryan @ 2022-02-24 16:28 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, hannes, pmladek, peterz, guro, shakeelb, minchan,
	timmurray, linux-mm, linux-kernel, kernel-team

On Thu, Feb 24, 2022 at 12:53 AM 'Michal Hocko' via kernel-team
<kernel-team@android.com> wrote:
>
> On Wed 23-02-22 11:48:12, Suren Baghdasaryan wrote:
> > When page allocation in direct reclaim path fails, the system will
> > make one attempt to shrink per-cpu page lists and free pages from
> > high alloc reserves. Draining per-cpu pages into buddy allocator can
> > be a very slow operation because it's done using workqueues and the
> > task in direct reclaim waits for all of them to finish before
> > proceeding. Currently this time is not accounted as psi memory stall.
> >
> > While testing mobile devices under extreme memory pressure, when
> > allocations are failing during direct reclaim, we noticed that psi
> > events which would be expected in such conditions were not triggered.
> > After profiling these cases it was determined that the reason for
> > missing psi events was that a big chunk of time spent in direct
> > reclaim is not accounted as memory stall, therefore psi would not
> > reach the levels at which an event is generated. Further investigation
> > revealed that the bulk of that unaccounted time was spent inside
> > drain_all_pages call.
> >
> > A typical captured case when drain_all_pages path gets activated:
> >
> > __alloc_pages_slowpath  took 44.644.613ns
> >     __perform_reclaim   took    751.668ns (1.7%)
> >     drain_all_pages     took 43.887.167ns (98.3%)
>
> Although the draining is done in the slow path these numbers suggest
> that we should really reconsider the use of WQ both for draining and
> other purposes (like vmstats).

Yep, I'm testing the kthread_create_worker_on_cpu approach suggested
by Petr. Will post it later today if nothing regresses.

>
> > PSI in this case records the time spent in __perform_reclaim but
> > ignores drain_all_pages, IOW it misses 98.3% of the time spent in
> > __alloc_pages_slowpath.
> >
> > Annotate __alloc_pages_direct_reclaim in its entirety so that delays
> > from handling page allocation failure in the direct reclaim path are
> > accounted as memory stall.
> >
> > Reported-by: Tim Murray <timmurray@google.com>
> > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> Thanks!
>
> > ---
> > changes in v3:
> > - Moved psi_memstall_leave after the "out" label
> >
> >  mm/page_alloc.c | 10 ++++++----
> >  1 file changed, 6 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 3589febc6d31..029bceb79861 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -4595,13 +4595,12 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
> >                                       const struct alloc_context *ac)
> >  {
> >       unsigned int noreclaim_flag;
> > -     unsigned long pflags, progress;
> > +     unsigned long progress;
> >
> >       cond_resched();
> >
> >       /* We now go into synchronous reclaim */
> >       cpuset_memory_pressure_bump();
> > -     psi_memstall_enter(&pflags);
> >       fs_reclaim_acquire(gfp_mask);
> >       noreclaim_flag = memalloc_noreclaim_save();
> >
> > @@ -4610,7 +4609,6 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
> >
> >       memalloc_noreclaim_restore(noreclaim_flag);
> >       fs_reclaim_release(gfp_mask);
> > -     psi_memstall_leave(&pflags);
> >
> >       cond_resched();
> >
> > @@ -4624,11 +4622,13 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
> >               unsigned long *did_some_progress)
> >  {
> >       struct page *page = NULL;
> > +     unsigned long pflags;
> >       bool drained = false;
> >
> > +     psi_memstall_enter(&pflags);
> >       *did_some_progress = __perform_reclaim(gfp_mask, order, ac);
> >       if (unlikely(!(*did_some_progress)))
> > -             return NULL;
> > +             goto out;
> >
> >  retry:
> >       page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
> > @@ -4644,6 +4644,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
> >               drained = true;
> >               goto retry;
> >       }
> > +out:
> > +     psi_memstall_leave(&pflags);
> >
> >       return page;
> >  }
> > --
> > 2.35.1.473.g83b2b277ed-goog
>
> --
> Michal Hocko
> SUSE Labs
>



* Re: [PATCH v3 1/1] mm: count time in drain_all_pages during direct reclaim as memory pressure
  2022-02-24 16:28   ` Suren Baghdasaryan
@ 2022-02-25  1:31     ` Suren Baghdasaryan
  0 siblings, 0 replies; 5+ messages in thread
From: Suren Baghdasaryan @ 2022-02-25  1:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: akpm, hannes, pmladek, peterz, guro, shakeelb, minchan,
	timmurray, linux-mm, linux-kernel, kernel-team

On Thu, Feb 24, 2022 at 8:28 AM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Feb 24, 2022 at 12:53 AM 'Michal Hocko' via kernel-team
> <kernel-team@android.com> wrote:
> >
> > On Wed 23-02-22 11:48:12, Suren Baghdasaryan wrote:
> > > When page allocation in direct reclaim path fails, the system will
> > > make one attempt to shrink per-cpu page lists and free pages from
> > > high alloc reserves. Draining per-cpu pages into buddy allocator can
> > > be a very slow operation because it's done using workqueues and the
> > > task in direct reclaim waits for all of them to finish before
> > > proceeding. Currently this time is not accounted as psi memory stall.
> > >
> > > While testing mobile devices under extreme memory pressure, when
> > > allocations are failing during direct reclaim, we noticed that psi
> > > events which would be expected in such conditions were not triggered.
> > > After profiling these cases it was determined that the reason for
> > > missing psi events was that a big chunk of time spent in direct
> > > reclaim is not accounted as memory stall, therefore psi would not
> > > reach the levels at which an event is generated. Further investigation
> > > revealed that the bulk of that unaccounted time was spent inside
> > > drain_all_pages call.
> > >
> > > A typical captured case when drain_all_pages path gets activated:
> > >
> > > __alloc_pages_slowpath  took 44.644.613ns
> > >     __perform_reclaim   took    751.668ns (1.7%)
> > >     drain_all_pages     took 43.887.167ns (98.3%)
> >
> > Although the draining is done in the slow path these numbers suggest
> > that we should really reconsider the use of WQ both for draining and
> > other purposes (like vmstats).
>
> Yep, I'm testing the kthread_create_worker_on_cpu approach suggested
> by Petr. Will post it later today if nothing regresses.

An RFC for kthreads approach is posted at
https://lore.kernel.org/all/20220225012819.1807147-1-surenb@google.com/
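
(For readers following along, here is a rough, hypothetical sketch of the
general direction. It is not taken from the RFC above, and the pcpu_drain_*
names are invented for illustration. The idea is to give each CPU a
dedicated, CPU-bound kthread worker created with
kthread_create_worker_on_cpu() and to queue the drain there instead of on
the WQ, reusing the existing drain_local_pages() helper. CPU hotplug
handling and error unwinding are omitted.)

#include <linux/cpu.h>
#include <linux/err.h>
#include <linux/gfp.h>
#include <linux/kthread.h>
#include <linux/percpu.h>

struct pcpu_drain_work {
	struct kthread_work work;
	struct zone *zone;		/* NULL means drain all zones */
};

static DEFINE_PER_CPU(struct pcpu_drain_work, pcpu_drain_kwork);
static DEFINE_PER_CPU(struct kthread_worker *, pcpu_drain_kworker);

static void drain_local_pages_fn(struct kthread_work *work)
{
	struct pcpu_drain_work *drain =
		container_of(work, struct pcpu_drain_work, work);

	/* runs on the CPU the worker is bound to */
	drain_local_pages(drain->zone);
}

/* one dedicated kthread worker per online CPU, set up from an initcall */
static int pcpu_drain_kworkers_init(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		struct kthread_worker *w;

		w = kthread_create_worker_on_cpu(cpu, 0, "pcpu_drain/%d", cpu);
		if (IS_ERR(w))
			return PTR_ERR(w);
		per_cpu(pcpu_drain_kworker, cpu) = w;
		kthread_init_work(&per_cpu(pcpu_drain_kwork, cpu).work,
				  drain_local_pages_fn);
	}
	return 0;
}

/* what the queue_work_on()/flush_work() loop in drain_all_pages() becomes */
static void drain_all_pages_on_kthreads(struct zone *zone)
{
	int cpu;

	for_each_online_cpu(cpu) {
		struct pcpu_drain_work *drain = &per_cpu(pcpu_drain_kwork, cpu);

		drain->zone = zone;
		kthread_queue_work(per_cpu(pcpu_drain_kworker, cpu),
				   &drain->work);
	}
	for_each_online_cpu(cpu)
		kthread_flush_work(&per_cpu(pcpu_drain_kwork, cpu).work);
}

Unlike WQ items, these workers do not share a pool with unrelated work, so
a drain request is not stuck behind whatever else happens to be queued on
that CPU, which is the property the numbers above suggest is worth having.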

>
> >
> > > PSI in this case records the time spent in __perform_reclaim but
> > > ignores drain_all_pages, IOW it misses 98.3% of the time spent in
> > > __alloc_pages_slowpath.
> > >
> > > Annotate __alloc_pages_direct_reclaim in its entirety so that delays
> > > from handling page allocation failure in the direct reclaim path are
> > > accounted as memory stall.
> > >
> > > Reported-by: Tim Murray <timmurray@google.com>
> > > Signed-off-by: Suren Baghdasaryan <surenb@google.com>
> > > Acked-by: Johannes Weiner <hannes@cmpxchg.org>
> >
> > Acked-by: Michal Hocko <mhocko@suse.com>
> >
> > Thanks!
> >
> > > ---
> > > changes in v3:
> > > - Moved psi_memstall_leave after the "out" label
> > >
> > >  mm/page_alloc.c | 10 ++++++----
> > >  1 file changed, 6 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 3589febc6d31..029bceb79861 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -4595,13 +4595,12 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
> > >                                       const struct alloc_context *ac)
> > >  {
> > >       unsigned int noreclaim_flag;
> > > -     unsigned long pflags, progress;
> > > +     unsigned long progress;
> > >
> > >       cond_resched();
> > >
> > >       /* We now go into synchronous reclaim */
> > >       cpuset_memory_pressure_bump();
> > > -     psi_memstall_enter(&pflags);
> > >       fs_reclaim_acquire(gfp_mask);
> > >       noreclaim_flag = memalloc_noreclaim_save();
> > >
> > > @@ -4610,7 +4609,6 @@ __perform_reclaim(gfp_t gfp_mask, unsigned int order,
> > >
> > >       memalloc_noreclaim_restore(noreclaim_flag);
> > >       fs_reclaim_release(gfp_mask);
> > > -     psi_memstall_leave(&pflags);
> > >
> > >       cond_resched();
> > >
> > > @@ -4624,11 +4622,13 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
> > >               unsigned long *did_some_progress)
> > >  {
> > >       struct page *page = NULL;
> > > +     unsigned long pflags;
> > >       bool drained = false;
> > >
> > > +     psi_memstall_enter(&pflags);
> > >       *did_some_progress = __perform_reclaim(gfp_mask, order, ac);
> > >       if (unlikely(!(*did_some_progress)))
> > > -             return NULL;
> > > +             goto out;
> > >
> > >  retry:
> > >       page = get_page_from_freelist(gfp_mask, order, alloc_flags, ac);
> > > @@ -4644,6 +4644,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
> > >               drained = true;
> > >               goto retry;
> > >       }
> > > +out:
> > > +     psi_memstall_leave(&pflags);
> > >
> > >       return page;
> > >  }
> > > --
> > > 2.35.1.473.g83b2b277ed-goog
> >
> > --
> > Michal Hocko
> > SUSE Labs
> >



