* [PATCH] mm: vmscan: consistent update to pgsteal and pgscan
@ 2020-05-07 20:49 ` Shakeel Butt
  0 siblings, 0 replies; 12+ messages in thread
From: Shakeel Butt @ 2020-05-07 20:49 UTC (permalink / raw)
  To: Mel Gorman, Johannes Weiner, Roman Gushchin, Michal Hocko
  Cc: Andrew Morton, linux-mm, linux-kernel, Shakeel Butt

One way to measure the efficiency of memory reclaim is to look at the
ratio (pgscan+pgrefill)/pgsteal. However, at the moment these stats are
not updated consistently at the system level, so the ratio is not very
meaningful. pgsteal and pgscan are updated only for global reclaim,
while pgrefill gets updated for both global and cgroup reclaim.

Please note that this difference exists only for the system level
vmstats. The cgroup stats returned by memory.stat are consistent: a
cgroup's pgsteal contains the number of pages reclaimed by both global
and cgroup reclaim. So, one way to get consistent system level stats
would be root's memory.stat, but the root cgroup does not expose that
interface. Also, on !CONFIG_MEMCG machines /proc/vmstat is the only way
to get these stats. So, make these stats consistent.
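
For example, a monitor could compute this ratio from /proc/vmstat along
the following lines (an illustrative sketch, not part of this patch;
the counter names are the actual vmstat fields):

  #include <stdio.h>
  #include <string.h>

  /* Illustrative sketch: (pgscan + pgrefill) / pgsteal from /proc/vmstat. */
  int main(void)
  {
          unsigned long long pgscan = 0, pgsteal = 0, pgrefill = 0, val;
          char name[64];
          FILE *f = fopen("/proc/vmstat", "r");

          if (!f)
                  return 1;
          while (fscanf(f, "%63s %llu", name, &val) == 2) {
                  if (!strcmp(name, "pgscan_kswapd") ||
                      !strcmp(name, "pgscan_direct"))
                          pgscan += val;
                  else if (!strcmp(name, "pgsteal_kswapd") ||
                           !strcmp(name, "pgsteal_direct"))
                          pgsteal += val;
                  else if (!strcmp(name, "pgrefill"))
                          pgrefill = val;
          }
          fclose(f);
          if (pgsteal)
                  printf("(pgscan+pgrefill)/pgsteal = %.2f\n",
                         (double)(pgscan + pgrefill) / pgsteal);
          return 0;
  }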

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---
 mm/vmscan.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index cc555903a332..51f7d1efc912 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1943,8 +1943,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	item = current_is_kswapd() ? PGSCAN_KSWAPD : PGSCAN_DIRECT;
-	if (!cgroup_reclaim(sc))
-		__count_vm_events(item, nr_scanned);
+	__count_vm_events(item, nr_scanned);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
 	spin_unlock_irq(&pgdat->lru_lock);
 
@@ -1957,8 +1956,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	spin_lock_irq(&pgdat->lru_lock);
 
 	item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
-	if (!cgroup_reclaim(sc))
-		__count_vm_events(item, nr_reclaimed);
+	__count_vm_events(item, nr_reclaimed);
 	__count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
 	reclaim_stat->recent_rotated[0] += stat.nr_activate[0];
 	reclaim_stat->recent_rotated[1] += stat.nr_activate[1];
-- 
2.26.2.526.g744177e7f7-goog



* Re: [PATCH] mm: vmscan: consistent update to pgsteal and pgscan
  2020-05-07 20:49 ` Shakeel Butt
@ 2020-05-07 22:28 ` Roman Gushchin
  -1 siblings, 0 replies; 12+ messages in thread
From: Roman Gushchin @ 2020-05-07 22:28 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Mel Gorman, Johannes Weiner, Michal Hocko, Andrew Morton,
	linux-mm, linux-kernel

On Thu, May 07, 2020 at 01:49:13PM -0700, Shakeel Butt wrote:
> One way to measure the efficiency of memory reclaim is to look at the
> ratio (pgscan+pgrefill)/pgsteal. However, at the moment these stats are
> not updated consistently at the system level, so the ratio is not very
> meaningful. pgsteal and pgscan are updated only for global reclaim,
> while pgrefill gets updated for both global and cgroup reclaim.
> 
> Please note that this difference exists only for the system level
> vmstats. The cgroup stats returned by memory.stat are consistent: a
> cgroup's pgsteal contains the number of pages reclaimed by both global
> and cgroup reclaim. So, one way to get consistent system level stats
> would be root's memory.stat, but the root cgroup does not expose that
> interface. Also, on !CONFIG_MEMCG machines /proc/vmstat is the only way
> to get these stats. So, make these stats consistent.
> 
> Signed-off-by: Shakeel Butt <shakeelb@google.com>

Acked-by: Roman Gushchin <guro@fb.com>

Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] mm: vmscan: consistent update to pgsteal and pgscan
  2020-05-07 20:49 ` Shakeel Butt
@ 2020-05-08 10:34   ` Yafang Shao
  -1 siblings, 0 replies; 12+ messages in thread
From: Yafang Shao @ 2020-05-08 10:34 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Mel Gorman, Johannes Weiner, Roman Gushchin, Michal Hocko,
	Andrew Morton, Linux MM, LKML

On Fri, May 8, 2020 at 4:49 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> One way to measure the efficiency of memory reclaim is to look at the
> ratio (pgscan+pgrefill)/pgsteal. However, at the moment these stats are
> not updated consistently at the system level, so the ratio is not very
> meaningful. pgsteal and pgscan are updated only for global reclaim,
> while pgrefill gets updated for both global and cgroup reclaim.
>

Hi Shakeel,

We always use pgscan and pgsteal for monitoring the system level
memory pressure, for example, by using sysstat(sar) or some other
monitoring tools.
But with this change, these two counters include the memcg pressure as
well. It is not easy to know whether the pgscan and pgsteal are caused
by system level pressure or only some specific memcgs reaching their
memory limit.

How about adding a cgroup_reclaim() check to pgrefill as well?
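
Something like the following in shrink_active_list(), i.e. the same
guard that pgscan and pgsteal had before this patch (just a sketch,
not a tested change):

	/* sketch: count system-level PGREFILL only for global reclaim */
	if (!cgroup_reclaim(sc))
		__count_vm_events(PGREFILL, nr_scanned);
	__count_memcg_events(lruvec_memcg(lruvec), PGREFILL, nr_scanned);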

> Please note that this difference exists only for the system level
> vmstats. The cgroup stats returned by memory.stat are consistent: a
> cgroup's pgsteal contains the number of pages reclaimed by both global
> and cgroup reclaim. So, one way to get consistent system level stats
> would be root's memory.stat, but the root cgroup does not expose that
> interface. Also, on !CONFIG_MEMCG machines /proc/vmstat is the only way
> to get these stats. So, make these stats consistent.
>
> Signed-off-by: Shakeel Butt <shakeelb@google.com>
> ---
>  mm/vmscan.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index cc555903a332..51f7d1efc912 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1943,8 +1943,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>         reclaim_stat->recent_scanned[file] += nr_taken;
>
>         item = current_is_kswapd() ? PGSCAN_KSWAPD : PGSCAN_DIRECT;
> -       if (!cgroup_reclaim(sc))
> -               __count_vm_events(item, nr_scanned);
> +       __count_vm_events(item, nr_scanned);
>         __count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
>         spin_unlock_irq(&pgdat->lru_lock);
>
> @@ -1957,8 +1956,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
>         spin_lock_irq(&pgdat->lru_lock);
>
>         item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
> -       if (!cgroup_reclaim(sc))
> -               __count_vm_events(item, nr_reclaimed);
> +       __count_vm_events(item, nr_reclaimed);
>         __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
>         reclaim_stat->recent_rotated[0] += stat.nr_activate[0];
>         reclaim_stat->recent_rotated[1] += stat.nr_activate[1];
> --
> 2.26.2.526.g744177e7f7-goog
>
>


-- 
Thanks
Yafang


* Re: [PATCH] mm: vmscan: consistent update to pgsteal and pgscan
  2020-05-08 10:34   ` Yafang Shao
@ 2020-05-08 13:25     ` Shakeel Butt
  -1 siblings, 0 replies; 12+ messages in thread
From: Shakeel Butt @ 2020-05-08 13:25 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Mel Gorman, Johannes Weiner, Roman Gushchin, Michal Hocko,
	Andrew Morton, Linux MM, LKML

On Fri, May 8, 2020 at 3:34 AM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> On Fri, May 8, 2020 at 4:49 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > One way to measure the efficiency of memory reclaim is to look at the
> > ratio (pgscan+pgrefill)/pgsteal. However, at the moment these stats are
> > not updated consistently at the system level, so the ratio is not very
> > meaningful. pgsteal and pgscan are updated only for global reclaim,
> > while pgrefill gets updated for both global and cgroup reclaim.
> >
>
> Hi Shakeel,
>
> We always use pgscan and pgsteal for monitoring the system level
> memory pressure, for example, by using sysstat(sar) or some other
> monitoring tools.

Don't you need pgrefill in addition to pgscan and pgsteal to get the
full picture of the reclaim activity?

> But with this change, these two counters include the memcg pressure as
> well. It is not easy to know whether the pgscan and pgsteal are caused
> by system level pressure or only some specific memcgs reaching their
> memory limit.
>
> How about adding a cgroup_reclaim() check to pgrefill as well?
>

I am looking for all the reclaim activity on the system. Adding
!cgroup_reclaim to pgrefill will skip the cgroup reclaim activity.
Maybe adding pgsteal_cgroup and pgscan_cgroup would be better.
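
Roughly along these lines, where PGSCAN_CGROUP would be a hypothetical
new vm_event_item (only to illustrate the idea):

	item = current_is_kswapd() ? PGSCAN_KSWAPD : PGSCAN_DIRECT;
	__count_vm_events(item, nr_scanned);
	/* hypothetical: separately count scans driven by cgroup limits */
	if (cgroup_reclaim(sc))
		__count_vm_events(PGSCAN_CGROUP, nr_scanned);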

> > Please note that this difference exists only for the system level
> > vmstats. The cgroup stats returned by memory.stat are consistent: a
> > cgroup's pgsteal contains the number of pages reclaimed by both global
> > and cgroup reclaim. So, one way to get consistent system level stats
> > would be root's memory.stat, but the root cgroup does not expose that
> > interface. Also, on !CONFIG_MEMCG machines /proc/vmstat is the only way
> > to get these stats. So, make these stats consistent.
> >
> > Signed-off-by: Shakeel Butt <shakeelb@google.com>
> > ---
> >  mm/vmscan.c | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index cc555903a332..51f7d1efc912 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1943,8 +1943,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> >         reclaim_stat->recent_scanned[file] += nr_taken;
> >
> >         item = current_is_kswapd() ? PGSCAN_KSWAPD : PGSCAN_DIRECT;
> > -       if (!cgroup_reclaim(sc))
> > -               __count_vm_events(item, nr_scanned);
> > +       __count_vm_events(item, nr_scanned);
> >         __count_memcg_events(lruvec_memcg(lruvec), item, nr_scanned);
> >         spin_unlock_irq(&pgdat->lru_lock);
> >
> > @@ -1957,8 +1956,7 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
> >         spin_lock_irq(&pgdat->lru_lock);
> >
> >         item = current_is_kswapd() ? PGSTEAL_KSWAPD : PGSTEAL_DIRECT;
> > -       if (!cgroup_reclaim(sc))
> > -               __count_vm_events(item, nr_reclaimed);
> > +       __count_vm_events(item, nr_reclaimed);
> >         __count_memcg_events(lruvec_memcg(lruvec), item, nr_reclaimed);
> >         reclaim_stat->recent_rotated[0] += stat.nr_activate[0];
> >         reclaim_stat->recent_rotated[1] += stat.nr_activate[1];
> > --
> > 2.26.2.526.g744177e7f7-goog
> >
> >
>
>
> --
> Thanks
> Yafang


* Re: [PATCH] mm: vmscan: consistent update to pgsteal and pgscan
  2020-05-08 13:25     ` Shakeel Butt
@ 2020-05-08 13:38     ` Johannes Weiner
  2020-05-08 14:05         ` Shakeel Butt
  2020-05-09  6:53         ` Yafang Shao
  -1 siblings, 2 replies; 12+ messages in thread
From: Johannes Weiner @ 2020-05-08 13:38 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Yafang Shao, Mel Gorman, Roman Gushchin, Michal Hocko,
	Andrew Morton, Linux MM, LKML

On Fri, May 08, 2020 at 06:25:14AM -0700, Shakeel Butt wrote:
> On Fri, May 8, 2020 at 3:34 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > On Fri, May 8, 2020 at 4:49 AM Shakeel Butt <shakeelb@google.com> wrote:
> > >
> > > One way to measure the efficiency of memory reclaim is to look at the
> > > ratio (pgscan+pgrefill)/pgsteal. However, at the moment these stats are
> > > not updated consistently at the system level, so the ratio is not very
> > > meaningful. pgsteal and pgscan are updated only for global reclaim,
> > > while pgrefill gets updated for both global and cgroup reclaim.
> > >
> >
> > Hi Shakeel,
> >
> > We always use pgscan and pgsteal for monitoring the system level
> > memory pressure, for example, by using sysstat(sar) or some other
> > monitoring tools.

I'm in the same boat. It's useful to see the activity that happens
purely due to machine capacity separately from the localized activity
that happens due to the limits throughout the cgroup tree.

> Don't you need pgrefill in addition to pgscan and pgsteal to get the
> full picture of the reclaim activity?

I actually almost never look at pgrefill.

> > But with this change, these two counters include the memcg pressure as
> > well. It is not easy to know whether the pgscan and pgsteal are caused
> > by system level pressure or only some specific memcgs reaching their
> > memory limit.
> >
> > How about adding a cgroup_reclaim() check to pgrefill as well?
> >
> 
> I am looking for all the reclaim activity on the system. Adding
> !cgroup_reclaim to pgrefill will skip the cgroup reclaim activity.
> Maybe adding pgsteal_cgroup and pgscan_cgroup would be better.

How would you feel about adding memory.stat at the root cgroup level?

There are subtle differences between /proc/vmstat and memory.stat, and
cgroup-aware code that wants to watch the full hierarchy currently has
to know about these intricacies and translate semantics back and forth.

Generally having the fully recursive memory.stat at the root level
could help a broader range of use cases.


* Re: [PATCH] mm: vmscan: consistent update to pgsteal and pgscan
  2020-05-08 13:38     ` Johannes Weiner
@ 2020-05-08 14:05         ` Shakeel Butt
  2020-05-09  6:53         ` Yafang Shao
  1 sibling, 0 replies; 12+ messages in thread
From: Shakeel Butt @ 2020-05-08 14:05 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Yafang Shao, Mel Gorman, Roman Gushchin, Michal Hocko,
	Andrew Morton, Linux MM, LKML

On Fri, May 8, 2020 at 6:38 AM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, May 08, 2020 at 06:25:14AM -0700, Shakeel Butt wrote:
> > On Fri, May 8, 2020 at 3:34 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Fri, May 8, 2020 at 4:49 AM Shakeel Butt <shakeelb@google.com> wrote:
> > > >
> > > > One way to measure the efficiency of memory reclaim is to look at the
> > > > ratio (pgscan+pgrefill)/pgsteal. However, at the moment these stats are
> > > > not updated consistently at the system level, so the ratio is not very
> > > > meaningful. pgsteal and pgscan are updated only for global reclaim,
> > > > while pgrefill gets updated for both global and cgroup reclaim.
> > > >
> > >
> > > Hi Shakeel,
> > >
> > > We always use pgscan and pgsteal for monitoring the system level
> > > memory pressure, for example, by using sysstat(sar) or some other
> > > monitoring tools.
>
> I'm in the same boat. It's useful to see the activity that happens
> purely due to machine capacity separately from the localized activity
> that happens due to the limits throughout the cgroup tree.
>
> > Don't you need pgrefill in addition to pgscan and pgsteal to get the
> > full picture of the reclaim activity?
>
> I actually almost never look at pgrefill.
>

Nowadays we are looking at reclaim cost on highly utilized
machines/devices and have noticed that the rmap walk takes more than
60-70% of the CPU cost of reclaim. The kernel does rmap walks in
shrink_active_list() and shrink_page_list(), and pgscan and pgrefill
are good approximations of the number of rmap walks during reclaim.

> > > But with this change, these two counters include the memcg pressure as
> > > well. It is not easy to know whether the pgscan and pgsteal are caused
> > > by system level pressure or only some specific memcgs reaching their
> > > memory limit.
> > >
> > > How about adding a cgroup_reclaim() check to pgrefill as well?
> > >
> >
> > I am looking for all the reclaim activity on the system. Adding
> > !cgroup_reclaim to pgrefill will skip the cgroup reclaim activity.
> > Maybe adding pgsteal_cgroup and pgscan_cgroup would be better.
>
> How would you feel about adding memory.stat at the root cgroup level?
>

Actually I would prefer adding memory.stat at the root cgroup level
since, as you note below, more use cases would benefit from it.

> There are subtle differences between /proc/vmstat and memory.stat, and
> cgroup-aware code that wants to watch the full hierarchy currently has
> to know about these intricacies and translate semantics back and forth.
>
> Generally having the fully recursive memory.stat at the root level
> could help a broader range of use cases.

Thanks for the feedback. I will send the patch with the additional motivation.


* Re: [PATCH] mm: vmscan: consistent update to pgsteal and pgscan
  2020-05-08 13:38     ` Johannes Weiner
@ 2020-05-09  6:53         ` Yafang Shao
  1 sibling, 0 replies; 12+ messages in thread
From: Yafang Shao @ 2020-05-09  6:53 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Shakeel Butt, Mel Gorman, Roman Gushchin, Michal Hocko,
	Andrew Morton, Linux MM, LKML

On Fri, May 8, 2020 at 9:38 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, May 08, 2020 at 06:25:14AM -0700, Shakeel Butt wrote:
> > On Fri, May 8, 2020 at 3:34 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Fri, May 8, 2020 at 4:49 AM Shakeel Butt <shakeelb@google.com> wrote:
> > > >
> > > > One way to measure the efficiency of memory reclaim is to look at the
> > > > ratio (pgscan+pgrefill)/pgsteal. However, at the moment these stats are
> > > > not updated consistently at the system level, so the ratio is not very
> > > > meaningful. pgsteal and pgscan are updated only for global reclaim,
> > > > while pgrefill gets updated for both global and cgroup reclaim.
> > > >
> > >
> > > Hi Shakeel,
> > >
> > > We always use pgscan and pgsteal for monitoring the system level
> > > memory pressure, for example, by using sysstat(sar) or some other
> > > monitoring tools.
>
> I'm in the same boat. It's useful to see the activity that happens
> purely due to machine capacity separately from the localized activity
> that happens due to the limits throughout the cgroup tree.
>

Hi Johannes,

When I used PSI to monitor memory pressure, I found the same behavior
in PSI: /proc/pressure/{memory,io} can show very high pressure due to
some limited cgroups rather than the machine capacity.
Should we separate /proc/pressure/XXX from /sys/fs/cgroup/XXX.pressure
as well? Then /proc/pressure/XXX would only indicate the pressure due
to machine capacity and /sys/fs/cgroup/XXX.pressure would show the
pressure throughout the cgroup tree.

Besides that, there's another difference between /proc/pressure/XXX
and /sys/fs/cgroup/XXX.pressure: when you disable PSI (i.e. psi=n),
/proc/pressure/ disappears but /sys/fs/cgroup/XXX.pressure still
exists. If we separate them, this difference will be reasonable.
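
(For context, our monitor reads the global file roughly like this,
assuming the usual "some avg10=... avg60=... avg300=... total=..."
first line; just a sketch:)

	/* read the "some" avg10 value from /proc/pressure/memory */
	FILE *f = fopen("/proc/pressure/memory", "r");
	float avg10 = 0;

	if (f) {
		if (fscanf(f, "some avg10=%f", &avg10) != 1)
			avg10 = 0;
		fclose(f);
	}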

> > Don't you need pgrefill in addition to pgscan and pgsteal to get the
> > full picture of the reclaim activity?
>
> I actually almost never look at pgrefill.
>
> > > But with this change, these two counters include the memcg pressure as
> > > well. It is not easy to know whether the pgscan and pgsteal are caused
> > > by system level pressure or only some specific memcgs reaching their
> > > memory limit.
> > >
> > > How about adding a cgroup_reclaim() check to pgrefill as well?
> > >
> >
> > I am looking for all the reclaim activity on the system. Adding
> > !cgroup_reclaim to pgrefill will skip the cgroup reclaim activity.
> > Maybe adding pgsteal_cgroup and pgscan_cgroup would be better.
>
> How would you feel about adding memory.stat at the root cgroup level?
>
> There are subtle differences between /proc/vmstat and memory.stat, and
> cgroup-aware code that wants to watch the full hierarchy currently has
> to know about these intricacies and translate semantics back and forth.
>
> Generally having the fully recursive memory.stat at the root level
> could help a broader range of use cases.



-- 
Thanks
Yafang


* Re: [PATCH] mm: vmscan: consistent update to pgsteal and pgscan
@ 2020-05-09  6:53         ` Yafang Shao
  0 siblings, 0 replies; 12+ messages in thread
From: Yafang Shao @ 2020-05-09  6:53 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Shakeel Butt, Mel Gorman, Roman Gushchin, Michal Hocko,
	Andrew Morton, Linux MM, LKML

On Fri, May 8, 2020 at 9:38 PM Johannes Weiner <hannes@cmpxchg.org> wrote:
>
> On Fri, May 08, 2020 at 06:25:14AM -0700, Shakeel Butt wrote:
> > On Fri, May 8, 2020 at 3:34 AM Yafang Shao <laoar.shao@gmail.com> wrote:
> > >
> > > On Fri, May 8, 2020 at 4:49 AM Shakeel Butt <shakeelb@google.com> wrote:
> > > >
> > > > One way to measure the efficiency of memory reclaim is to look at the
> > > > ratio (pgscan+pfrefill)/pgsteal. However at the moment these stats are
> > > > not updated consistently at the system level and the ratio of these are
> > > > not very meaningful. The pgsteal and pgscan are updated for only global
> > > > reclaim while pgrefill gets updated for global as well as cgroup
> > > > reclaim.
> > > >
> > >
> > > Hi Shakeel,
> > >
> > > We always use pgscan and pgsteal for monitoring the system level
> > > memory pressure, for example, by using sysstat(sar) or some other
> > > monitor tools.
>
> I'm in the same boat. It's useful to have activity that happens purely
> due to machine capacity rather than localized activity that happens
> due to the limits throughout the cgroup tree.
>

Hi Johannes,

When I used PSI to monitor memory pressure, I found there's the same
behavoir in PSI that /proc/pressure/{memroy, IO} can be very large due
to some limited cgroups rather the machine capacity.
Should we separate /proc/pressure/XXX from /sys/fs/cgroup/XXX.pressure
as well ? Then /proc/pressure/XXX only indicate the pressure due to
machine capacity and /sys/fs/cgroup/XXX.presssure show the pressure
throughout the cgroup tree.

Besides that, there's another difference between /proc/pressure/XXX
and /sys/fs/cgroup/XXX.pressure, which is when you disable the psi
(i.e. psi=n) /proc/pressure/ will disapear but
/sys/fs/cgroup/XXX.pressure still exist.  If we separate them, this
difference will be reasonable.

> > Don't you need pgrefill in addition to pgscan and pgsteal to get the
> > full picture of the reclaim activity?
>
> I actually almost never look at pgrefill.
>
> > > But with this change, these two counters include the memcg pressure as
> > > well. It is not easy to know whether the pgscan and pgsteal are caused
> > > by system level pressure or only some specific memcgs reaching their
> > > memory limit.
> > >
> > > How about adding  cgroup_reclaim() to pgrefill as well ?
> > >
> >
> > I am looking for all the reclaim activity on the system. Adding
> > !cgroup_reclaim to pgrefill will skip the cgroup reclaim activity.
> > Maybe adding pgsteal_cgroup and pgscan_cgroup would be better.
>
> How would you feel about adding memory.stat at the root cgroup level?
>
> There are subtle differences between /proc/vmstat and memory.stat, and
> cgroup-aware code that wants to watch the full hierarchy currently has
> to know about these intricacies and translate semantics back and forth.
>
> Generally having the fully recursive memory.stat at the root level
> could help a broader range of usecases.



-- 
Thanks
Yafang


^ permalink raw reply	[flat|nested] 12+ messages in thread
