All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] workingset: fix confusion around eviction vs refault container
@ 2023-01-04 22:29 Nhat Pham
  2023-01-04 23:22 ` Nhat Pham
  0 siblings, 1 reply; 3+ messages in thread
From: Nhat Pham @ 2023-01-04 22:29 UTC (permalink / raw)
  To: akpm; +Cc: hannes, linux-mm, linux-kernel, bfoster, willy, kernel-team

From: Johannes Weiner <hannes@cmpxchg.org>

Refault decisions are made based on the lruvec where the page was
evicted, as that determined its LRU order while it was alive. Stats
and workingset aging must then occur on the lruvec of the new page, as
that's the node and cgroup that experience the refault and that's the
lruvec whose nonresident info ages out by a new resident page. Those
lruvecs could be different when a page is shared between cgroups, or
the refaulting page is allocated on a different node.

There are currently two mix-ups:

1. When swap is available, the resident anon set must be considered
   when comparing the refault distance. The comparison is made against
   the right anon set, but the check for swap is not. When pages get
   evicted from a cgroup with swap, and refault in one without, this
   can incorrectly consider a hot refault as cold - and vice
   versa. Fix that by using the eviction cgroup for the swap check.

2. The stats and workingset age are updated against the wrong lruvec
   altogether: the right cgroup but the wrong NUMA node. When a page
   refaults on a different NUMA node, this will have confusing stats
   and distort the workingset age on a different lruvec - again
   possibly resulting in hot/cold misclassifications down the line.

Fix the swap check and the refault pgdat to address both concerns.

This was found during code review. It hasn't caused notable issues in
production, suggesting that those refault-migrations are relatively
rare in practice.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Co-developed-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 mm/workingset.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index ae7e984b23c6..79585d55c45d 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -457,6 +457,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 	 */
 	nr = folio_nr_pages(folio);
 	memcg = folio_memcg(folio);
+	pgdat = folio_pgdat(folio);
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
@@ -474,7 +475,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 		workingset_size += lruvec_page_state(eviction_lruvec,
 						     NR_INACTIVE_FILE);
 	}
-	if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
+	if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
 		workingset_size += lruvec_page_state(eviction_lruvec,
 						     NR_ACTIVE_ANON);
 		if (file) {
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] workingset: fix confusion around eviction vs refault container
  2023-01-04 22:29 [PATCH] workingset: fix confusion around eviction vs refault container Nhat Pham
@ 2023-01-04 23:22 ` Nhat Pham
  0 siblings, 0 replies; 3+ messages in thread
From: Nhat Pham @ 2023-01-04 23:22 UTC (permalink / raw)
  To: akpm; +Cc: hannes, linux-mm, linux-kernel, bfoster, willy, kernel-team

On Wed, Jan 4, 2023 at 2:29 PM Nhat Pham <nphamcs@gmail.com> wrote:
>
> From: Johannes Weiner <hannes@cmpxchg.org>
>
> Refault decisions are made based on the lruvec where the page was
> evicted, as that determined its LRU order while it was alive. Stats
> and workingset aging must then occur on the lruvec of the new page, as
> that's the node and cgroup that experience the refault and that's the
> lruvec whose nonresident info ages out by a new resident page. Those
> lruvecs could be different when a page is shared between cgroups, or
> the refaulting page is allocated on a different node.
>
> There are currently two mix-ups:
>
> 1. When swap is available, the resident anon set must be considered
>    when comparing the refault distance. The comparison is made against
>    the right anon set, but the check for swap is not. When pages get
>    evicted from a cgroup with swap, and refault in one without, this
>    can incorrectly consider a hot refault as cold - and vice
>    versa. Fix that by using the eviction cgroup for the swap check.
>
> 2. The stats and workingset age are updated against the wrong lruvec
>    altogether: the right cgroup but the wrong NUMA node. When a page
>    refaults on a different NUMA node, this will have confusing stats
>    and distort the workingset age on a different lruvec - again
>    possibly resulting in hot/cold misclassifications down the line.
>
> Fix the swap check and the refault pgdat to address both concerns.
>
> This was found during code review. It hasn't caused notable issues in
> production, suggesting that those refault-migrations are relatively
> rare in practice.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> Co-developed-by: Nhat Pham <nphamcs@gmail.com>
> Signed-off-by: Nhat Pham <nphamcs@gmail.com>
> ---
>  mm/workingset.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/workingset.c b/mm/workingset.c
> index ae7e984b23c6..79585d55c45d 100644
> --- a/mm/workingset.c
> +++ b/mm/workingset.c
> @@ -457,6 +457,7 @@ void workingset_refault(struct folio *folio, void *shadow)
>          */
>         nr = folio_nr_pages(folio);
>         memcg = folio_memcg(folio);
> +       pgdat = folio_pgdat(folio);
>         lruvec = mem_cgroup_lruvec(memcg, pgdat);
>
>         mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
> @@ -474,7 +475,7 @@ void workingset_refault(struct folio *folio, void *shadow)
>                 workingset_size += lruvec_page_state(eviction_lruvec,
>                                                      NR_INACTIVE_FILE);
>         }
> -       if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
> +       if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
>                 workingset_size += lruvec_page_state(eviction_lruvec,
>                                                      NR_ACTIVE_ANON);
>                 if (file) {
> --
> 2.30.2

Oh this one is sent out twice too... Something is wrong with my pipeline...
Anyway, please disregard the first email and only review this one!

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH] workingset: fix confusion around eviction vs refault container
@ 2023-01-04 22:29 Nhat Pham
  0 siblings, 0 replies; 3+ messages in thread
From: Nhat Pham @ 2023-01-04 22:29 UTC (permalink / raw)
  To: akpm; +Cc: hannes, linux-mm, linux-kernel, bfoster, willy, kernel-team

From: Johannes Weiner <hannes@cmpxchg.org>

Refault decisions are made based on the lruvec where the page was
evicted, as that determined its LRU order while it was alive. Stats
and workingset aging must then occur on the lruvec of the new page, as
that's the node and cgroup that experience the refault and that's the
lruvec whose nonresident info ages out by a new resident page. Those
lruvecs could be different when a page is shared between cgroups, or
the refaulting page is allocated on a different node.

There are currently two mix-ups:

1. When swap is available, the resident anon set must be considered
   when comparing the refault distance. The comparison is made against
   the right anon set, but the check for swap is not. When pages get
   evicted from a cgroup with swap, and refault in one without, this
   can incorrectly consider a hot refault as cold - and vice
   versa. Fix that by using the eviction cgroup for the swap check.

2. The stats and workingset age are updated against the wrong lruvec
   altogether: the right cgroup but the wrong NUMA node. When a page
   refaults on a different NUMA node, this will have confusing stats
   and distort the workingset age on a different lruvec - again
   possibly resulting in hot/cold misclassifications down the line.

Fix the swap check and the refault pgdat to address both concerns.

This was found during code review. It hasn't caused notable issues in
production, suggesting that those refault-migrations are relatively
rare in practice.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Co-developed-by: Nhat Pham <nphamcs@gmail.com>
Signed-off-by: Nhat Pham <nphamcs@gmail.com>
---
 mm/workingset.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/workingset.c b/mm/workingset.c
index ae7e984b23c6..79585d55c45d 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -457,6 +457,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 	 */
 	nr = folio_nr_pages(folio);
 	memcg = folio_memcg(folio);
+	pgdat = folio_pgdat(folio);
 	lruvec = mem_cgroup_lruvec(memcg, pgdat);
 
 	mod_lruvec_state(lruvec, WORKINGSET_REFAULT_BASE + file, nr);
@@ -474,7 +475,7 @@ void workingset_refault(struct folio *folio, void *shadow)
 		workingset_size += lruvec_page_state(eviction_lruvec,
 						     NR_INACTIVE_FILE);
 	}
-	if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
+	if (mem_cgroup_get_nr_swap_pages(eviction_memcg) > 0) {
 		workingset_size += lruvec_page_state(eviction_lruvec,
 						     NR_ACTIVE_ANON);
 		if (file) {
-- 
2.30.2

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2023-01-04 23:24 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-01-04 22:29 [PATCH] workingset: fix confusion around eviction vs refault container Nhat Pham
2023-01-04 23:22 ` Nhat Pham
  -- strict thread matches above, loose matches on Subject: below --
2023-01-04 22:29 Nhat Pham

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.