linux-kernel.vger.kernel.org archive mirror
* [PATCH v2] mm: memcontrol: fix kernel stack account
@ 2021-03-03  9:39 Muchun Song
  2021-03-03 10:25 ` Michal Hocko
  2021-03-03 15:14 ` Johannes Weiner
  0 siblings, 2 replies; 6+ messages in thread
From: Muchun Song @ 2021-03-03  9:39 UTC
  To: guro, hannes, mhocko, akpm, shakeelb; +Cc: linux-kernel, linux-mm, Muchun Song

For simplicity, 991e7673859e ("mm: memcontrol: account kernel stack
per node") changed the per-zone accounting of vmalloc-backed stack
pages to per-node accounting, charging the whole stack to the node of
its first page. By doing that we lost some precision, because those
pages might live on different NUMA nodes. As a result,
NR_KERNEL_STACK_KB exported to userspace might be overestimated on
some nodes and underestimated on others.

This doesn't pose any real problem for the correctness of kernel
behavior, as the counter is not used for any internal processing, but
it can cause some confusion in userspace.

Address the problem by accounting each vmalloc backing page to its own
node.

Fixes: 991e7673859e ("mm: memcontrol: account kernel stack per node")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
---
Changelog in v2:
 - Rework commit log suggested by Michal.

 Thanks to Michal and Shakeel for review.

 kernel/fork.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index d66cd1014211..6e2201feb524 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -379,14 +379,19 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
 	void *stack = task_stack_page(tsk);
 	struct vm_struct *vm = task_stack_vm_area(tsk);
 
+	if (vm) {
+		int i;
 
-	/* All stack pages are in the same node. */
-	if (vm)
-		mod_lruvec_page_state(vm->pages[0], NR_KERNEL_STACK_KB,
-				      account * (THREAD_SIZE / 1024));
-	else
+		BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE);
+
+		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
+			mod_lruvec_page_state(vm->pages[i], NR_KERNEL_STACK_KB,
+					      account * (PAGE_SIZE / 1024));
+	} else {
+		/* All stack pages are in the same node. */
 		mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB,
 				      account * (THREAD_SIZE / 1024));
+	}
 }
 
 static int memcg_charge_kernel_stack(struct task_struct *tsk)
-- 
2.11.0


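The effect of the patch can be modelled in user space. The sketch below is not kernel code: the per-node counter array, the page-to-node table and the 4 KiB PAGE_SIZE / 16 KiB THREAD_SIZE values are assumptions chosen for illustration. account_stack_old() mirrors the removed line that charges the whole stack to the node of vm->pages[0], and account_stack_new() mirrors the new loop that charges PAGE_SIZE / 1024 KB per page to that page's own node.

```c
#include <assert.h>

/* Assumed sizes for illustration: a 16 KiB stack made of 4 KiB pages. */
#define PAGE_SIZE   4096
#define THREAD_SIZE (4 * PAGE_SIZE)
#define NR_PAGES    (THREAD_SIZE / PAGE_SIZE)
#define MAX_NODES   2

/* Hypothetical stand-in for the per-node NR_KERNEL_STACK_KB counters. */
static long stack_kb[MAX_NODES];

void reset_counters(void)
{
	for (int n = 0; n < MAX_NODES; n++)
		stack_kb[n] = 0;
}

/* Before the patch: the whole stack is charged to the node of page 0. */
void account_stack_old(const int page_node[NR_PAGES], int account)
{
	assert(page_node[0] >= 0 && page_node[0] < MAX_NODES);
	stack_kb[page_node[0]] += account * (THREAD_SIZE / 1024);
}

/* After the patch: every backing page is charged to its own node. */
void account_stack_new(const int page_node[NR_PAGES], int account)
{
	for (int i = 0; i < NR_PAGES; i++) {
		assert(page_node[i] >= 0 && page_node[i] < MAX_NODES);
		stack_kb[page_node[i]] += account * (PAGE_SIZE / 1024);
	}
}
```

For a stack whose four pages sit on nodes 0, 1, 0, 1, the old scheme reports 16 KB on node 0 and nothing on node 1, while the new scheme reports 8 KB on each node; the matching call with account = -1 returns both counters to zero. The total is identical either way, which matches the changelog's point that only the per-node split was wrong.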

* Re: [PATCH v2] mm: memcontrol: fix kernel stack account
  2021-03-03  9:39 [PATCH v2] mm: memcontrol: fix kernel stack account Muchun Song
@ 2021-03-03 10:25 ` Michal Hocko
  2021-03-03 13:27   ` [External] " Muchun Song
  2021-03-03 15:14 ` Johannes Weiner
  1 sibling, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2021-03-03 10:25 UTC
  To: Muchun Song; +Cc: guro, hannes, akpm, shakeelb, linux-kernel, linux-mm

On Wed 03-03-21 17:39:56, Muchun Song wrote:
> For simplicity, 991e7673859e ("mm: memcontrol: account kernel stack
> per node") changed the per-zone accounting of vmalloc-backed stack
> pages to per-node accounting, charging the whole stack to the node of
> its first page. By doing that we lost some precision, because those
> pages might live on different NUMA nodes. As a result,
> NR_KERNEL_STACK_KB exported to userspace might be overestimated on
> some nodes and underestimated on others.
> 
> This doesn't pose any real problem for the correctness of kernel
> behavior, as the counter is not used for any internal processing, but
> it can cause some confusion in userspace.

You have skipped over one part of the changelog I proposed, namely
providing actual data.

> Address the problem by accounting each vmalloc backing page to its own
> node.
> 
> Fixes: 991e7673859e ("mm: memcontrol: account kernel stack per node")

The Fixes tag might make somebody assume this is worth backporting,
but I highly doubt it is.

> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

Anyway
Acked-by: Michal Hocko <mhocko@suse.com>

as the patch is correct, with one comment below.

> ---
> Changelog in v2:
>  - Rework commit log suggested by Michal.
> 
>  Thanks to Michal and Shakeel for review.
> 
>  kernel/fork.c | 15 ++++++++++-----
>  1 file changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/fork.c b/kernel/fork.c
> index d66cd1014211..6e2201feb524 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -379,14 +379,19 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
>  	void *stack = task_stack_page(tsk);
>  	struct vm_struct *vm = task_stack_vm_area(tsk);
>  
> +	if (vm) {
> +		int i;
>  
> -	/* All stack pages are in the same node. */
> -	if (vm)
> -		mod_lruvec_page_state(vm->pages[0], NR_KERNEL_STACK_KB,
> -				      account * (THREAD_SIZE / 1024));
> -	else
> +		BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE);

I do not think we need this BUG_ON. What kind of purpose does it serve?

> +
> +		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
> +			mod_lruvec_page_state(vm->pages[i], NR_KERNEL_STACK_KB,
> +					      account * (PAGE_SIZE / 1024));
> +	} else {
> +		/* All stack pages are in the same node. */
>  		mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB,
>  				      account * (THREAD_SIZE / 1024));
> +	}
>  }
>  
>  static int memcg_charge_kernel_stack(struct task_struct *tsk)
> -- 
> 2.11.0

-- 
Michal Hocko
SUSE Labs


* Re: [External] Re: [PATCH v2] mm: memcontrol: fix kernel stack account
  2021-03-03 10:25 ` Michal Hocko
@ 2021-03-03 13:27   ` Muchun Song
  2021-03-03 14:02     ` Michal Hocko
  0 siblings, 1 reply; 6+ messages in thread
From: Muchun Song @ 2021-03-03 13:27 UTC
  To: Michal Hocko
  Cc: Roman Gushchin, Johannes Weiner, Andrew Morton, Shakeel Butt,
	LKML, Linux Memory Management List

On Wed, Mar 3, 2021 at 6:25 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Wed 03-03-21 17:39:56, Muchun Song wrote:
> > For simplicity, 991e7673859e ("mm: memcontrol: account kernel stack
> > per node") changed the per-zone accounting of vmalloc-backed stack
> > pages to per-node accounting, charging the whole stack to the node of
> > its first page. By doing that we lost some precision, because those
> > pages might live on different NUMA nodes. As a result,
> > NR_KERNEL_STACK_KB exported to userspace might be overestimated on
> > some nodes and underestimated on others.
> >
> > This doesn't pose any real problem for the correctness of kernel
> > behavior, as the counter is not used for any internal processing, but
> > it can cause some confusion in userspace.
>
> You have skipped over one part of the changelog I proposed, namely
> providing actual data.

Because I found this problem by reading the code; it is not a
real-world problem. I do not have any actual data. :-(

>
> > Address the problem by accounting each vmalloc backing page to its own
> > node.
> >
> > Fixes: 991e7673859e ("mm: memcontrol: account kernel stack per node")
>
> The Fixes tag might make somebody assume this is worth backporting,
> but I highly doubt it is.

OK. I can remove the Fixes tag.

>
> > Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> > Reviewed-by: Shakeel Butt <shakeelb@google.com>
>
> Anyway
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks for your review.

>
> as the patch is correct, with one comment below.
>
> > ---
> > Changelog in v2:
> >  - Rework commit log suggested by Michal.
> >
> >  Thanks to Michal and Shakeel for review.
> >
> >  kernel/fork.c | 15 ++++++++++-----
> >  1 file changed, 10 insertions(+), 5 deletions(-)
> >
> > diff --git a/kernel/fork.c b/kernel/fork.c
> > index d66cd1014211..6e2201feb524 100644
> > --- a/kernel/fork.c
> > +++ b/kernel/fork.c
> > @@ -379,14 +379,19 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
> >       void *stack = task_stack_page(tsk);
> >       struct vm_struct *vm = task_stack_vm_area(tsk);
> >
> > +     if (vm) {
> > +             int i;
> >
> > -     /* All stack pages are in the same node. */
> > -     if (vm)
> > -             mod_lruvec_page_state(vm->pages[0], NR_KERNEL_STACK_KB,
> > -                                   account * (THREAD_SIZE / 1024));
> > -     else
> > +             BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE);
>
> I do not think we need this BUG_ON. What kind of purpose does it serve?

vm->nr_pages should always be equal to THREAD_SIZE / PAGE_SIZE
unless the system is corrupted. It makes sense to remove the BUG_ON.
I will remove it in the next version. Thanks.

>
> > +
> > +             for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++)
> > +                     mod_lruvec_page_state(vm->pages[i], NR_KERNEL_STACK_KB,
> > +                                           account * (PAGE_SIZE / 1024));
> > +     } else {
> > +             /* All stack pages are in the same node. */
> >               mod_lruvec_kmem_state(stack, NR_KERNEL_STACK_KB,
> >                                     account * (THREAD_SIZE / 1024));
> > +     }
> >  }
> >
> >  static int memcg_charge_kernel_stack(struct task_struct *tsk)
> > --
> > 2.11.0
>
> --
> Michal Hocko
> SUSE Labs


* Re: [External] Re: [PATCH v2] mm: memcontrol: fix kernel stack account
  2021-03-03 13:27   ` [External] " Muchun Song
@ 2021-03-03 14:02     ` Michal Hocko
  2021-03-03 14:42       ` Shakeel Butt
  0 siblings, 1 reply; 6+ messages in thread
From: Michal Hocko @ 2021-03-03 14:02 UTC
  To: Muchun Song
  Cc: Roman Gushchin, Johannes Weiner, Andrew Morton, Shakeel Butt,
	LKML, Linux Memory Management List

On Wed 03-03-21 21:27:24, Muchun Song wrote:
> On Wed, Mar 3, 2021 at 6:25 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Wed 03-03-21 17:39:56, Muchun Song wrote:
> > > For simplicity, 991e7673859e ("mm: memcontrol: account kernel stack
> > > per node") changed the per-zone accounting of vmalloc-backed stack
> > > pages to per-node accounting, charging the whole stack to the node of
> > > its first page. By doing that we lost some precision, because those
> > > pages might live on different NUMA nodes. As a result,
> > > NR_KERNEL_STACK_KB exported to userspace might be overestimated on
> > > some nodes and underestimated on others.
> > >
> > > This doesn't pose any real problem for the correctness of kernel
> > > behavior, as the counter is not used for any internal processing, but
> > > it can cause some confusion in userspace.
> >
> > You have skipped over one part of the changelog I proposed, namely
> > providing actual data.
> 
> Because I found this problem by reading the code; it is not a
> real-world problem. I do not have any actual data. :-(

As I've mentioned several times already, this is all fine, but it
should be made explicit in the changelog. Otherwise people might spend
their time evaluating this code to find out whether it is something
that somebody depends on.

[...]
> > > -     /* All stack pages are in the same node. */
> > > -     if (vm)
> > > -             mod_lruvec_page_state(vm->pages[0], NR_KERNEL_STACK_KB,
> > > -                                   account * (THREAD_SIZE / 1024));
> > > -     else
> > > +             BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE);
> >
> > I do not think we need this BUG_ON. What kind of purpose does it serve?
> 
> vm->nr_pages should always be equal to THREAD_SIZE / PAGE_SIZE
> unless the system is corrupted.

BUG_ON is not an annotation for "this shouldn't happen". Even if the
system were corrupted and nr_pages didn't match, that would not be a
reason to crash the kernel right away.

In general there should be a very _strong_ reason to add a BUG_ON.

-- 
Michal Hocko
SUSE Labs


* Re: [External] Re: [PATCH v2] mm: memcontrol: fix kernel stack account
  2021-03-03 14:02     ` Michal Hocko
@ 2021-03-03 14:42       ` Shakeel Butt
  0 siblings, 0 replies; 6+ messages in thread
From: Shakeel Butt @ 2021-03-03 14:42 UTC
  To: Michal Hocko
  Cc: Muchun Song, Roman Gushchin, Johannes Weiner, Andrew Morton,
	LKML, Linux Memory Management List

On Wed, Mar 3, 2021 at 6:02 AM Michal Hocko <mhocko@suse.com> wrote:
>
[...]
> > > > +             BUG_ON(vm->nr_pages != THREAD_SIZE / PAGE_SIZE);
> > >
> > > I do not think we need this BUG_ON. What kind of purpose does it serve?
> >
> > vm->nr_pages should always be equal to THREAD_SIZE / PAGE_SIZE
> > unless the system is corrupted.
>
> BUG_ON is not an annotation for "this shouldn't happen". Even if the
> system were corrupted and nr_pages didn't match, that would not be a
> reason to crash the kernel right away.
>
> In general there should be a very _strong_ reason to add a BUG_ON.
>

I agree with Michal. We should remove this BUG_ON or at least convert
it into VM_BUG_ON.

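Shakeel's fallback of VM_BUG_ON() differs from BUG_ON() in that the check is only built when CONFIG_DEBUG_VM is enabled; otherwise the condition is merely type-checked via sizeof and never evaluated. The user-space sketch below models that difference. DEBUG_VM stands in for CONFIG_DEBUG_VM, abort() stands in for the kernel's BUG(), and stack_pages_mismatch(), check_stack_vm_area() and nr_checks are hypothetical names, so this is an analogy rather than the kernel's actual definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed sizes for illustration: a 16 KiB stack made of 4 KiB pages. */
#define PAGE_SIZE   4096
#define THREAD_SIZE (4 * PAGE_SIZE)

/* Analogy for BUG_ON(): always checked, stops execution on failure. */
#define BUG_ON(cond)                                         \
	do {                                                 \
		if (cond) {                                  \
			fprintf(stderr, "BUG: %s\n", #cond); \
			abort();                             \
		}                                            \
	} while (0)

/*
 * Analogy for VM_BUG_ON(): only a real check when DEBUG_VM is defined.
 * Otherwise the condition sits inside sizeof, so it is type-checked but
 * never evaluated at run time.
 */
#ifdef DEBUG_VM
#define VM_BUG_ON(cond) BUG_ON(cond)
#else
#define VM_BUG_ON(cond) ((void)sizeof((long)(cond)))
#endif

/* Counts how often the consistency check was actually evaluated. */
int nr_checks;

int stack_pages_mismatch(unsigned int nr_pages)
{
	nr_checks++;
	return nr_pages != THREAD_SIZE / PAGE_SIZE;
}

/* Models the v2 hunk with BUG_ON() downgraded to VM_BUG_ON(). */
void check_stack_vm_area(unsigned int nr_pages)
{
	VM_BUG_ON(stack_pages_mismatch(nr_pages));
}
```

Compiled without DEBUG_VM, even a mismatching nr_pages goes unnoticed and stack_pages_mismatch() is never called, so nr_checks stays at zero; built with -DDEBUG_VM, the same mismatch would abort. In the thread, Michal and Johannes preferred dropping the check altogether.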

* Re: [PATCH v2] mm: memcontrol: fix kernel stack account
  2021-03-03  9:39 [PATCH v2] mm: memcontrol: fix kernel stack account Muchun Song
  2021-03-03 10:25 ` Michal Hocko
@ 2021-03-03 15:14 ` Johannes Weiner
  1 sibling, 0 replies; 6+ messages in thread
From: Johannes Weiner @ 2021-03-03 15:14 UTC
  To: Muchun Song; +Cc: guro, mhocko, akpm, shakeelb, linux-kernel, linux-mm

On Wed, Mar 03, 2021 at 05:39:56PM +0800, Muchun Song wrote:
> For simplicity, 991e7673859e ("mm: memcontrol: account kernel stack
> per node") changed the per-zone accounting of vmalloc-backed stack
> pages to per-node accounting, charging the whole stack to the node of
> its first page. By doing that we lost some precision, because those
> pages might live on different NUMA nodes. As a result,
> NR_KERNEL_STACK_KB exported to userspace might be overestimated on
> some nodes and underestimated on others.
> 
> This doesn't pose any real problem for the correctness of kernel
> behavior, as the counter is not used for any internal processing, but
> it can cause some confusion in userspace.
> 
> Address the problem by accounting each vmalloc backing page to its own
> node.
> 
> Fixes: 991e7673859e ("mm: memcontrol: account kernel stack per node")
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Reviewed-by: Shakeel Butt <shakeelb@google.com>

With the changes proposed by Shakeel and Michal, this looks good to me.

Acked-by: Johannes Weiner <hannes@cmpxchg.org>

I guess the BUG_ON() was inspired by the one in
memcg_charge_kernel_stack(), so it's not really your fault for following
that example. But yeah, please drop it from your patch. Thanks!


Thread overview: 6+ messages
2021-03-03  9:39 [PATCH v2] mm: memcontrol: fix kernel stack account Muchun Song
2021-03-03 10:25 ` Michal Hocko
2021-03-03 13:27   ` [External] " Muchun Song
2021-03-03 14:02     ` Michal Hocko
2021-03-03 14:42       ` Shakeel Butt
2021-03-03 15:14 ` Johannes Weiner
