From: Andrew Morton <akpm@linux-foundation.org>
To: Roman Gushchin <guro@fb.com>
Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>,
	<kernel-team@fb.com>, Shakeel Butt <shakeelb@google.com>,
	Michal Hocko <mhocko@kernel.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andy Lutomirski <luto@kernel.org>,
	Konstantin Khlebnikov <koct9i@gmail.com>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH v3 1/3] mm: rework memcg kernel stack accounting
Date: Mon, 27 Aug 2018 14:01:43 -0700	[thread overview]
Message-ID: <20180827140143.98b65bc7cb32f50245eb9114@linux-foundation.org> (raw)
In-Reply-To: <20180827162621.30187-1-guro@fb.com>

On Mon, 27 Aug 2018 09:26:19 -0700 Roman Gushchin <guro@fb.com> wrote:

> If CONFIG_VMAP_STACK is set, kernel stacks are allocated
> using __vmalloc_node_range() with __GFP_ACCOUNT, so kernel
> stack pages are charged to the corresponding memory cgroup
> on allocation and uncharged on release.
> 
> The problem is that we cache kernel stacks in small per-cpu
> caches and reuse them for new tasks, which can belong to
> different memory cgroups.
> 
> Each stack page still holds a reference to the original cgroup,
> so the cgroup can't be released until the vmap area is released.
> 
> For the vmap area to actually be released, we need more than
> two subsequent exits without forks in between on the same cpu,
> which makes it very unlikely in practice. As a result, I saw a
> significant number of dying cgroups (in theory, up to
> 2 * number_of_cpus + number_of_tasks) that couldn't be released
> even under significant memory pressure.
> 
> As a cgroup structure can take a significant amount of memory
> (most notably, per-cpu data such as memcg statistics), this
> leads to a noticeable waste of memory.
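
(For context on the 2 * number_of_cpus bound: the per-cpu stack cache
in kernel/fork.c holds up to two stacks per cpu, along these lines; a
sketch, not text from this patch:

	#define NR_CACHED_STACKS 2
	static DEFINE_PER_CPU(struct vm_struct *,
			      cached_stacks[NR_CACHED_STACKS]);

so each cpu can pin the memcgs of up to NR_CACHED_STACKS dead tasks
through their cached stacks, which stay charged to the old memcg.)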

OK, but this doesn't describe how the patch addresses this issue?

>
> ...
>
> @@ -371,6 +382,35 @@ static void account_kernel_stack(struct task_struct *tsk, int account)
>  	}
>  }
>  
> +static int memcg_charge_kernel_stack(struct task_struct *tsk)
> +{
> +#ifdef CONFIG_VMAP_STACK
> +	struct vm_struct *vm = task_stack_vm_area(tsk);
> +	int ret;
> +
> +	if (vm) {
> +		int i;
> +
> +		for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {

Can we ever have THREAD_SIZE < PAGE_SIZE?  64k pages?
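
(If THREAD_SIZE were ever smaller than PAGE_SIZE here, the loop bound
THREAD_SIZE / PAGE_SIZE would be zero, so the loop body would simply
never run and nothing would be charged. kernel/fork.c compiles the
vmap/page-based stack code only under a guard along these lines; a
sketch, the exact surroundings may differ:

	# if THREAD_SIZE >= PAGE_SIZE || defined(CONFIG_VMAP_STACK)
	/* whole-page stacks: the vmalloc/alloc_pages code, including
	 * the charging loop patched above */
	...
	# else
	/* sub-page stacks come from a dedicated kmem_cache instead */
	static struct kmem_cache *thread_stack_cache;
	# endif
)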

> +			/*
> +			 * If memcg_kmem_charge() fails, page->mem_cgroup
> +			 * pointer is NULL, and both memcg_kmem_uncharge()
> +			 * and mod_memcg_page_state() in free_thread_stack()
> +			 * will ignore this page. So it's safe.
> +			 */
> +			ret = memcg_kmem_charge(vm->pages[i], GFP_KERNEL, 0);
> +			if (ret)
> +				return ret;
> +
> +			mod_memcg_page_state(vm->pages[i],
> +					     MEMCG_KERNEL_STACK_KB,
> +					     PAGE_SIZE / 1024);
> +		}
> +	}
> +#endif
> +	return 0;
> +}
>
> ...
>
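
For the record, the matching uncharge side lands in
free_thread_stack(), per the comment quoted above. A sketch of what
that plausibly looks like (reconstructed from the quoted comment, not
text from the patch):

	static void free_thread_stack(struct task_struct *tsk)
	{
	#ifdef CONFIG_VMAP_STACK
		struct vm_struct *vm = task_stack_vm_area(tsk);

		if (vm) {
			int i;

			for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) {
				/* both calls are no-ops for pages whose
				 * charge failed: page->mem_cgroup is
				 * NULL there */
				mod_memcg_page_state(vm->pages[i],
						     MEMCG_KERNEL_STACK_KB,
						     -(int)(PAGE_SIZE / 1024));
				memcg_kmem_uncharge(vm->pages[i], 0);
			}
			/* ... then park the stack in the per-cpu cache,
			 * or vfree() it ... */
		}
	#endif
		/* non-vmap path elided */
	}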
