From: Andrew Morton <akpm@linux-foundation.org> To: Roman Gushchin <guro@fb.com> Cc: <linux-mm@kvack.org>, <linux-kernel@vger.kernel.org>, <kernel-team@fb.com>, Shakeel Butt <shakeelb@google.com>, Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>, Andy Lutomirski <luto@kernel.org>, Konstantin Khlebnikov <koct9i@gmail.com>, Tejun Heo <tj@kernel.org> Subject: Re: [PATCH v3 1/3] mm: rework memcg kernel stack accounting Date: Mon, 27 Aug 2018 14:01:43 -0700 [thread overview] Message-ID: <20180827140143.98b65bc7cb32f50245eb9114@linux-foundation.org> (raw) In-Reply-To: <20180827162621.30187-1-guro@fb.com> On Mon, 27 Aug 2018 09:26:19 -0700 Roman Gushchin <guro@fb.com> wrote: > If CONFIG_VMAP_STACK is set, kernel stacks are allocated > using __vmalloc_node_range() with __GFP_ACCOUNT. So kernel > stack pages are charged against corresponding memory cgroups > on allocation and uncharged on releasing them. > > The problem is that we do cache kernel stacks in small > per-cpu caches and do reuse them for new tasks, which can > belong to different memory cgroups. > > Each stack page still holds a reference to the original cgroup, > so the cgroup can't be released until the vmap area is released. > > To make this happen we need more than two subsequent exits > without forks in between on the current cpu, which makes it > very unlikely to happen. As a result, I saw a significant number > of dying cgroups (in theory, up to 2 * number_of_cpu + > number_of_tasks), which can't be released even by significant > memory pressure. > > As a cgroup structure can take a significant amount of memory > (first of all, per-cpu data like memcg statistics), it leads > to a noticeable waste of memory. OK, but this doesn't describe how the patch addresses this issue? > > ... > > @@ -371,6 +382,35 @@ static void account_kernel_stack(struct task_struct *tsk, int account) > } > } > > +static int memcg_charge_kernel_stack(struct task_struct *tsk) > +{ > +#ifdef CONFIG_VMAP_STACK > + struct vm_struct *vm = task_stack_vm_area(tsk); > + int ret; > + > + if (vm) { > + int i; > + > + for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) { Can we ever have THREAD_SIZE < PAGE_SIZE? 64k pages? > + /* > + * If memcg_kmem_charge() fails, page->mem_cgroup > + * pointer is NULL, and both memcg_kmem_uncharge() > + * and mod_memcg_page_state() in free_thread_stack() > + * will ignore this page. So it's safe. > + */ > + ret = memcg_kmem_charge(vm->pages[i], GFP_KERNEL, 0); > + if (ret) > + return ret; > + > + mod_memcg_page_state(vm->pages[i], > + MEMCG_KERNEL_STACK_KB, > + PAGE_SIZE / 1024); > + } > + } > +#endif > + return 0; > +} > > ... >
WARNING: multiple messages have this Message-ID (diff)
From: Andrew Morton <akpm@linux-foundation.org> To: Roman Gushchin <guro@fb.com> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@fb.com, Shakeel Butt <shakeelb@google.com>, Michal Hocko <mhocko@kernel.org>, Johannes Weiner <hannes@cmpxchg.org>, Andy Lutomirski <luto@kernel.org>, Konstantin Khlebnikov <koct9i@gmail.com>, Tejun Heo <tj@kernel.org> Subject: Re: [PATCH v3 1/3] mm: rework memcg kernel stack accounting Date: Mon, 27 Aug 2018 14:01:43 -0700 [thread overview] Message-ID: <20180827140143.98b65bc7cb32f50245eb9114@linux-foundation.org> (raw) In-Reply-To: <20180827162621.30187-1-guro@fb.com> On Mon, 27 Aug 2018 09:26:19 -0700 Roman Gushchin <guro@fb.com> wrote: > If CONFIG_VMAP_STACK is set, kernel stacks are allocated > using __vmalloc_node_range() with __GFP_ACCOUNT. So kernel > stack pages are charged against corresponding memory cgroups > on allocation and uncharged on releasing them. > > The problem is that we do cache kernel stacks in small > per-cpu caches and do reuse them for new tasks, which can > belong to different memory cgroups. > > Each stack page still holds a reference to the original cgroup, > so the cgroup can't be released until the vmap area is released. > > To make this happen we need more than two subsequent exits > without forks in between on the current cpu, which makes it > very unlikely to happen. As a result, I saw a significant number > of dying cgroups (in theory, up to 2 * number_of_cpu + > number_of_tasks), which can't be released even by significant > memory pressure. > > As a cgroup structure can take a significant amount of memory > (first of all, per-cpu data like memcg statistics), it leads > to a noticeable waste of memory. OK, but this doesn't describe how the patch addresses this issue? > > ... > > @@ -371,6 +382,35 @@ static void account_kernel_stack(struct task_struct *tsk, int account) > } > } > > +static int memcg_charge_kernel_stack(struct task_struct *tsk) > +{ > +#ifdef CONFIG_VMAP_STACK > + struct vm_struct *vm = task_stack_vm_area(tsk); > + int ret; > + > + if (vm) { > + int i; > + > + for (i = 0; i < THREAD_SIZE / PAGE_SIZE; i++) { Can we ever have THREAD_SIZE < PAGE_SIZE? 64k pages? > + /* > + * If memcg_kmem_charge() fails, page->mem_cgroup > + * pointer is NULL, and both memcg_kmem_uncharge() > + * and mod_memcg_page_state() in free_thread_stack() > + * will ignore this page. So it's safe. > + */ > + ret = memcg_kmem_charge(vm->pages[i], GFP_KERNEL, 0); > + if (ret) > + return ret; > + > + mod_memcg_page_state(vm->pages[i], > + MEMCG_KERNEL_STACK_KB, > + PAGE_SIZE / 1024); > + } > + } > +#endif > + return 0; > +} > > ... >
next prev parent reply other threads:[~2018-08-27 21:01 UTC|newest] Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top 2018-08-27 16:26 [PATCH v3 1/3] mm: rework memcg kernel stack accounting Roman Gushchin 2018-08-27 16:26 ` Roman Gushchin 2018-08-27 16:26 ` [PATCH v3 2/3] mm: drain memcg stocks on css offlining Roman Gushchin 2018-08-27 16:26 ` Roman Gushchin 2018-08-27 16:26 ` [PATCH v3 3/3] mm: don't miss the last page because of round-off error Roman Gushchin 2018-08-27 16:26 ` Roman Gushchin 2018-08-27 21:04 ` Andrew Morton 2018-08-27 21:04 ` Andrew Morton 2018-08-27 23:24 ` Roman Gushchin 2018-08-27 23:24 ` Roman Gushchin 2018-08-29 21:33 ` Roman Gushchin 2018-08-29 21:33 ` Roman Gushchin 2018-09-05 21:08 ` Andrew Morton 2018-09-05 21:08 ` Andrew Morton 2018-08-27 21:01 ` Andrew Morton [this message] 2018-08-27 21:01 ` [PATCH v3 1/3] mm: rework memcg kernel stack accounting Andrew Morton 2018-08-27 23:19 ` Roman Gushchin 2018-08-27 23:19 ` Roman Gushchin
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20180827140143.98b65bc7cb32f50245eb9114@linux-foundation.org \ --to=akpm@linux-foundation.org \ --cc=guro@fb.com \ --cc=hannes@cmpxchg.org \ --cc=kernel-team@fb.com \ --cc=koct9i@gmail.com \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=luto@kernel.org \ --cc=mhocko@kernel.org \ --cc=shakeelb@google.com \ --cc=tj@kernel.org \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.