linux-mm.kvack.org archive mirror
* [PATCH v2] memcg: css_tryget_online cleanups
@ 2020-03-02 20:31 Shakeel Butt
  2020-03-03  9:32 ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: Shakeel Butt @ 2020-03-02 20:31 UTC (permalink / raw)
  To: Roman Gushchin
  Cc: Johannes Weiner, Michal Hocko, Andrew Morton, linux-mm, cgroups,
	linux-kernel, Shakeel Butt

Currently, css_tryget_online() is used in multiple locations in the
memcg code, even though the callers do not actually care whether the
cgroup is online. Online used to matter when we had reparenting on
offlining and needed a way to prevent new ones from showing up.
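
For reference, the difference between the two helpers boils down to
which percpu_ref operation they use; a simplified sketch (the CSS_NO_REF
special case from include/linux/cgroup.h is omitted here):

	static inline bool css_tryget(struct cgroup_subsys_state *css)
	{
		/* succeeds as long as the css has not been released */
		return percpu_ref_tryget(&css->refcnt);
	}

	static inline bool css_tryget_online(struct cgroup_subsys_state *css)
	{
		/* additionally fails once the css has been taken offline */
		return percpu_ref_tryget_live(&css->refcnt);
	}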

The failure case for a couple of these css_tryget_online() usages is to
fall back to root_mem_cgroup, which makes it possible for some workloads
to bypass the memcg limits. For example, creating an inotify group in a
sub-container and then deleting that container after moving the process
to a different container causes all the event objects allocated for that
group to be charged to root_mem_cgroup. So, using css_tryget_online() is
dangerous for such cases.
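
The problematic pattern is roughly the following (an illustrative sketch
of the pre-patch lookup, not verbatim kernel code):

	rcu_read_lock();
	memcg = mem_cgroup_from_task(current);	/* or page->mem_cgroup, ... */
	if (!memcg || !css_tryget_online(&memcg->css))
		/* the origin memcg is offline: charges escape its limits */
		memcg = root_mem_cgroup;
	rcu_read_unlock();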

Two locations still use the online version: the swapin of an offlined
memcg's pages and memcg kmem cache creation. The kmem cache case indeed
needs the online version because the kernel reparents memcg kmem caches.
The swapin case has been left for later, as its fallback is not really
that concerning.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
---

Changes since v1:
- replaced WARN_ON with WARN_ON_ONCE

 mm/memcontrol.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 62b574d0cd3c..75d8883bf975 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -656,7 +656,7 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 	 */
 	__mem_cgroup_remove_exceeded(mz, mctz);
 	if (!soft_limit_excess(mz->memcg) ||
-	    !css_tryget_online(&mz->memcg->css))
+	    !css_tryget(&mz->memcg->css))
 		goto retry;
 done:
 	return mz;
@@ -961,7 +961,8 @@ struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
 		return NULL;
 
 	rcu_read_lock();
-	if (!memcg || !css_tryget_online(&memcg->css))
+	/* Page should not get uncharged and freed memcg under us. */
+	if (!memcg || WARN_ON_ONCE(!css_tryget(&memcg->css)))
 		memcg = root_mem_cgroup;
 	rcu_read_unlock();
 	return memcg;
@@ -974,10 +975,13 @@ EXPORT_SYMBOL(get_mem_cgroup_from_page);
 static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
 {
 	if (unlikely(current->active_memcg)) {
-		struct mem_cgroup *memcg = root_mem_cgroup;
+		struct mem_cgroup *memcg;
 
 		rcu_read_lock();
-		if (css_tryget_online(&current->active_memcg->css))
+		/* current->active_memcg must hold a ref. */
+		if (WARN_ON_ONCE(!css_tryget(&current->active_memcg->css)))
+			memcg = root_mem_cgroup;
+		else
 			memcg = current->active_memcg;
 		rcu_read_unlock();
 		return memcg;
@@ -6732,7 +6736,7 @@ void mem_cgroup_sk_alloc(struct sock *sk)
 		goto out;
 	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg->tcpmem_active)
 		goto out;
-	if (css_tryget_online(&memcg->css))
+	if (css_tryget(&memcg->css))
 		sk->sk_memcg = memcg;
 out:
 	rcu_read_unlock();
-- 
2.25.0.265.gbab2e86ba0-goog




* Re: [PATCH v2] memcg: css_tryget_online cleanups
  2020-03-02 20:31 [PATCH v2] memcg: css_tryget_online cleanups Shakeel Butt
@ 2020-03-03  9:32 ` Michal Hocko
  2020-03-03 23:49   ` Shakeel Butt
  0 siblings, 1 reply; 4+ messages in thread
From: Michal Hocko @ 2020-03-03  9:32 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Roman Gushchin, Johannes Weiner, Andrew Morton, linux-mm,
	cgroups, linux-kernel

On Mon 02-03-20 12:31:09, Shakeel Butt wrote:
> Currently, css_tryget_online() is used in multiple locations in the
> memcg code, even though the callers do not actually care whether the
> cgroup is online. Online used to matter when we had reparenting on
> offlining and needed a way to prevent new ones from showing up.
> 
> The failure case for a couple of these css_tryget_online() usages is to
> fall back to root_mem_cgroup, which makes it possible for some
> workloads to bypass the memcg limits. For example, creating an inotify
> group in a sub-container and then deleting that container after moving
> the process to a different container causes all the event objects
> allocated for that group to be charged to root_mem_cgroup. So, using
> css_tryget_online() is dangerous for such cases.
> 
> Two locations still use the online version: the swapin of an offlined
> memcg's pages and memcg kmem cache creation. The kmem cache case indeed
> needs the online version because the kernel reparents memcg kmem
> caches. The swapin case has been left for later, as its fallback is not
> really that concerning.

Could you be more specific about the swap-in case, please?

> Signed-off-by: Shakeel Butt <shakeelb@google.com>

Other than that, nothing really jumped out at me, although I have to
confess that I am far from deeply familiar with the sk_buff charging
path.

Acked-by: Michal Hocko <mhocko@suse.com>
> ---
> 
> Changes since v1:
> - replaced WARN_ON with WARN_ON_ONCE
> 
>  mm/memcontrol.c | 14 +++++++++-----
>  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 62b574d0cd3c..75d8883bf975 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -656,7 +656,7 @@ __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
>  	 */
>  	__mem_cgroup_remove_exceeded(mz, mctz);
>  	if (!soft_limit_excess(mz->memcg) ||
> -	    !css_tryget_online(&mz->memcg->css))
> +	    !css_tryget(&mz->memcg->css))
>  		goto retry;
>  done:
>  	return mz;
> @@ -961,7 +961,8 @@ struct mem_cgroup *get_mem_cgroup_from_page(struct page *page)
>  		return NULL;
>  
>  	rcu_read_lock();
> -	if (!memcg || !css_tryget_online(&memcg->css))
> +	/* Page should not get uncharged and freed memcg under us. */
> +	if (!memcg || WARN_ON_ONCE(!css_tryget(&memcg->css)))
>  		memcg = root_mem_cgroup;
>  	rcu_read_unlock();
>  	return memcg;
> @@ -974,10 +975,13 @@ EXPORT_SYMBOL(get_mem_cgroup_from_page);
>  static __always_inline struct mem_cgroup *get_mem_cgroup_from_current(void)
>  {
>  	if (unlikely(current->active_memcg)) {
> -		struct mem_cgroup *memcg = root_mem_cgroup;
> +		struct mem_cgroup *memcg;
>  
>  		rcu_read_lock();
> -		if (css_tryget_online(&current->active_memcg->css))
> +		/* current->active_memcg must hold a ref. */
> +		if (WARN_ON_ONCE(!css_tryget(&current->active_memcg->css)))
> +			memcg = root_mem_cgroup;
> +		else
>  			memcg = current->active_memcg;
>  		rcu_read_unlock();
>  		return memcg;
> @@ -6732,7 +6736,7 @@ void mem_cgroup_sk_alloc(struct sock *sk)
>  		goto out;
>  	if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && !memcg->tcpmem_active)
>  		goto out;
> -	if (css_tryget_online(&memcg->css))
> +	if (css_tryget(&memcg->css))
>  		sk->sk_memcg = memcg;
>  out:
>  	rcu_read_unlock();
> -- 
> 2.25.0.265.gbab2e86ba0-goog

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2] memcg: css_tryget_online cleanups
  2020-03-03  9:32 ` Michal Hocko
@ 2020-03-03 23:49   ` Shakeel Butt
  2020-03-04 12:32     ` Michal Hocko
  0 siblings, 1 reply; 4+ messages in thread
From: Shakeel Butt @ 2020-03-03 23:49 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Roman Gushchin, Johannes Weiner, Andrew Morton, Linux MM, Cgroups, LKML

On Tue, Mar 3, 2020 at 1:32 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Mon 02-03-20 12:31:09, Shakeel Butt wrote:
> > Currently, css_tryget_online() is used in multiple locations in the
> > memcg code, even though the callers do not actually care whether the
> > cgroup is online. Online used to matter when we had reparenting on
> > offlining and needed a way to prevent new ones from showing up.
> >
> > The failure case for a couple of these css_tryget_online() usages is
> > to fall back to root_mem_cgroup, which makes it possible for some
> > workloads to bypass the memcg limits. For example, creating an
> > inotify group in a sub-container and then deleting that container
> > after moving the process to a different container causes all the
> > event objects allocated for that group to be charged to
> > root_mem_cgroup. So, using css_tryget_online() is dangerous for such
> > cases.
> >
> > Two locations still use the online version: the swapin of an offlined
> > memcg's pages and memcg kmem cache creation. The kmem cache case
> > indeed needs the online version because the kernel reparents memcg
> > kmem caches. The swapin case has been left for later, as its fallback
> > is not really that concerning.
>
> Could you be more specific about the swap-in case, please?
>

With swap accounting enabled, if the memcg of the swapped-out page is
not online, then the memcg extracted from the given 'mm' will be charged
instead, and if 'mm' is NULL then the root memcg will be charged.
However, I could not find a code path where the given 'mm' can be NULL
in the swap-in case.
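
Roughly, that part of the charge path looks like the following (a
simplified sketch of the mem_cgroup_try_charge() swap-in handling, not
verbatim):

	id = lookup_swap_cgroup_id(ent);
	rcu_read_lock();
	memcg = mem_cgroup_from_id(id);
	if (memcg && !css_tryget_online(&memcg->css))
		memcg = NULL;		/* recorded memcg is offline */
	rcu_read_unlock();
	if (!memcg)
		memcg = get_mem_cgroup_from_mm(mm); /* root_mem_cgroup if !mm */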

>
> > Signed-off-by: Shakeel Butt <shakeelb@google.com>
>
> Other than that, nothing really jumped out at me, although I have to
> confess that I am far from deeply familiar with the sk_buff charging
> path.
>
> Acked-by: Michal Hocko <mhocko@suse.com>

Thanks.



* Re: [PATCH v2] memcg: css_tryget_online cleanups
  2020-03-03 23:49   ` Shakeel Butt
@ 2020-03-04 12:32     ` Michal Hocko
  0 siblings, 0 replies; 4+ messages in thread
From: Michal Hocko @ 2020-03-04 12:32 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Roman Gushchin, Johannes Weiner, Andrew Morton, Linux MM, Cgroups, LKML

On Tue 03-03-20 15:49:41, Shakeel Butt wrote:
> On Tue, Mar 3, 2020 at 1:32 AM Michal Hocko <mhocko@kernel.org> wrote:
[...]
> > Could you be more specific about the swap-in case, please?
> >
> 
> With swap accounting enabled, if the memcg of the swapped-out page is
> not online, then the memcg extracted from the given 'mm' will be
> charged instead, and if 'mm' is NULL then the root memcg will be
> charged. However, I could not find a code path where the given 'mm' can
> be NULL in the swap-in case.

Yes, this is my understanding as well. All the paths get mm from a vma,
AFAICS. This is valuable information for the changelog.
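
For instance, the page-fault swap-in path charges with the faulting
vma's mm, something like the following (an illustrative sketch, not the
literal do_swap_page() code):

	/* mm comes from the faulting vma and cannot be NULL here */
	mem_cgroup_try_charge(page, vmf->vma->vm_mm, GFP_KERNEL, &memcg, false);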

-- 
Michal Hocko
SUSE Labs


