All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Weiner <hannes@cmpxchg.org>
To: Muchun Song <songmuchun@bytedance.com>
Cc: guro@fb.com, mhocko@kernel.org, akpm@linux-foundation.org,
	shakeelb@google.com, vdavydov.dev@gmail.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	duanxiongchun@bytedance.com, fam.zheng@bytedance.com,
	bsingharora@gmail.com, shy828301@gmail.com,
	alex.shi@linux.alibaba.com
Subject: Re: [RFC PATCH v2 06/18] mm: memcontrol: move the objcg infrastructure out of CONFIG_MEMCG_KMEM
Date: Fri, 9 Apr 2021 12:56:32 -0400	[thread overview]
Message-ID: <YHCHQHrCj6rH1sD3@cmpxchg.org> (raw)
In-Reply-To: <20210409122959.82264-7-songmuchun@bytedance.com>

On Fri, Apr 09, 2021 at 08:29:47PM +0800, Muchun Song wrote:
> Because memory allocations pinning memcgs for a long time - it exists
> at a larger scale and is causing recurring problems in the real world:
> page cache doesn't get reclaimed for a long time, or is used by the
> second, third, fourth, ... instance of the same job that was restarted
> into a new cgroup every time. Unreclaimable dying cgroups pile up,
> waste memory, and make page reclaim very inefficient.
> 
> We can convert LRU pages and most other raw memcg pins to the objcg
> direction to fix this problem, and then the page->memcg will always
> point to an object cgroup pointer.
> 
> Therefore, the infrastructure of objcg no longer only serves
> CONFIG_MEMCG_KMEM. In this patch, we move the infrastructure of the
> objcg out of the scope of the CONFIG_MEMCG_KMEM so that the LRU pages
> can reuse it to charge pages.

Just an observation on this:

We actually may want to remove CONFIG_MEMCG_KMEM altogether at this
point. It used to be an optional feature, but nowadays it's not
configurable anymore, and always on unless slob is configured.

We've also added more than just slab accounting to it, like kernel
stack pages, and it all gets disabled on slob configs just because it
doesn't support slab object tracking.

We could probably replace CONFIG_MEMCG_KMEM with CONFIG_MEMCG in most
places, and add a couple of !CONFIG_SLOB checks in the slab callbacks.

But that's beyond the scope of your patch series, so I'm also okay
with this patch here.

> We know that the LRU pages are not accounted at the root level. But the
> page->memcg_data points to the root_mem_cgroup. So the page->memcg_data
> of the LRU pages always points to a valid pointer. But the root_mem_cgroup
> dose not have an object cgroup. If we use obj_cgroup APIs to charge the
> LRU pages, we should set the page->memcg_data to a root object cgroup. So
> we also allocate an object cgroup for the root_mem_cgroup and introduce
> root_obj_cgroup to cache its value just like root_mem_cgroup.
> 
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>

Overall, the patch makes sense to me. A few comments below:

> @@ -252,9 +253,14 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr)
>  	return &container_of(vmpr, struct mem_cgroup, vmpressure)->css;
>  }
>  
> -#ifdef CONFIG_MEMCG_KMEM
>  extern spinlock_t css_set_lock;
>  
> +static inline bool obj_cgroup_is_root(struct obj_cgroup *objcg)
> +{
  > +	return objcg == root_obj_cgroup;
> +}

This function, and by extension root_obj_cgroup, aren't used by this
patch. Please move them to the patch that adds users for them.

> @@ -298,6 +304,20 @@ static void obj_cgroup_release(struct percpu_ref *ref)
>  	percpu_ref_exit(ref);
>  	kfree_rcu(objcg, rcu);
>  }
> +#else
> +static void obj_cgroup_release(struct percpu_ref *ref)
> +{
> +	struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt);
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&css_set_lock, flags);
> +	list_del(&objcg->list);
> +	spin_unlock_irqrestore(&css_set_lock, flags);
> +
> +	percpu_ref_exit(ref);
> +	kfree_rcu(objcg, rcu);
> +}
> +#endif

Having two separate functions for if and else is good when the else
branch is a completely empty dummy function. In this case you end up
duplicating code, so it's better to have just one function and put the
ifdef around the nr_charged_bytes handling in it.

> @@ -318,10 +338,14 @@ static struct obj_cgroup *obj_cgroup_alloc(void)
>  	return objcg;
>  }
>  
> -static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
> -				  struct mem_cgroup *parent)
> +static void memcg_reparent_objcgs(struct mem_cgroup *memcg)
>  {
>  	struct obj_cgroup *objcg, *iter;
> +	struct mem_cgroup *parent;
> +
> +	parent = parent_mem_cgroup(memcg);
> +	if (!parent)
> +		parent = root_mem_cgroup;
>  
>  	objcg = rcu_replace_pointer(memcg->objcg, NULL, true);
>  
> @@ -342,6 +366,27 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg,
>  	percpu_ref_kill(&objcg->refcnt);
>  }
>  
> +static int memcg_obj_cgroup_alloc(struct mem_cgroup *memcg)
> +{
> +	struct obj_cgroup *objcg;
> +
> +	objcg = obj_cgroup_alloc();
> +	if (!objcg)
> +		return -ENOMEM;
> +
> +	objcg->memcg = memcg;
> +	rcu_assign_pointer(memcg->objcg, objcg);
> +
> +	return 0;
> +}
> +
> +static void memcg_obj_cgroup_free(struct mem_cgroup *memcg)
> +{
> +	if (unlikely(memcg->objcg))
> +		memcg_reparent_objcgs(memcg);
> +}

It's confusing to have a 'free' function not free the object it's
called on.

But rather than search for a fitting name, I think it might be better
to just fold both of these short functions into their only callsites.

Also, since memcg->objcg is reparented, and the pointer cleared, on
offlining, when could this ever be non-NULL? This deserves a comment.

> @@ -3444,7 +3489,6 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css,
>  #ifdef CONFIG_MEMCG_KMEM
>  static int memcg_online_kmem(struct mem_cgroup *memcg)
>  {
> -	struct obj_cgroup *objcg;
>  	int memcg_id;
>  
>  	if (cgroup_memory_nokmem)
> @@ -3457,14 +3501,6 @@ static int memcg_online_kmem(struct mem_cgroup *memcg)
>  	if (memcg_id < 0)
>  		return memcg_id;
>  
> -	objcg = obj_cgroup_alloc();
> -	if (!objcg) {
> -		memcg_free_cache_id(memcg_id);
> -		return -ENOMEM;
> -	}
> -	objcg->memcg = memcg;
> -	rcu_assign_pointer(memcg->objcg, objcg);
> -
>  	static_branch_enable(&memcg_kmem_enabled_key);
>  
>  	memcg->kmemcg_id = memcg_id;
> @@ -3488,7 +3524,7 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg)
>  	if (!parent)
>  		parent = root_mem_cgroup;
>  
> -	memcg_reparent_objcgs(memcg, parent);
> +	memcg_reparent_objcgs(memcg);

Since the objcg is no longer tied to kmem, this should move to
mem_cgroup_css_offline() instead.

  reply	other threads:[~2021-04-09 16:56 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-09 12:29 [RFC PATCH v2 00/18] Use obj_cgroup APIs to charge the LRU pages Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 01/18] mm: memcontrol: fix page charging in page replacement Muchun Song
2021-04-10 17:44   ` Shakeel Butt
2021-04-10 17:44     ` Shakeel Butt
2021-04-09 12:29 ` [RFC PATCH v2 02/18] mm: memcontrol: bail out early when !mm in get_mem_cgroup_from_mm Muchun Song
2021-04-10 17:47   ` Shakeel Butt
2021-04-10 17:47     ` Shakeel Butt
2021-04-09 12:29 ` [RFC PATCH v2 03/18] mm: memcontrol: remove the pgdata parameter of mem_cgroup_page_lruvec Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 04/18] mm: memcontrol: simplify lruvec_holds_page_lru_lock Muchun Song
2021-04-09 16:00   ` Johannes Weiner
2021-04-11  5:37     ` [External] " Muchun Song
2021-04-11  5:37       ` Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 05/18] mm: memcontrol: simplify the logic of objcg pinning memcg Muchun Song
2021-04-09 16:27   ` Johannes Weiner
2021-04-09 12:29 ` [RFC PATCH v2 06/18] mm: memcontrol: move the objcg infrastructure out of CONFIG_MEMCG_KMEM Muchun Song
2021-04-09 16:56   ` Johannes Weiner [this message]
2021-04-11  6:24     ` [External] " Muchun Song
2021-04-11  6:24       ` Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 07/18] mm: memcontrol: introduce compact_lock_page_lruvec_irqsave Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 08/18] mm: memcontrol: make lruvec lock safe when the LRU pages reparented Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 09/18] mm: vmscan: remove noinline_for_stack Muchun Song
2021-04-09 18:31   ` Johannes Weiner
2021-04-10  4:33     ` [External] " Muchun Song
2021-04-10  4:33       ` Muchun Song
2021-04-10 17:48       ` Shakeel Butt
2021-04-10 17:48         ` Shakeel Butt
2021-04-10  0:49   ` Roman Gushchin
2021-04-09 12:29 ` [RFC PATCH v2 10/18] mm: vmscan: rework move_pages_to_lru() Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 11/18] mm: thp: introduce lock/unlock_split_queue{_irqsave}() Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 12/18] mm: thp: make deferred split queue lock safe when the LRU pages reparented Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 13/18] mm: memcontrol: make all the callers of page_memcg() safe Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 14/18] mm: memcontrol: introduce memcg_reparent_ops Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 15/18] mm: memcontrol: use obj_cgroup APIs to charge the LRU pages Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 16/18] mm: memcontrol: rename {un}lock_page_memcg() to {un}lock_page_objcg() Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 17/18] mm: lru: add VM_BUG_ON_PAGE to lru maintenance function Muchun Song
2021-04-09 12:29 ` [RFC PATCH v2 18/18] mm: lru: use lruvec lock to serialize memcg changes Muchun Song
2021-04-10  1:29 ` [RFC PATCH v2 00/18] Use obj_cgroup APIs to charge the LRU pages Roman Gushchin
2021-04-12 17:14   ` Johannes Weiner
2021-04-12 17:45     ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YHCHQHrCj6rH1sD3@cmpxchg.org \
    --to=hannes@cmpxchg.org \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=bsingharora@gmail.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=fam.zheng@bytedance.com \
    --cc=guro@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=shakeelb@google.com \
    --cc=shy828301@gmail.com \
    --cc=songmuchun@bytedance.com \
    --cc=vdavydov.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.