From: Matthew Wilcox <willy@infradead.org>
To: Muchun Song <songmuchun@bytedance.com>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, mhocko@kernel.org,
	vdavydov.dev@gmail.com, shakeelb@google.com, guro@fb.com,
	shy828301@gmail.com, alexs@kernel.org, richard.weiyang@gmail.com,
	david@fromorbit.com, trond.myklebust@hammerspace.com,
	anna.schumaker@netapp.com, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-nfs@vger.kernel.org, zhengqi.arch@bytedance.com,
	duanxiongchun@bytedance.com, fam.zheng@bytedance.com
Subject: Re: [PATCH v2 17/21] mm: list_lru: replace linear array with xarray
Date: Thu, 27 May 2021 13:07:33 +0100
Message-ID: <YK+LhWvabd+KQWOJ@casper.infradead.org>
In-Reply-To: <20210527062148.9361-18-songmuchun@bytedance.com>

On Thu, May 27, 2021 at 02:21:44PM +0800, Muchun Song wrote:
> If we run 10k containers in the system, the size of the
> list_lru_memcg->lrus can be ~96KB per list_lru. When we decrease the
> number of containers, the array is not shrunk. It is
> not scalable. The xarray is a good choice for this case. We can save
> a lot of memory when there are tens of thousands of containers in the
> system. If we use xarray, we can also remove the array-resizing
> logic, which simplifies the code.

I am all for this, in concept.  Some thoughts below ...

> @@ -56,10 +51,8 @@ struct list_lru {
>  #ifdef CONFIG_MEMCG_KMEM
>  	struct list_head	list;
>  	int			shrinker_id;
> -	/* protects ->memcg_lrus->lrus[i] */
> -	spinlock_t		lock;
>  	/* for cgroup aware lrus points to per cgroup lists, otherwise NULL */
> -	struct list_lru_memcg	__rcu *memcg_lrus;
> +	struct xarray		*xa;
>  #endif

Normally, we embed an xarray in its containing structure instead of
allocating it.  It's only a pointer, int and spinlock, so generally
16 bytes, as opposed to the 8 bytes for the pointer and a 16 byte
allocation.  There is a minor wrinkle in that currently 'NULL' is
used to indicate "is not cgroup aware".  Maybe there's another way
to indicate that?
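
To make the size argument concrete, here is a minimal userspace sketch.
struct mock_xarray is a stand-in invented for this example (a 32-bit lock
word, a 32-bit flags word and a head pointer), not the kernel's struct
xarray, and the two container structs are likewise hypothetical.  It
ignores allocator overhead and debug options that enlarge spinlock_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-in for an xarray header: lock, flags, head pointer.
 * Field names and types are assumptions for illustration only.
 */
struct mock_xarray {
	uint32_t lock;	/* stands in for a non-debug spinlock_t */
	uint32_t flags;	/* stands in for the gfp/flags word */
	void *head;	/* stands in for the head pointer */
};

/* Variant A: the container holds a pointer to a separate allocation. */
struct lru_with_pointer {
	struct mock_xarray *xa;
};

/* Variant B: the container embeds the array header directly. */
struct lru_with_embedded {
	struct mock_xarray xa;
};

/* Bytes used by variant A: the pointer plus the allocated header. */
static size_t footprint_pointer(void)
{
	return sizeof(struct lru_with_pointer) + sizeof(struct mock_xarray);
}

/* Bytes used by variant B: just the embedded header. */
static size_t footprint_embedded(void)
{
	return sizeof(struct lru_with_embedded);
}
```

In this model the embedded variant is smaller by exactly one pointer, and
it also avoids an allocation that can fail.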

> @@ -51,22 +51,12 @@ static int lru_shrinker_id(struct list_lru *lru)
>  static inline struct list_lru_one *
>  list_lru_from_memcg_idx(struct list_lru *lru, int nid, int idx)
>  {
> -	struct list_lru_memcg *memcg_lrus;
> -	struct list_lru_node *nlru = &lru->node[nid];
> +	if (list_lru_memcg_aware(lru) && idx >= 0) {
> +		struct list_lru_per_memcg *mlru = xa_load(lru->xa, idx);
>  
> -	/*
> -	 * Either lock or RCU protects the array of per cgroup lists
> -	 * from relocation (see memcg_update_list_lru).
> -	 */
> -	memcg_lrus = rcu_dereference_check(lru->memcg_lrus,
> -					   lockdep_is_held(&nlru->lock));
> -	if (memcg_lrus && idx >= 0) {
> -		struct list_lru_per_memcg *mlru;
> -
> -		mlru = rcu_dereference_check(memcg_lrus->lrus[idx], true);
>  		return mlru ? &mlru->nodes[nid] : NULL;
>  	}
> -	return &nlru->lru;
> +	return &lru->node[nid].lru;
>  }

... perhaps we move the xarray out from under the #ifdef and use index 0
for non-memcg-aware lrus?  The XArray is specially optimised for arrays
which only have one entry at 0.

>  int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t gfp)
>  {
> +	XA_STATE(xas, lru->xa, 0);
>  	unsigned long flags;
> -	struct list_lru_memcg *memcg_lrus;
> -	int i;
> +	int i, ret = 0;
>  
>  	struct list_lru_memcg_table {
>  		struct list_lru_per_memcg *mlru;
> @@ -601,22 +522,45 @@ int list_lru_memcg_alloc(struct list_lru *lru, struct mem_cgroup *memcg, gfp_t g
>  		}
>  	}
>  
> -	spin_lock_irqsave(&lru->lock, flags);
> -	memcg_lrus = rcu_dereference_protected(lru->memcg_lrus, true);
> +	xas_lock_irqsave(&xas, flags);
>  	while (i--) {
>  		int index = memcg_cache_id(table[i].memcg);
>  		struct list_lru_per_memcg *mlru = table[i].mlru;
>  
> -		if (index < 0 || rcu_dereference_protected(memcg_lrus->lrus[index], true))
> +		xas_set(&xas, index);
> +retry:
> +		if (unlikely(index < 0 || ret || xas_load(&xas))) {
>  			kfree(mlru);
> -		else
> -			rcu_assign_pointer(memcg_lrus->lrus[index], mlru);
> +		} else {
> +			ret = xa_err(xas_store(&xas, mlru));

This is mixing advanced and normal XArray concepts ... sorry to have
confused you.  I think what you meant to do here was:

			xas_store(&xas, mlru);
			ret = xas_error(&xas);

Or you can avoid introducing 'ret' at all, and keep your errors in the
xa_state.  You're kind of mirroring the xa_state errors into 'ret'
anyway, so that seems easier to understand?
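
As a rough illustration of keeping the error in the state object rather
than in a separate 'ret', here is a userspace model.  Everything named
mock_* is hypothetical and only imitates the shape of the advanced API;
the 'budget' field is an assumption used to simulate allocation failure.
The loop checks the error before moving to the next index, so the sketch
makes no claim about what repositioning does to a pending error.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_SLOTS 4

/* Hypothetical userspace stand-in for an xa_state; not the kernel type. */
struct mock_state {
	void *slots[MOCK_SLOTS];
	size_t index;	/* slot the next store targets */
	int budget;	/* stores that may still succeed (simulates memory) */
	int err;	/* 0, or a negative errno once an operation failed */
};

static void mock_set(struct mock_state *st, size_t index)
{
	st->index = index;
}

/* Store at the current index; does nothing if an error is pending. */
static void mock_store(struct mock_state *st, void *entry)
{
	if (st->err)
		return;
	if (st->index >= MOCK_SLOTS) {
		st->err = -EINVAL;
		return;
	}
	if (st->budget <= 0) {
		st->err = -ENOMEM;	/* simulated allocation failure */
		return;
	}
	st->budget--;
	st->slots[st->index] = entry;
}

static int mock_error(const struct mock_state *st)
{
	return st->err;
}

/* Store n entries; the only error bookkeeping lives in *st. */
static int store_all(struct mock_state *st, void **entries,
		     const size_t *indices, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		mock_set(st, indices[i]);
		mock_store(st, entries[i]);
		/* Stop at the first failure, before repositioning. */
		if (mock_error(st))
			break;
	}
	return mock_error(st);
}
```

The caller reads the result once, from the state, instead of threading a
second variable through every branch of the loop.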

> -	memcg_id = memcg_alloc_cache_id();
> +	memcg_id = ida_simple_get(&memcg_cache_ida, 0, MEMCG_CACHES_MAX_SIZE,
> +				  GFP_KERNEL);

	memcg_id = ida_alloc_max(&memcg_cache_ida,
			MEMCG_CACHES_MAX_SIZE - 1, GFP_KERNEL);

... although I think there's actually a fencepost error, and this really
should be MEMCG_CACHES_MAX_SIZE.
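
The off-by-one between the two calling conventions can be modelled in
userspace.  This is a sketch only: mock_ida, mock_alloc_max() and
mock_simple_get() are hypothetical helpers written for this example.  They
imitate an allocator whose limit is inclusive (the ida_alloc_max() style)
and one whose limit is exclusive (the ida_simple_get() style with start 0).
The special case of an 'end' of 0 is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MOCK_IDS 8

/* Hypothetical fixed-size ID allocator; not the kernel's struct ida. */
struct mock_ida {
	bool used[MOCK_IDS];
};

/* Allocate the lowest free ID in [0, max]; 'max' is inclusive. */
static int mock_alloc_max(struct mock_ida *ida, unsigned int max)
{
	for (unsigned int id = 0; id <= max && id < MOCK_IDS; id++) {
		if (!ida->used[id]) {
			ida->used[id] = true;
			return (int)id;
		}
	}
	return -ENOSPC;
}

/* Allocate the lowest free ID in [0, end); 'end' is exclusive. */
static int mock_simple_get(struct mock_ida *ida, unsigned int end)
{
	if (end == 0)
		return -EINVAL;	/* this case is not modelled here */
	return mock_alloc_max(ida, end - 1);
}
```

With a limit of N, the exclusive form hands out 0..N-1, which matches the
inclusive form called with N - 1.  Passing N to the inclusive form also
hands out N itself, which is the difference the fencepost remark is about.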

>  	objcg = obj_cgroup_alloc();
>  	if (!objcg) {
> -		memcg_free_cache_id(memcg_id);
> +		ida_simple_remove(&memcg_cache_ida, memcg_id);

		ida_free(&memcg_cache_ida, memcg_id);

