All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com,
	akpm@linux-foundation.org, viro@zeniv.linux.org.uk,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Roman Gushchin <guro@fb.com>, Chris Down <chris@chrisdown.name>,
	Dave Chinner <dchinner@redhat.com>
Subject: Re: [PATCH v2 5/5] memcg, inode: protect page cache from freeing inode
Date: Sat, 4 Jan 2020 14:55:01 +1100	[thread overview]
Message-ID: <20200104035501.GE23195@dread.disaster.area> (raw)
In-Reply-To: <1577174006-13025-6-git-send-email-laoar.shao@gmail.com>

On Tue, Dec 24, 2019 at 02:53:26AM -0500, Yafang Shao wrote:
> On my server there're some running MEMCGs protected by memory.{min, low},
> but I found the usage of these MEMCGs abruptly became very small, which
> were far less than the protect limit. It confused me and finally I
> found that was because of inode stealing.
> Once an inode is freed, all its belonging page caches will be dropped as
> well, no matter how may page caches it has. So if we intend to protect the
> page caches in a memcg, we must protect their host (the inode) first.
> Otherwise the memcg protection can be easily bypassed with freeing inode,
> especially if there're big files in this memcg.
> 
> Supposes we have a memcg, and the stat of this memcg is,
> 	memory.current = 1024M
> 	memory.min = 512M
> And in this memcg there's a inode with 800M page caches.
> Once this memcg is scanned by kswapd or other regular reclaimers,
>     kswapd <<<< It can be either of the regular reclaimers.
>         shrink_node_memcgs
> 	    switch (mem_cgroup_protected()) <<<< Not protected
> 		case MEMCG_PROT_NONE:  <<<< Will scan this memcg
> 			beak;
>             shrink_lruvec() <<<< Reclaim the page caches
>             shrink_slab()   <<<< It may free this inode and drop all its
>                                  page caches(800M).
> So we must protect the inode first if we want to protect page caches.
> 
> The inherent mismatch between memcg and inode is a trouble. One inode can
> be shared by different MEMCGs, but it is a very rare case. If an inode is
> shared, its belonging page caches may be charged to different MEMCGs.
> Currently there's no perfect solution to fix this kind of issue, but the
> inode majority-writer ownership switching can help it more or less.

There's multiple changes in this patch set. Yes, there's some inode
cache futzing to deal with memcgs, but it also adds some weird
undocumented "in low reclaim" heuristic that does something magical
with "protection" that you don't describe or document at all. PLease
separate that out into a new patch and document it clearly (both in
the commit message and the code, please).

> diff --git a/fs/inode.c b/fs/inode.c
> index fef457a..4f4b2f3 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -54,6 +54,13 @@
>   *   inode_hash_lock
>   */
>  
> +struct inode_head {

not an "inode head" structure. They are inode isolation arguments.
i.e. struct inode_isolation_args {...};

> +	struct list_head *freeable;
> +#ifdef CONFIG_MEMCG_KMEM
> +	struct mem_cgroup *memcg;
> +#endif
> +};

These defines are awful, too, and completely unnecesarily because
the struct shrink_control unconditionally defines sc->memcg and
so we can just code it throughout without caring whether memcgs are
enabled or not.

> +
>  static unsigned int i_hash_mask __read_mostly;
>  static unsigned int i_hash_shift __read_mostly;
>  static struct hlist_head *inode_hashtable __read_mostly;
> @@ -724,8 +731,10 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
>  static enum lru_status inode_lru_isolate(struct list_head *item,
>  		struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
>  {
> -	struct list_head *freeable = arg;
> +	struct inode_head *ihead = (struct inode_head *)arg;

No need for a cast of a void *.

> +	struct list_head *freeable = ihead->freeable;
>  	struct inode	*inode = container_of(item, struct inode, i_lru);
> +	struct mem_cgroup *memcg = NULL;

	struct inode_isolation_args *iargs = arg;
	struct list_head *freeable = iargs->freeable;
	struct mem_cgroup *memcg = iargs->memcg;
	struct inode	*inode = container_of(item, struct inode, i_lru);

> @@ -734,6 +743,15 @@ static enum lru_status inode_lru_isolate(struct list_head *item,
>  	if (!spin_trylock(&inode->i_lock))
>  		return LRU_SKIP;
>  
> +#ifdef CONFIG_MEMCG_KMEM
> +	memcg = ihead->memcg;
> +#endif

This is no longer necessary.

> +	if (memcg && inode->i_data.nrpages &&
> +	    !(memcg_can_reclaim_inode(memcg, inode))) {
> +		spin_unlock(&inode->i_lock);
> +		return LRU_ROTATE;
> +	}

I'd argue that both the memcg and the inode->i_data.nrpages check
should be inside memcg_can_reclaim_inode(), that way this entire
chunk of code disappears when CONFIG_MEMCG_KMEM=N.

> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index f36ada9..d1d4175 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -247,6 +247,9 @@ struct mem_cgroup {
>  	unsigned int tcpmem_active : 1;
>  	unsigned int tcpmem_pressure : 1;
>  
> +	/* Soft protection will be ignored if it's true */
> +	unsigned int in_low_reclaim : 1;

This is the stuff that has nothing to do with the changes to the
inode reclaim shrinker...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  parent reply	other threads:[~2020-01-04  3:55 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-12-24  7:53 [PATCH v2 0/5] protect page cache from freeing inode Yafang Shao
2019-12-24  7:53 ` [PATCH v2 1/5] mm, memcg: reduce size of struct mem_cgroup by using bit field Yafang Shao
2019-12-26 21:23   ` Roman Gushchin
2019-12-27  1:03     ` Yafang Shao
2019-12-27  1:03       ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 2/5] mm, memcg: introduce MEMCG_PROT_SKIP for memcg zero usage case Yafang Shao
2019-12-26 21:36   ` Roman Gushchin
2019-12-27  1:09     ` Yafang Shao
2019-12-27  1:09       ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 3/5] mm, memcg: reset memcg's memory.{min, low} for reclaiming itself Yafang Shao
2019-12-26 21:45   ` Roman Gushchin
2019-12-27  1:11     ` Yafang Shao
2019-12-27  1:11       ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 4/5] mm: make memcg visible to lru walker isolation function Yafang Shao
2020-01-04  3:35   ` Dave Chinner
2020-01-04  7:26     ` Yafang Shao
2020-01-04  7:26       ` Yafang Shao
2020-01-04 21:23       ` Dave Chinner
2020-01-05  1:43         ` Yafang Shao
2020-01-05  1:43           ` Yafang Shao
2020-01-06  0:17           ` Dave Chinner
2020-01-06 14:41             ` Yafang Shao
2020-01-06 14:41               ` Yafang Shao
2020-01-06 21:31               ` Dave Chinner
2020-01-07 13:22                 ` Yafang Shao
2020-01-07 13:22                   ` Yafang Shao
2019-12-24  7:53 ` [PATCH v2 5/5] memcg, inode: protect page cache from freeing inode Yafang Shao
2019-12-25 13:01   ` kbuild test robot
2019-12-25 13:01     ` kbuild test robot
2019-12-25 13:18   ` kbuild test robot
2019-12-25 13:18     ` kbuild test robot
2019-12-26  5:09     ` Yafang Shao
2019-12-26  5:09       ` Yafang Shao
2020-01-04  3:55   ` Dave Chinner [this message]
2020-01-04  7:42     ` Yafang Shao
2020-01-04  7:42       ` Yafang Shao

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200104035501.GE23195@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=akpm@linux-foundation.org \
    --cc=chris@chrisdown.name \
    --cc=dchinner@redhat.com \
    --cc=guro@fb.com \
    --cc=hannes@cmpxchg.org \
    --cc=laoar.shao@gmail.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=vdavydov.dev@gmail.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.