From: Roman Gushchin <guro@fb.com>
To: Hillf Danton <hdanton@sina.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Kernel Team <Kernel-team@fb.com>, "tj@kernel.org" <tj@kernel.org>,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups
Date: Mon, 7 Oct 2019 22:02:46 +0000	[thread overview]
Message-ID: <20191007220242.GA25914@tower.DHCP.thefacebook.com> (raw)
In-Reply-To: <20191007060144.12416-1-hdanton@sina.com>

On Mon, Oct 07, 2019 at 02:01:44PM +0800, Hillf Danton wrote:
> 
> On Fri, 4 Oct 2019 15:11:04 -0700 Roman Gushchin wrote:
> > 
> > This is an RFC patch, not intended to be merged as is, but it will
> > hopefully start a discussion that leads to a good solution for the
> > problem described below.
> > --
> > We've noticed that the number of dying cgroups on our production hosts
> > tends to grow with uptime. This time it's caused by the writeback
> > code.
> > 
> > When an inode becomes dirty for the first time, it is associated
> > with a wb structure (see __inode_attach_wb()). It can later be
> > switched to another wb under certain conditions (e.g. some other
> > cgroup is writing a lot of data to the same inode), but generally
> > it stays associated for the rest of the inode structure's lifetime.
> > 
> > The problem is that the wb structure holds a reference to the original
> > memory cgroup. So once an inode has been dirtied, it has a good chance
> > of pinning the original memory cgroup for as long as the inode stays
> > around.
> > 
> > A real-life example: a service runs periodically and updates rpm
> > packages, each time in a new memory cgroup. The installed .so files
> > are heavily used by other cgroups, so the corresponding inodes tend
> > to stay alive for a long time, and so do the memory cgroups they pin.
> > In production I've seen many hosts with 1-2 thousand dying cgroups.
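
To make the reference chain described in the quoted paragraphs above
concrete, here is a toy userspace model of it (purely illustrative; the
struct and function names are simplified stand-ins for the kernel's
inode, bdi_writeback and mem_cgroup objects, not the real definitions):

#include <stdio.h>
#include <stdlib.h>

struct memcg { int refcnt; };
struct wb    { int refcnt; struct memcg *memcg; };
struct inode { struct wb *i_wb; };

static void memcg_put(struct memcg *m)
{
	if (--m->refcnt == 0) {
		printf("memcg finally freed\n");
		free(m);
	}
}

static void wb_put(struct wb *wb)
{
	if (--wb->refcnt == 0) {
		memcg_put(wb->memcg);	/* drop the wb's ref on the memcg */
		free(wb);
	}
}

/* roughly what __inode_attach_wb() does on the first dirtying */
static void inode_attach_wb(struct inode *inode, struct wb *wb)
{
	wb->refcnt++;
	inode->i_wb = wb;
}

int main(void)
{
	struct memcg *m = calloc(1, sizeof(*m));
	struct wb *wb = calloc(1, sizeof(*wb));
	struct inode inode = { 0 };

	m->refcnt = 1;			/* reference held by the live cgroup */
	wb->memcg = m;
	m->refcnt++;			/* the wb pins the memcg */

	inode_attach_wb(&inode, wb);	/* inode gets dirty for the first time */

	memcg_put(m);			/* rmdir: the cgroup is gone from userspace... */
	printf("memcg refcnt after rmdir: %d (dying, but still pinned)\n",
	       m->refcnt);

	wb_put(inode.i_wb);		/* ...and is only freed once the inode lets go */
	return 0;
}

The rmdir drops the cgroup's own reference, but the object survives as a
dying cgroup until the inode (and with it the wb) is released.
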
> 
> The diff below fixes e8a7abf5a5bd ("writeback: disassociate inodes
> from dying bdi_writebacks") by selecting a new memcg_css id for the
> dying bdi_writeback to switch to.
> A check for offline memcgs is also added, which is perhaps needed in
> your case. Let us know whether it helps you cut the number of dying
> cgroups down a bit.

Hello, Hillf!

Thank you for the patch! I'll be back with testing results in a few days.
I doubt it can completely solve the problem (the switch is only triggered
from the writeback path, so an inode that nobody writes to anymore will
keep its old wb), but it can probably make it less noticeable.
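
For the testing, one convenient way to watch the number of dying cgroups
over time is the nr_dying_descendants counter that cgroup v2 exposes in
cgroup.stat. A minimal reader, assuming cgroup2 is mounted at
/sys/fs/cgroup, could look like this:

#include <stdio.h>
#include <string.h>

int main(void)
{
	FILE *f = fopen("/sys/fs/cgroup/cgroup.stat", "r");
	char key[64];
	long val;

	if (!f) {
		perror("cgroup.stat");
		return 1;
	}
	/* the file is a list of "key value" pairs, one per line */
	while (fscanf(f, "%63s %ld", key, &val) == 2) {
		if (!strcmp(key, "nr_dying_descendants"))
			printf("dying cgroups: %ld\n", val);
	}
	fclose(f);
	return 0;
}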

Thanks!

> 
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -552,6 +552,8 @@ out_free:
>  void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
>  				 struct inode *inode)
>  {
> +	int new_id = 0;
> +
>  	if (!inode_cgwb_enabled(inode)) {
>  		spin_unlock(&inode->i_lock);
>  		return;
> @@ -560,6 +562,22 @@ void wbc_attach_and_unlock_inode(struct
>  	wbc->wb = inode_to_wb(inode);
>  	wbc->inode = inode;
>  
> +	if (unlikely(wb_dying(wbc->wb)) ||
> +	    !mem_cgroup_from_css(wbc->wb->memcg_css)->cgwb_list.next) {
> +		int id = wbc->wb->memcg_css->id;
> +		/*
> +		 * Any other css id will do here; the point is to let the
> +		 * dying/offline memcg be reaped.
> +		 */
> +		if (id != wbc->wb_id && wbc->wb_id)
> +			new_id = wbc->wb_id;
> +		else if (id != wbc->wb_lcand_id && wbc->wb_lcand_id)
> +			new_id = wbc->wb_lcand_id;
> +		else if (id != wbc->wb_tcand_id && wbc->wb_tcand_id)
> +			new_id = wbc->wb_tcand_id;
> +		else
> +			new_id = inode_to_bdi(inode)->wb.memcg_css->id;
> +	}
>  	wbc->wb_id = wbc->wb->memcg_css->id;
>  	wbc->wb_lcand_id = inode->i_wb_frn_winner;
>  	wbc->wb_tcand_id = 0;
> @@ -574,8 +592,8 @@ void wbc_attach_and_unlock_inode(struct
>  	 * A dying wb indicates that the memcg-blkcg mapping has changed
>  	 * and a new wb is already serving the memcg.  Switch immediately.
>  	 */
> -	if (unlikely(wb_dying(wbc->wb)))
> -		inode_switch_wbs(inode, wbc->wb_id);
> +	if (new_id)
> +		inode_switch_wbs(inode, new_id);
>  }
>  EXPORT_SYMBOL_GPL(wbc_attach_and_unlock_inode);
>  
> --
> 
> 
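
As a side note, the id selection in the hunk above boils down to "take
the first recorded candidate id that is non-zero and different from the
current wb's css id, otherwise fall back to the bdi's embedded (root)
wb". A standalone sketch of just that selection (plain C, with invented
names, purely for illustration):

#include <stdio.h>

/* mirrors the order of preference used in the patch above */
static int pick_switch_target(int cur_id, int wb_id, int lcand_id,
			      int tcand_id, int bdi_root_id)
{
	if (wb_id && wb_id != cur_id)
		return wb_id;
	if (lcand_id && lcand_id != cur_id)
		return lcand_id;
	if (tcand_id && tcand_id != cur_id)
		return tcand_id;
	/* no usable candidate: switch to the bdi's root wb */
	return bdi_root_id;
}

int main(void)
{
	/* dying wb with css id 42 and no recorded candidates -> root (1) */
	printf("%d\n", pick_switch_target(42, 0, 0, 0, 1));
	/* a foreign candidate id (7) is preferred when one is recorded */
	printf("%d\n", pick_switch_target(42, 7, 0, 0, 1));
	return 0;
}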
