All of lore.kernel.org
 help / color / mirror / Atom feed
From: Hillf Danton <hdanton@sina.com>
To: Roman Gushchin <guro@fb.com>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com, tj@kernel.org,
	Jan Kara <jack@suse.cz>
Subject: Re: [PATCH] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups
Date: Mon,  7 Oct 2019 14:01:44 +0800	[thread overview]
Message-ID: <20191007060144.12416-1-hdanton@sina.com> (raw)


On Fri, 4 Oct 2019 15:11:04 -0700 Roman Gushchin wrote:
> 
> This is a RFC patch, which is not intended to be merged as is,
> but hopefully will start a discussion which can result in a good
> solution for the described problem.
> --
> We've noticed that the number of dying cgroups on our production hosts
> tends to grow with the uptime. This time it's caused by the writeback
> code.
> 
> An inode which is getting dirty for the first time is associated
> with the wb structure (look at __inode_attach_wb()). It can later
> be switched to another wb under some conditions (e.g. some other
> cgroup is writing a lot of data to the same inode), but generally
> stays associated up to the end of life of the inode structure.
> 
> The problem is that the wb structure holds a reference to the original
> memory cgroup. So if the inode was dirty once, it has a good chance
> to pin down the original memory cgroup.
> 
> An example from the real life: some service runs periodically and
> updates rpm packages. Each time in a new memory cgroup. Installed
> .so files are heavily used by other cgroups, so corresponding inodes
> tend to stay alive for a long. So do pinned memory cgroups.
> In production I've seen many hosts with 1-2 thousands of dying
> cgroups.

The diff below fixes e8a7abf5a5bd ("writeback: disassociate inodes
from dying bdi_writebacks") by selecting new memcg_css id for dying
bdi_writeback to switch to.
Checking offline memcg is also added, which is perhaps needed in your
case. Let us know if it makes sense in helping you cut dying cgroups
down a bit.

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -552,6 +552,8 @@ out_free:
 void wbc_attach_and_unlock_inode(struct writeback_control *wbc,
 				 struct inode *inode)
 {
+	int new_id = 0;
+
 	if (!inode_cgwb_enabled(inode)) {
 		spin_unlock(&inode->i_lock);
 		return;
@@ -560,6 +562,22 @@ void wbc_attach_and_unlock_inode(struct
 	wbc->wb = inode_to_wb(inode);
 	wbc->inode = inode;
 
+	if (unlikely(wb_dying(wbc->wb)) ||
+	    !mem_cgroup_from_css(wbc->wb->memcg_css)->cgwb_list.next) {
+		int id = wbc->wb->memcg_css->id;
+		/*
+		 * any css id is fine in order to let dying/offline
+		 * memcg reap
+		 */
+		if (id != wbc->wb_id && wbc->wb_id)
+			new_id = wbc->wb_id;
+		else if (id != wbc->wb_lcand_id && wbc->wb_lcand_id)
+			new_id = wbc->wb_lcand_id;
+		else if (id != wbc->wb_tcand_id && wbc->wb_tcand_id)
+			new_id = wbc->wb_tcand_id;
+		else
+			new_id = inode_to_bdi(inode)->wb.memcg_css->id;
+	}
 	wbc->wb_id = wbc->wb->memcg_css->id;
 	wbc->wb_lcand_id = inode->i_wb_frn_winner;
 	wbc->wb_tcand_id = 0;
@@ -574,8 +592,8 @@ void wbc_attach_and_unlock_inode(struct
 	 * A dying wb indicates that the memcg-blkcg mapping has changed
 	 * and a new wb is already serving the memcg.  Switch immediately.
 	 */
-	if (unlikely(wb_dying(wbc->wb)))
-		inode_switch_wbs(inode, wbc->wb_id);
+	if (new_id)
+		inode_switch_wbs(inode, new_id);
 }
 EXPORT_SYMBOL_GPL(wbc_attach_and_unlock_inode);
 
--



             reply	other threads:[~2019-10-07  6:02 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-07  6:01 Hillf Danton [this message]
2019-10-07 22:02 ` [PATCH] cgroup, blkcg: prevent dirty inodes to pin dying memory cgroups Roman Gushchin
  -- strict thread matches above, loose matches on Subject: below --
2019-10-04 22:11 Roman Gushchin
2019-10-07 14:57 ` Vlastimil Babka
2019-10-07 23:35   ` Roman Gushchin
2019-10-07 16:19 ` Michal Koutný
2019-10-07 23:24   ` Roman Gushchin
2019-10-08  4:06 ` Dave Chinner
2019-10-08  5:38   ` Roman Gushchin
2019-10-08  8:20     ` Jan Kara
2019-10-09  5:19       ` Roman Gushchin
2019-10-09 21:48       ` Roman Gushchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191007060144.12416-1-hdanton@sina.com \
    --to=hdanton@sina.com \
    --cc=guro@fb.com \
    --cc=jack@suse.cz \
    --cc=kernel-team@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.