From: Roman Gushchin <email@example.com> To: "Michal Koutný" <firstname.lastname@example.org> Cc: Andrew Morton <email@example.com>, Dennis Zhou <firstname.lastname@example.org>, Tejun Heo <email@example.com>, Christoph Lameter <firstname.lastname@example.org>, Johannes Weiner <email@example.com>, Michal Hocko <firstname.lastname@example.org>, Shakeel Butt <email@example.com>, <firstname.lastname@example.org>, <email@example.com>, <firstname.lastname@example.org> Subject: Re: [PATCH v3 4/5] mm: memcg: charge memcg percpu memory to the parent cgroup Date: Tue, 11 Aug 2020 09:55:27 -0700 Message-ID: <20200811165527.GA1507044@carbon.DHCP.thefacebook.com> (raw) In-Reply-To: <20200811144754.GA45201@blackbook> On Tue, Aug 11, 2020 at 04:47:54PM +0200, Michal Koutny wrote: > On Thu, Aug 06, 2020 at 09:37:17PM -0700, Roman Gushchin <email@example.com> wrote: > > In general, yes. But in this case I think it wouldn't be a good idea: > > most often cgroups are created by a centralized daemon (systemd), > > which is usually located in the root cgroup. Even if it's located not in > > the root cgroup, limiting it's memory will likely affect the whole system, > > even if only one specific limit was reached. > The generic scheme would be (assuming the no internal process > constraint, in the root too) > > ` root or delegated root > ` manager-cgroup (systemd, docker, ...) > ` [aggregation group(s)] > ` job-group-1 > ` ... > ` job-group-n > > > If there is a containerized workload, which creates sub-cgroups, > > charging it's parent cgroup is perfectly effective. > No dispute about this in either approaches. > > > And the opposite, if we'll charge the cgroup of a process, who created > > a cgroup, we'll not cover the most common case: systemd creating > > cgroups for all services in the system. > What I mean is that systemd should be charged for the cgroup creation. > Or more generally, any container/cgroup manager should be charged. > Consider a leak when it wouldn't remove spent cgroups, IMO the effect > (throttling, reclaim) should be exercised on such a culprit. As I said, there are 2 problems with charging systemd (or a similar daemon): 1) It often belongs to the root cgroup. 2) OOMing or failing some random memory allocations is a bad way to "communicate" a memory shortage to the daemon. What we really want is to prevent creating a huge number of cgroups (including dying cgroups) in some specific sub-tree(s). OOMing the daemon or returning -ENOMEM to some random syscalls will not help us to reach the goal and likely will bring a bad experience to a user. In a generic case I don't see how we can charge the cgroup which creates cgroups without solving these problems first. And if there is a very special case where we have to limit it, we can just add an additional layer: ` root or delegated root ` manager-parent-cgroup-with-a-limit ` manager-cgroup (systemd, docker, ...) ` [aggregation group(s)] ` job-group-1 ` ... ` job-group-n > > I don't think the existing workload (job-group-i above) should > unnecessarily suffer when only manager is acting up. Is that different > from your idea? > > > Right, but it's quite unusual for tasks from one cgroup to create sub-cgroups > > in completely different cgroup. In this particular case there are tons of other > > ways how a task from C00 can hurt C1. > I agree with that. > > > If I haven't overlooked anything, this should be first case when > cgroup-related structures are accounted (please correct me). > So this is setting a precendent, if others show useful to be accounted > in the future too. Right. > I'm thinking about cpu_cgroup_css_alloc() that can > also allocate a lot (with big CPU count). The current approach would lead > situations where matching cpu and memory csses needn't to exist and that > would need special handling. I'd definitely charge the parent cgroup in all similar cases. > > > > On Thu, Aug 06, 2020 at 09:16:03PM -0700, Andrew Morton wrote: > > > These week-old issues appear to be significant. Roman? Or someone > > > else? > Despite my concerns, I don't think this is fundamental and can't be > changed later so it doesn't prevent the inclusion in 5.9 RC1. Thank you!
next prev parent reply index Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-06-23 18:45 [PATCH v3 0/5] mm: memcg accounting of percpu memory Roman Gushchin 2020-06-23 18:45 ` [PATCH v3 1/5] percpu: return number of released bytes from pcpu_free_area() Roman Gushchin 2020-06-24 0:58 ` Shakeel Butt 2020-06-23 18:45 ` [PATCH v3 2/5] mm: memcg/percpu: account percpu memory to memory cgroups Roman Gushchin 2020-06-24 1:25 ` Shakeel Butt 2020-06-23 18:45 ` [PATCH v3 3/5] mm: memcg/percpu: per-memcg percpu memory statistics Roman Gushchin 2020-06-24 1:35 ` Shakeel Butt 2020-08-11 15:05 ` Johannes Weiner 2020-06-23 18:45 ` [PATCH v3 4/5] mm: memcg: charge memcg percpu memory to the parent cgroup Roman Gushchin 2020-06-24 1:40 ` Shakeel Butt 2020-06-24 1:49 ` Roman Gushchin 2020-07-29 17:10 ` Michal Koutný 2020-08-07 4:16 ` Andrew Morton 2020-08-07 4:37 ` Roman Gushchin 2020-08-10 19:33 ` Roman Gushchin 2020-08-11 14:47 ` Michal Koutný 2020-08-11 16:55 ` Roman Gushchin [this message] 2020-08-11 18:32 ` Michal Koutný 2020-08-11 19:32 ` Roman Gushchin 2020-08-12 16:28 ` Michal Koutný 2020-08-11 15:27 ` Johannes Weiner 2020-08-11 17:06 ` Roman Gushchin 2020-08-13 9:16 ` Naresh Kamboju 2020-08-13 23:27 ` Stephen Rothwell
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20200811165527.GA1507044@carbon.DHCP.thefacebook.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
Linux-mm Archive on lore.kernel.org Archives are clonable: git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git # If you have public-inbox 1.1+ installed, you may # initialize and index your mirror using the following commands: public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \ firstname.lastname@example.org public-inbox-index linux-mm Example config snippet for mirrors Newsgroup available over NNTP: nntp://nntp.lore.kernel.org/org.kvack.linux-mm AGPL code for this site: git clone https://public-inbox.org/public-inbox.git