From: Shakeel Butt <email@example.com>
To: Tejun Heo <firstname.lastname@example.org>
Cc: Jakub Kicinski <email@example.com>, Andrew Morton <firstname.lastname@example.org>, Linux MM <email@example.com>, Kernel Team <firstname.lastname@example.org>, Johannes Weiner <email@example.com>, Chris Down <firstname.lastname@example.org>, Cgroups <email@example.com>
Subject: Re: [PATCH 0/3] memcg: Slow down swap allocation as the available space gets depleted
Date: Fri, 17 Apr 2020 14:51:09 -0700
Message-ID: <CALvZod6LT25t9aAA1KHmf1U4-L8zSjUXQ4VQvX4cMT1A+R_gfirstname.lastname@example.org>
In-Reply-To: <20200417193539.GC43469@mtj.thefacebook.com>

On Fri, Apr 17, 2020 at 12:35 PM Tejun Heo <email@example.com> wrote:
>
> Hello,
>
> On Fri, Apr 17, 2020 at 10:51:10AM -0700, Shakeel Butt wrote:
> > > Can you please elaborate concrete scenarios? I'm having a hard time seeing
> > > differences from page cache.
> >
> > Oh I was talking about the global reclaim here. In global reclaim, any
> > task can be throttled (throttle_direct_reclaim()). Memory freed by
> > using the CPU of high priority low latency jobs can be stolen by low
> > priority batch jobs.
>
> I'm still having a hard time following this thread of discussion, most
> likely because my knowledge of mm is fleeting at best. Can you please ELI5
> why the above is specifically relevant to this discussion?
>

No, it is not relevant to this discussion "now". The mention of
performance isolation in my first email was mostly due to my lack of
understanding about what problem this patch series is trying to solve.
So, let's skip this topic.

> I'm gonna list two things that come to my mind just in case that'd help
> reducing the back and forth.
>
> * With protection based configurations, protected cgroups wouldn't usually
>   go into direct reclaim themselves all that much.
>
> * We do have holes in accounting CPU cycles used by reclaim to the origins,
>   which, for example, prevents making memory.high reclaim async and lets
>   memory pressure contaminate cpu isolation possibly to a significant degree
>   on lower core count machines in some scenarios, but that's a separate
>   issue we need to address in the future.
>

I have an opinion on the above but I will hold it back, as those points
are not relevant to the patch series.

> > > cgroup A has memory.low protection and no other restrictions. cgroup B has
> > > no protection and has access to swap. When B's memory starts bloating and
> > > gets the system under memory contention, it'll start consuming swap until it
> > > can't. When swap becomes depleted for B, there's nothing holding it back and
> > > B will start eating into A's protection.
> > >
> >
> > In this example does 'B' have memory.high and memory.max set and by A
>
> B doesn't have anything set.
>
> > having no other restrictions, I am assuming you meant unlimited high
> > and max for A? Can 'A' use memory.min?
>
> Sure, it can but 1. the purpose of the example is illustrating the
> incompleteness of the existing mechanism

I understand, but is this a real world configuration people use, and do
we want to support the scenario where, without setting high/max, the
kernel still guarantees the isolation?

> 2. there's a big difference between
> letting the machine hit the wall and waiting for the kernel OOM to trigger
> and being able to monitor the situation as it gradually develops and respond
> to it, which is the whole point of the low/high mechanisms.
>

I am not really against the proposed solution. What I am trying to see
is if this problem is more general than an anon/swap-full problem and
if a more general solution is possible. To me it seems like, whenever
a large portion of reclaimable memory (anon, file or kmem) becomes
non-reclaimable abruptly, the memory isolation can be broken.

You gave the anon/swap-full example, let me see if I can come up with
file and kmem examples (with similar A & B).

1) B has a lot of page cache, but it temporarily gets pinned for rdma or
something and the system gets low on memory. B can attack A's low
protected memory as B's page cache is temporarily not reclaimable.

2) B has a lot of dentries/inodes, but someone has taken a write lock on
shrinker_rwsem and got stuck in allocation/reclaim or was preempted off
the CPU. B can attack A's low protected memory as B's slabs are
temporarily not reclaimable.

I think the aim is to slow down B enough to give the PSI monitor a
chance to act before either B targets A's protected memory or the
kernel triggers oom-kill.

My question is, do we really want to solve the issue without limiting B
through high/max? Also, isn't fine grained PSI monitoring along with
limiting B through memory.[high|max] general enough to solve all three
example scenarios?

thanks,
Shakeel
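
To make the "PSI monitoring plus memory.[high|max]" alternative above a bit more concrete, here is a rough userspace sketch, not anything proposed in this thread. It parses text in the format of a cgroup v2 memory.pressure file and applies a threshold check. The function names, the 10% threshold and the sample values are hypothetical and only for illustration; a real monitor would read B's memory.pressure file and then lower B's memory.high when the check fires.

```python
def parse_pressure(text):
    """Parse the contents of a cgroup v2 memory.pressure file.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns a dict such as {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def should_throttle(pressure, some_avg10_limit=10.0):
    """Hypothetical policy: act once the 10s 'some' average crosses a limit."""
    return pressure["some"]["avg10"] >= some_avg10_limit


# Made-up sample contents, standing in for a read of B's memory.pressure.
sample = (
    "some avg10=12.50 avg60=4.20 avg300=1.10 total=123456\n"
    "full avg10=3.00 avg60=1.00 avg300=0.20 total=45678\n"
)

pressure = parse_pressure(sample)
if should_throttle(pressure):
    # A real monitor would write a lower limit to B's memory.high here.
    print("pressure high: tighten memory.high for B")
```

The check is kept as a pure function so that the policy can be tuned and tested separately from the file I/O against the cgroup hierarchy.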