From: Tejun Heo <firstname.lastname@example.org>
To: Shakeel Butt <email@example.com>
Cc: Jakub Kicinski <firstname.lastname@example.org>,
    Andrew Morton <email@example.com>,
    Linux MM <firstname.lastname@example.org>,
    Kernel Team <email@example.com>,
    Johannes Weiner <firstname.lastname@example.org>,
    Chris Down <email@example.com>,
    Cgroups <firstname.lastname@example.org>
Subject: Re: [PATCH 0/3] memcg: Slow down swap allocation as the available space gets depleted
Date: Fri, 17 Apr 2020 18:59:41 -0400
Message-ID: <20200417225941.GE43469@mtj.thefacebook.com>
In-Reply-To: <CALvZod6LT25t9aAA1KHmf1U4-L8zSjUXQ4VQvX4cMT1A+R_gemail@example.com>

Hello, Shakeel.

On Fri, Apr 17, 2020 at 02:51:09PM -0700, Shakeel Butt wrote:
> > > In this example does 'B' have memory.high and memory.max set and by A
> >
> > B doesn't have anything set.
> >
> > > having no other restrictions, I am assuming you meant unlimited high
> > > and max for A? Can 'A' use memory.min?
> >
> > Sure, it can but 1. the purpose of the example is illustrating the
> > incompleteness of the existing mechanism
>
> I understand but is this a real world configuration people use and do
> we want to support the scenario where without setting high/max, the
> kernel still guarantees the isolation.

Yes, that's the configuration we're deploying fleet-wide and at least the
direction I'm gonna be pushing towards for reasons of generality and ease of
use.

Here's an example to illustrate the point - consider distros or upstream
desktop environments wanting to provide basic resource configuration to
protect user sessions and critical system services needed for user
interaction by default. That is something which is clearly and immediately
useful but also is extremely challenging to achieve with limits.

There are no universally good enough upper limits. Any one number is gonna
be both too high to guarantee protection and too low for use cases which
legitimately need that much memory. That's because the upper limits aren't
work-conserving and have a high chance of doing harm when misconfigured
making figuring out the correct configuration almost impossible with
per-use-case manual tuning.

The whole idea behind memory.low and related efforts is resolving that
problem by making memory control more work-conserving and forgiving, so that
users can say something like "I want the user session to have at least 25%
memory protected if needed and possible" and get most of the benefits of
carefully crafted configuration. We're already deploying such configuration
and it works well enough for a wide variety of workloads.

> > 2. there's a big difference between
> > letting the machine hit the wall and waiting for the kernel OOM to trigger
> > and being able to monitor the situation as it gradually develops and respond
> > to it, which is the whole point of the low/high mechanisms.
>
> I am not really against the proposed solution. What I am trying to see
> is if this problem is more general than an anon/swap-full problem and
> if a more general solution is possible. To me it seems like, whenever
> a large portion of reclaimable memory (anon, file or kmem) becomes
> non-reclaimable abruptly, the memory isolation can be broken. You gave
> the anon/swap-full example, let me see if I can come up with file and
> kmem examples (with similar A & B).
>
> 1) B has a lot of page cache but temporarily gets pinned for rdma or
> something and the system gets low on memory. B can attack A's low
> protected memory as B's page cache is not reclaimable temporarily.
>
> 2) B has a lot of dentries/inodes but someone has taken a write lock
> on shrinker_rwsem and got stuck in allocation/reclaim or CPU
> preempted. B can attack A's low protected memory as B's slabs are not
> reclaimable temporarily.
>
> I think the aim is to slow down B enough to give the PSI monitor a
> chance to act before either B targets A's protected memory or the
> kernel triggers oom-kill.
>
> My question is do we really want to solve the issue without limiting B
> through high/max? Also isn't fine grained PSI monitoring along with
> limiting B through memory.[high|max] general enough to solve all three
> example scenarios?

Yes, we definitely want to solve the issue without involving high and max. I
hope that part is clear now. As for whether we want to cover niche cases
such as RDMA pinning a large swath of page cache, I don't know, maybe? But I
don't think that's a problem with a comparable importance especially given
that in both cases you listed the problem is temporary and the workload
wouldn't have the ability to keep expanding undeterred.

Thanks.

-- 
tejun
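
The message above describes protection-based configuration in words ("at
least 25% memory protected") without showing what such a setting might look
like. The following is a minimal sketch only, not the configuration the
author deploys. It assumes the 25% is taken from total RAM, that the value
is rounded down to a 4 KiB page, and that the user session lives in a cgroup
v2 directory such as /sys/fs/cgroup/user.slice; the function names and that
path are hypothetical. It only computes the value that would be written to
memory.low and does not touch the cgroup filesystem.

```python
from pathlib import PurePosixPath
from typing import Tuple

ASSUMED_PAGE_SIZE = 4096  # assumption: 4 KiB pages; real systems may differ


def low_protection_bytes(total_bytes: int, percent: int) -> int:
    """Return `percent` of `total_bytes`, rounded down to a page boundary."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    raw = total_bytes * percent // 100
    return raw // ASSUMED_PAGE_SIZE * ASSUMED_PAGE_SIZE


def plan_memory_low(cgroup_dir: str, total_bytes: int, percent: int) -> Tuple[str, str]:
    """Return (file path, value) for a memory.low write without performing it."""
    path = str(PurePosixPath(cgroup_dir) / "memory.low")
    return path, str(low_protection_bytes(total_bytes, percent))


if __name__ == "__main__":
    # Hypothetical cgroup path and a 16 GiB machine, for illustration only.
    path, value = plan_memory_low("/sys/fs/cgroup/user.slice", 16 * 1024**3, 25)
    print(f"echo {value} > {path}")
```

Note that the sketch sets no memory.high or memory.max on any other cgroup,
which matches the point made in the message: the protected group states what
it needs, and nothing else has to be capped in advance.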