From: Johannes Weiner <hannes@cmpxchg.org>
To: Roman Gushchin <guro@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>, Leon Yang <lnyng@fb.com>,
    Chris Down <chris@chrisdown.name>, Michal Hocko <mhocko@suse.com>,
    linux-mm@kvack.org, cgroups@vger.kernel.org,
    linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH] mm: memcontrol: fix occasional OOMs due to proportional memory.low reclaim
Date: Wed, 18 Aug 2021 10:15:24 -0400
Message-ID: <YR0V/KhKSYZs+ksn@cmpxchg.org>
In-Reply-To: <YRwRzjOexeXbkirV@carbon.dhcp.thefacebook.com>

On Tue, Aug 17, 2021 at 12:45:18PM -0700, Roman Gushchin wrote:
> On Tue, Aug 17, 2021 at 02:05:06PM -0400, Johannes Weiner wrote:
> > We've noticed occasional OOM killing when memory.low settings are in
> > effect for cgroups. This is unexpected and undesirable as memory.low
> > is supposed to express non-OOMing memory priorities between cgroups.
> >
> > The reason for this is proportional memory.low reclaim. When cgroups
> > are below their memory.low threshold, reclaim passes them over in the
> > first round, and then retries if it couldn't find pages anywhere else.
> > But when cgroups are slightly above their memory.low setting, page scan
> > force is scaled down and diminished in proportion to the overage, to
> > the point where it can cause reclaim to fail as well - only in that
> > case we currently don't retry, and instead trigger OOM.
> >
> > To fix this, hook proportional reclaim into the same retry logic we
> > have in place for when cgroups are skipped entirely. This way if
> > reclaim fails and some cgroups were scanned with diminished pressure,
> > we'll try another full-force cycle before giving up and OOMing.
> >
> > Reported-by: Leon Yang <lnyng@fb.com>
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
>
> Acked-by: Roman Gushchin <guro@fb.com>

Thank you.

> I guess it's a stable material, so maybe adding:
> Fixes: 9783aa9917f8 ("mm, memcg: proportional memory.{low,min} reclaim")

Yes, that Fixes makes sense. Plus:

Cc: <stable@vger.kernel.org> # 5.4+

I initially didn't tag it because the issue is over two years old and
we've had no other reports of this. But thinking about it, that is
probably due to a lack of users rather than a lack of severity.

At FB we only noticed with a recent rollout of memory_recursiveprot
(8a931f801340c2be10552c7b5622d5f4852f3a36) because we didn't have
working memory.low configurations before that. But now that we do
notice, it's a problem worth fixing. So yes, stable makes sense.

Thanks.
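
For readers following the thread, the retry behaviour described in the
quoted changelog can be illustrated with a small user-space model. This
is only a sketch under stated assumptions, not the kernel code: every
name in it (toy_cgroup, toy_scan_control, toy_scan_target,
toy_reclaim_pass, toy_try_to_free) is hypothetical, scan pressure for a
protected cgroup is approximated as its overage above memory.low, and a
pass is assumed to succeed when the total scan pressure reaches the
requested amount.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical toy model of a cgroup: current usage and memory.low, in pages. */
struct toy_cgroup {
	unsigned long usage;
	unsigned long low;
};

/* Hypothetical stand-in for the reclaim state carried across passes. */
struct toy_scan_control {
	bool low_reclaim;	/* retry pass: ignore memory.low protection */
	bool low_skipped;	/* protection skipped or reduced a scan this pass */
};

/* How many pages of one cgroup this pass is willing to scan. */
static unsigned long toy_scan_target(const struct toy_cgroup *cg,
				     struct toy_scan_control *sc)
{
	/* Unprotected cgroup, or full-force retry: scan everything. */
	if (cg->low == 0 || sc->low_reclaim)
		return cg->usage;

	/*
	 * Protection is being honored. Record that, so the caller knows a
	 * retry without protection is still possible. Doing this for the
	 * proportional case too is the point of the fix discussed above.
	 */
	sc->low_skipped = true;

	/* At or below memory.low: passed over entirely in this pass. */
	if (cg->usage <= cg->low)
		return 0;

	/* Slightly above memory.low: pressure shrinks with the overage. */
	return cg->usage - cg->low;
}

/* One reclaim pass; succeeds if enough scan pressure was available. */
static bool toy_reclaim_pass(const struct toy_cgroup *cgs, int n,
			     struct toy_scan_control *sc, unsigned long want)
{
	unsigned long scanned = 0;

	for (int i = 0; i < n; i++)
		scanned += toy_scan_target(&cgs[i], sc);

	return scanned >= want;
}

/* Returns false where the modeled system would give up and OOM. */
static bool toy_try_to_free(const struct toy_cgroup *cgs, int n,
			    unsigned long want)
{
	struct toy_scan_control sc = { .low_reclaim = false, .low_skipped = false };

	if (toy_reclaim_pass(cgs, n, &sc, want))
		return true;

	/* Protected groups held back pressure: one more full-force cycle. */
	if (sc.low_skipped) {
		sc.low_reclaim = true;
		sc.low_skipped = false;
		return toy_reclaim_pass(cgs, n, &sc, want);
	}

	return false;
}
```

In this model, a group at usage 105 with memory.low 100 contributes only
5 pages of pressure in the first pass, so a request for 32 pages fails
there. Because the reduced scan is recorded in low_skipped, the request
succeeds on the retry instead of ending in an OOM.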