From: Shakeel Butt <shakeelb@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Roman Gushchin <guro@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Linux MM <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Cgroups <cgroups@vger.kernel.org>,
	David Rientjes <rientjes@google.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Greg Thelen <gthelen@google.com>,
	Dragos Sbirlea <dragoss@google.com>,
	Priya Duraisamy <padmapriyad@google.com>
Subject: Re: [RFC] memory reserve for userspace oom-killer
Date: Wed, 21 Apr 2021 07:13:45 -0700	[thread overview]
Message-ID: <CALvZod6oCBB6tDh5wABSwdHfcDzLX7S7cOTLp_4Qk4DCi50X_A@mail.gmail.com> (raw)
In-Reply-To: <YH/S2dVxk2le8SMw@dhcp22.suse.cz>

On Wed, Apr 21, 2021 at 12:23 AM Michal Hocko <mhocko@suse.com> wrote:
>
[...]
> > In our observation the global reclaim is very non-deterministic at the
> > tail and dramatically impacts the reliability of the system. We are
> > looking for a solution which is independent of the global reclaim.
>
> I believe it is worth pursuing a solution that would make the memory
> reclaim more predictable. I have seen direct reclaim memory throttling
> in the past. For some reason which I haven't tried to examine this has
> become less of a problem with newer kernels. Maybe the memory access
> patterns have changed or those problems got replaced by other issues but
> an excessive throttling is definitely something that we want to address
> rather than work around by some user visible APIs.
>

I agree we want to address the excessive throttling, but for everyone
on the machine, and most importantly it is a moving target: the
reclaim code continues to evolve, and in addition it has callbacks
into a diverse set of subsystems.

The user visible API is for one specific use-case, i.e. the
oom-killer, and will indirectly help in reducing the excessive
throttling.

[...]
> > So, the suggestion is to have a per-task flag to (1) indicate to not
> > throttle and (2) fail allocations easily on significant memory
> > pressure.
> >
> > For (1), the challenge I see is that there are a lot of places in the
> > reclaim code paths where a task can get throttled. There are
> > filesystems that block/throttle in slab shrinking. Any process can get
> > blocked on an unrelated page or inode writeback within reclaim.
> >
> > For (2), I am not sure how to deterministically define "significant
> > memory pressure". One idea is to follow the __GFP_NORETRY semantics
> > and along with (1) the userspace oom-killer will see ENOMEM more
> > reliably instead of getting stuck in reclaim.
>
> Some of the interfaces (e.g. seq_file uses GFP_KERNEL reclaim strength)
> could be more relaxed and rather fail than OOM kill but wouldn't your
> OOM handler be effectively dysfunctional when not able to collect data
> to make a decision?
>

Yes it would be. Roman is suggesting to have a precomputed kill-list
(pidfds ready to send SIGKILL), and whenever the oom-killer gets
ENOMEM, it would fall back to the kill-list. Though we are still
contemplating the ways and side-effects of preferably returning ENOMEM
in the slowpath for the oom-killer, and in addition the complexity of
maintaining the kill-list and keeping it up to date.

thanks,
Shakeel
