[2/2] doc, mm: clarify /proc/<pid>/oom_score value range

Message ID 20200709062603.18480-2-mhocko@kernel.org
State In Next
Commit 06dde99d771aa2b936b91d65a44b012a17e76e9c
Series
  • [1/2] doc, mm: sync up oom_score_adj documentation

Commit Message

Michal Hocko July 9, 2020, 6:26 a.m. UTC
From: Michal Hocko <mhocko@suse.com>

The exported value includes oom_score_adj, so the range is not [0, 1000]
as described in the previous section but rather [0, 2000]. Mention that
fact explicitly.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/filesystems/proc.rst | 3 +++
 1 file changed, 3 insertions(+)
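A little arithmetic shows where the two halves of that range come from. The sketch below is a simplified model under stated assumptions, not the kernel's code. It assumes the exported value is the task's memory usage (in pages, never more than totalpages) plus oom_score_adj scaled by totalpages/1000, clamped to at least 1 for killable tasks, and then normalized to thousandths of totalpages (RAM plus swap, in pages). Under those assumptions, usage contributes at most 1000 and a maximal oom_score_adj contributes another 1000, which gives [0, 2000]. `model_oom_score` and its parameter names are illustrative only.

```python
# Simplified, hypothetical model of the value exported in /proc/<pid>/oom_score.
# Assumption: it follows the general shape of the badness heuristic (usage plus
# a scaled oom_score_adj, normalized per mille of total pages). It is NOT the
# kernel implementation.

OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def model_oom_score(usage_pages: int, oom_score_adj: int, totalpages: int) -> int:
    """Return the modelled oom_score for a task.

    usage_pages:   pages charged to the task, assumed to lie in 0..totalpages
    oom_score_adj: value of /proc/<pid>/oom_score_adj, -1000..1000
    totalpages:    RAM + swap, in pages
    """
    if not OOM_SCORE_ADJ_MIN <= oom_score_adj <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj out of range")
    if oom_score_adj == OOM_SCORE_ADJ_MIN:
        # Tasks at -1000 are never selected by the OOM killer; modelled as 0.
        return 0
    # oom_score_adj is expressed in thousandths of total memory.
    points = usage_pages + oom_score_adj * (totalpages // 1000)
    # Killable tasks keep a minimal positive badness.
    points = max(points, 1)
    # Normalize to thousandths of total memory.
    return points * 1000 // totalpages


if __name__ == "__main__":
    total = 1_000_000
    print(model_oom_score(total // 2, 0, total))              # 500
    print(model_oom_score(total, OOM_SCORE_ADJ_MAX, total))   # 2000
```

In this model, a large negative adjustment clamps a killable task's score to the bottom of the range rather than pushing it below 0.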

Comments

Yafang Shao July 9, 2020, 7:41 a.m. UTC | #1
On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 8e3b5dffcfa8..78a0dec323a3 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
>  3.2 /proc/<pid>/oom_score - Display current oom-killer score
>  -------------------------------------------------------------
>
> +Please note that the exported value includes oom_score_adj so it is effectively
> +in range [0,2000].
> +

[0, 2000] may not be a proper range; see my reply in another thread [1].
As this value hasn't been documented before and nobody has noticed, I
think no user has really cared about it so far.
So we should discuss the proper range if we really think users will
care about this value.

[1]. https://lore.kernel.org/linux-mm/CALOAHbAvj-gWZMLef=PuKTfDScwfM8gPPX0evzjoref1bG=zwA@mail.gmail.com/T/#m2332c3e6b7f869383cb74ab3a0f7b6c670b3b23b

Michal Hocko July 9, 2020, 8:18 a.m. UTC | #2
On Thu 09-07-20 15:41:11, Yafang Shao wrote:
> [0, 2000] may not be a proper range; see my reply in another thread [1].
> As this value hasn't been documented before and nobody has noticed, I
> think no user has really cared about it so far.
> So we should discuss the proper range if we really think users will
> care about this value.

Even if we decide the range should change (and I do not really expect
that to happen), it is good to have the existing behavior clarified.
Yafang Shao July 9, 2020, 9:01 a.m. UTC | #3
On Thu, Jul 9, 2020 at 4:18 PM Michal Hocko <mhocko@kernel.org> wrote:
> Even if we decide the range should change (and I do not really expect
> that to happen), it is good to have the existing behavior clarified.
>

But the existing behavior was not defined in the kernel documentation
before, so I don't think users have a clear understanding of it.
The result of proc_oom_score is meant to compare which processes the
OOM killer will kill first; IOW, users should always use it to compare
different processes. For example,

# pid_a and pid_b hold the PIDs of the two processes being compared
if [ "$(cat /proc/$pid_a/oom_score)" -gt "$(cat /proc/$pid_b/oom_score)" ]
then
     echo "process_a will be killed before process_b"
fi

The user will then "Use it together with /proc/<pid>/oom_score_adj
to tune which process should be killed in an out-of-memory situation."

That means what users really care about is the relative value, not
the range or the absolute value.
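The comparison described above can also be written as a small helper that orders several processes by their current oom_score. This is a minimal sketch assuming Linux procfs. `read_oom_score`, `rank_by_oom_score`, and the `proc_root` parameter (present only so the code can be exercised against a test directory) are hypothetical names. Only the relative order is used here, and the scores are a snapshot that can change before an actual OOM event.

```python
import os
from pathlib import Path


def read_oom_score(pid: int, proc_root: str = "/proc") -> int:
    """Read the current oom_score of a process from procfs."""
    return int(Path(proc_root, str(pid), "oom_score").read_text().strip())


def rank_by_oom_score(pids, proc_root: str = "/proc"):
    """Return (pid, score) pairs ordered from highest to lowest score.

    A higher score means the OOM killer prefers that task as a victim.
    Only the relative order matters; the values are a point-in-time snapshot.
    """
    scores = []
    for pid in pids:
        try:
            scores.append((pid, read_oom_score(pid, proc_root)))
        except (FileNotFoundError, ProcessLookupError):
            # The process exited between listing and reading; skip it.
            continue
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Compare this process with PID 1 (on non-Linux systems this prints []).
    print(rank_by_oom_score([os.getpid(), 1]))
```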
Michal Hocko July 9, 2020, 9:58 a.m. UTC | #4
On Thu 09-07-20 17:01:06, Yafang Shao wrote:
> But the existing behavior was not defined in the kernel documentation
> before, so I don't think users have a clear understanding of it.

Well, documentation is by no means authoritative, especially when it is
outdated or incomplete. What really matters is the observed behavior; a
lot of userspace depends on it or is built around the specific
implementation.

> The result of proc_oom_score is meant to compare which processes the
> OOM killer will kill first; IOW, users should always use it to compare
> different processes. For example,
> 
> # pid_a and pid_b hold the PIDs of the two processes being compared
> if [ "$(cat /proc/$pid_a/oom_score)" -gt "$(cat /proc/$pid_b/oom_score)" ]
> then
>      echo "process_a will be killed before process_b"
> fi
> 
> The user will then "Use it together with /proc/<pid>/oom_score_adj
> to tune which process should be killed in an out-of-memory situation."
> 
> That means what users really care about is the relative value, not
> the range or the absolute value.

In an ideal world, yes. But real life tells a different story. Userspace
often (ab)uses undocumented or unintended (mis)features, and the hard rule
is that we never break userspace. We've learned that through many painful
historical experiences, and vaguely defined functionality suffers from
this problem the most.
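To illustrate the kind of dependency described above, here is an invented example (no real tool is implied) of userspace code that has baked in the observed range: it turns oom_score into a 0 to 100 "risk" figure by dividing by 2000. Changing the exported range, even in a way that preserves the relative order of processes, would silently change this code's output or make its sanity check fail. `kill_risk_percent` and `OBSERVED_OOM_SCORE_MAX` are hypothetical names.

```python
# Hypothetical monitoring code that depends on the observed [0, 2000] range.
OBSERVED_OOM_SCORE_MAX = 2000


def kill_risk_percent(oom_score: int) -> float:
    """Map an oom_score reading to a 0..100 'risk' figure (hypothetical)."""
    if not 0 <= oom_score <= OBSERVED_OOM_SCORE_MAX:
        # A kernel exporting a different range would trip this check.
        raise ValueError(f"unexpected oom_score {oom_score}")
    # The percentage is only meaningful if the maximum really is 2000.
    return 100.0 * oom_score / OBSERVED_OOM_SCORE_MAX
```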
Yafang Shao July 9, 2020, 11:20 a.m. UTC | #5
On Thu, Jul 9, 2020 at 5:58 PM Michal Hocko <mhocko@kernel.org> wrote:
> In an ideal world, yes. But real life tells a different story. Userspace
> often (ab)uses undocumented or unintended (mis)features, and the hard rule
> is that we never break userspace. We've learned that through many painful
> historical experiences, and vaguely defined functionality suffers from
> this problem the most.

All right. I won't insist if we think changing the range may break
userspace.

Patch

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 8e3b5dffcfa8..78a0dec323a3 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1673,6 +1673,9 @@  requires CAP_SYS_RESOURCE.
 3.2 /proc/<pid>/oom_score - Display current oom-killer score
 -------------------------------------------------------------
 
+Please note that the exported value includes oom_score_adj so it is effectively
+in range [0,2000].
+
 This file can be used to check the current score used by the oom-killer is for
 any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
 process should be killed in an out-of-memory situation.
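The last sentence of this section, about using oom_score together with oom_score_adj, can be sketched as a helper that writes a new adjustment and reads back the resulting score. This is a minimal sketch assuming Linux procfs. `set_oom_score_adj` and its `proc_root` parameter are hypothetical. Per the oom_score_adj documentation that precedes this section, lowering the value may require CAP_SYS_RESOURCE, in which case the write fails with PermissionError.

```python
from pathlib import Path

OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def set_oom_score_adj(pid: int, adj: int, proc_root: str = "/proc") -> int:
    """Write oom_score_adj for pid and return the oom_score read afterwards.

    Lowering the value may require CAP_SYS_RESOURCE; without it the write
    raises PermissionError.
    """
    if not OOM_SCORE_ADJ_MIN <= adj <= OOM_SCORE_ADJ_MAX:
        raise ValueError(
            f"oom_score_adj must be in [{OOM_SCORE_ADJ_MIN}, {OOM_SCORE_ADJ_MAX}]"
        )
    proc_dir = Path(proc_root, str(pid))
    (proc_dir / "oom_score_adj").write_text(f"{adj}\n")
    # On a real procfs the score is computed at read time, so it already
    # reflects the new adjustment.
    return int((proc_dir / "oom_score").read_text().strip())
```

For example, `set_oom_score_adj(os.getpid(), 500)` raises the calling process's own adjustment, which normally needs no extra privileges, and makes that process a more likely OOM victim.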