LKML Archive on lore.kernel.org
* [PATCH 1/2] doc, mm: sync up oom_score_adj documentation
@ 2020-07-09  6:26 Michal Hocko
  2020-07-09  6:26 ` [PATCH 2/2] doc, mm: clarify /proc/<pid>/oom_score value range Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2020-07-09  6:26 UTC
  To: Jonathan Corbet, Andrew Morton
  Cc: David Rientjes, Yafang Shao, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

The oom section contains at least two outdated notes. The 3% discount for
root processes is gone since d46078b28889 ("mm, oom: remove 3% bonus for
CAP_SYS_ADMIN processes").

Likewise, children of the selected oom victim are no longer sacrificed since
bbbe48029720 ("mm, oom: remove 'prefer children over parent' heuristic").

Drop both of them.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/filesystems/proc.rst | 8 --------
 1 file changed, 8 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 996f3cfe7030..8e3b5dffcfa8 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1634,9 +1634,6 @@ may allocate from based on an estimation of its current memory and swap use.
 For example, if a task is using all allowed memory, its badness score will be
 1000.  If it is using half of its allowed memory, its score will be 500.
 
-There is an additional factor included in the badness score: the current memory
-and swap usage is discounted by 3% for root processes.
-
 The amount of "allowed" memory depends on the context in which the oom killer
 was called.  If it is due to the memory assigned to the allocating task's cpuset
 being exhausted, the allowed memory represents the set of mems assigned to that
@@ -1672,11 +1669,6 @@ The value of /proc/<pid>/oom_score_adj may be reduced no lower than the last
 value set by a CAP_SYS_RESOURCE process. To reduce the value any lower
 requires CAP_SYS_RESOURCE.
 
-Caveat: when a parent task is selected, the oom killer will sacrifice any first
-generation children with separate address spaces instead, if possible.  This
-avoids servers and important system daemons from being killed and loses the
-minimal amount of work.
-
 
 3.2 /proc/<pid>/oom_score - Display current oom-killer score
 -------------------------------------------------------------
-- 
2.27.0



* [PATCH 2/2] doc, mm: clarify /proc/<pid>/oom_score value range
  2020-07-09  6:26 [PATCH 1/2] doc, mm: sync up oom_score_adj documentation Michal Hocko
@ 2020-07-09  6:26 ` Michal Hocko
  2020-07-09  7:41   ` Yafang Shao
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2020-07-09  6:26 UTC
  To: Jonathan Corbet, Andrew Morton
  Cc: David Rientjes, Yafang Shao, linux-mm, LKML, Michal Hocko

From: Michal Hocko <mhocko@suse.com>

The exported value includes oom_score_adj, so the range is not [0, 1000]
as described in the previous section but rather [0, 2000]. Mention that
fact explicitly.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/filesystems/proc.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 8e3b5dffcfa8..78a0dec323a3 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
 3.2 /proc/<pid>/oom_score - Display current oom-killer score
 -------------------------------------------------------------
 
+Please note that the exported value includes oom_score_adj so it is effectively
+in range [0,2000].
+
 This file can be used to check the current score used by the oom-killer is for
 any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
 process should be killed in an out-of-memory situation.
-- 
2.27.0



* Re: [PATCH 2/2] doc, mm: clarify /proc/<pid>/oom_score value range
  2020-07-09  6:26 ` [PATCH 2/2] doc, mm: clarify /proc/<pid>/oom_score value range Michal Hocko
@ 2020-07-09  7:41   ` Yafang Shao
  2020-07-09  8:18     ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Yafang Shao @ 2020-07-09  7:41 UTC
  To: Michal Hocko
  Cc: Jonathan Corbet, Andrew Morton, David Rientjes, Linux MM, LKML,
	Michal Hocko

On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> From: Michal Hocko <mhocko@suse.com>
>
> The exported value includes oom_score_adj so the range is not [0, 1000]
> as described in the previous section but rather [0, 2000]. Mention that
> fact explicitly.
>
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  Documentation/filesystems/proc.rst | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> index 8e3b5dffcfa8..78a0dec323a3 100644
> --- a/Documentation/filesystems/proc.rst
> +++ b/Documentation/filesystems/proc.rst
> @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
>  3.2 /proc/<pid>/oom_score - Display current oom-killer score
>  -------------------------------------------------------------
>
> +Please note that the exported value includes oom_score_adj so it is effectively
> +in range [0,2000].
> +

[0, 2000] may not be the proper range; see my reply in another thread [1].
As this value hasn't been documented before and nobody has noticed, I
think no user has really cared about it so far.
So we should discuss the proper range if we really think users will
care about this value.

[1]. https://lore.kernel.org/linux-mm/CALOAHbAvj-gWZMLef=PuKTfDScwfM8gPPX0evzjoref1bG=zwA@mail.gmail.com/T/#m2332c3e6b7f869383cb74ab3a0f7b6c670b3b23b

>  This file can be used to check the current score used by the oom-killer is for
>  any given <pid>. Use it together with /proc/<pid>/oom_score_adj to tune which
>  process should be killed in an out-of-memory situation.
> --
> 2.27.0
>


-- 
Thanks
Yafang


* Re: [PATCH 2/2] doc, mm: clarify /proc/<pid>/oom_score value range
  2020-07-09  7:41   ` Yafang Shao
@ 2020-07-09  8:18     ` Michal Hocko
  2020-07-09  9:01       ` Yafang Shao
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2020-07-09  8:18 UTC
  To: Yafang Shao
  Cc: Jonathan Corbet, Andrew Morton, David Rientjes, Linux MM, LKML

On Thu 09-07-20 15:41:11, Yafang Shao wrote:
> On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > From: Michal Hocko <mhocko@suse.com>
> >
> > The exported value includes oom_score_adj so the range is not [0, 1000]
> > as described in the previous section but rather [0, 2000]. Mention that
> > fact explicitly.
> >
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > ---
> >  Documentation/filesystems/proc.rst | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > index 8e3b5dffcfa8..78a0dec323a3 100644
> > --- a/Documentation/filesystems/proc.rst
> > +++ b/Documentation/filesystems/proc.rst
> > @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
> >  3.2 /proc/<pid>/oom_score - Display current oom-killer score
> >  -------------------------------------------------------------
> >
> > +Please note that the exported value includes oom_score_adj so it is effectively
> > +in range [0,2000].
> > +
> 
> [0, 2000] may be not a proper range, see my reply in another thread.[1]
> As this value hasn't been documented before and nobody notices that, I
> think there might be no user really care about it before.
> So we should discuss the proper range if we really think the user will
> care about this value.

Even if we decide the range should change (I do not really expect this
to happen), it is good to have the existing behavior clarified.
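
As a rough illustration of the existing behavior, here is a simplified
Python model of the exported value. It is a sketch and not the kernel
code: the function name and the usage_permille parameter are made up for
this example, and the clamp at zero is an assumption that matches the
[0, 2000] range stated in the patch.

```python
def exported_oom_score(usage_permille: int, oom_score_adj: int) -> int:
    """Simplified model of the value shown in /proc/<pid>/oom_score.

    usage_permille: share of allowed memory in use, 0..1000 (hypothetical
                    name for the documented badness score).
    oom_score_adj:  the documented adjustment, -1000..1000.
    """
    if not 0 <= usage_permille <= 1000:
        raise ValueError("usage_permille must be within [0, 1000]")
    if not -1000 <= oom_score_adj <= 1000:
        raise ValueError("oom_score_adj must be within [-1000, 1000]")
    # Assumption: negative sums are reported as 0, so the result
    # always falls within [0, 2000].
    return max(0, usage_permille + oom_score_adj)


if __name__ == "__main__":
    print(exported_oom_score(500, 0))      # half of allowed memory, no adjustment
    print(exported_oom_score(1000, 1000))  # upper end of the range
    print(exported_oom_score(500, -1000))  # adjustment outweighs usage
```

Under these assumptions, a task using all of its allowed memory with
oom_score_adj set to 1000 reaches the upper bound of 2000.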

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2/2] doc, mm: clarify /proc/<pid>/oom_score value range
  2020-07-09  8:18     ` Michal Hocko
@ 2020-07-09  9:01       ` Yafang Shao
  2020-07-09  9:58         ` Michal Hocko
  0 siblings, 1 reply; 7+ messages in thread
From: Yafang Shao @ 2020-07-09  9:01 UTC
  To: Michal Hocko
  Cc: Jonathan Corbet, Andrew Morton, David Rientjes, Linux MM, LKML

On Thu, Jul 9, 2020 at 4:18 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 09-07-20 15:41:11, Yafang Shao wrote:
> > On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > From: Michal Hocko <mhocko@suse.com>
> > >
> > > The exported value includes oom_score_adj so the range is not [0, 1000]
> > > as described in the previous section but rather [0, 2000]. Mention that
> > > fact explicitly.
> > >
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > ---
> > >  Documentation/filesystems/proc.rst | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > > index 8e3b5dffcfa8..78a0dec323a3 100644
> > > --- a/Documentation/filesystems/proc.rst
> > > +++ b/Documentation/filesystems/proc.rst
> > > @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
> > >  3.2 /proc/<pid>/oom_score - Display current oom-killer score
> > >  -------------------------------------------------------------
> > >
> > > +Please note that the exported value includes oom_score_adj so it is effectively
> > > +in range [0,2000].
> > > +
> >
> > [0, 2000] may be not a proper range, see my reply in another thread.[1]
> > As this value hasn't been documented before and nobody notices that, I
> > think there might be no user really care about it before.
> > So we should discuss the proper range if we really think the user will
> > care about this value.
>
> Even if we decide the range should change, I do not really assume this
> will happen, it is good to have the existing behavior clarified.
>

But the existing behavior was not defined in the kernel documentation
before, so I don't think users have a clear understanding of
the existing behavior.
The result of proc_oom_score is meant for comparing which processes
will be killed first by the OOM killer. IOW, the user should
always use it to compare different processes. For example,

if proc_oom_score(process_a) > proc_oom_score(process_b)
then
     process_a will be killed before process_b
fi
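
The comparison above could be sketched in Python roughly as follows.
This is an illustration only: it assumes each /proc/<pid>/oom_score file
holds a single integer, and the helper names and the proc_root parameter
are hypothetical, not an existing API. That a higher score means an
earlier kill is the reading given above; the sketch does not verify it.

```python
from pathlib import Path


def read_oom_score(pid, proc_root="/proc"):
    """Return the integer stored in <proc_root>/<pid>/oom_score."""
    return int(Path(proc_root, str(pid), "oom_score").read_text().strip())


def likelier_victim(pid_a, pid_b, proc_root="/proc"):
    """Return the pid with the higher oom_score (pid_a on a tie)."""
    score_a = read_oom_score(pid_a, proc_root)
    score_b = read_oom_score(pid_b, proc_root)
    return pid_a if score_a >= score_b else pid_b


if __name__ == "__main__":
    import os

    # Compare this process with its parent, as an example.
    me, parent = os.getpid(), os.getppid()
    print("likelier victim:", likelier_victim(me, parent))
```

Only the ordering of the two values is used here, never their absolute
size.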

And then the user will "Use it together with /proc/<pid>/oom_score_adj
to tune which process should be killed in an out-of-memory situation."

That means what the user really cares about is the relative value, and
they will not care about the range or the absolute value.

-- 
Thanks
Yafang


* Re: [PATCH 2/2] doc, mm: clarify /proc/<pid>/oom_score value range
  2020-07-09  9:01       ` Yafang Shao
@ 2020-07-09  9:58         ` Michal Hocko
  2020-07-09 11:20           ` Yafang Shao
  0 siblings, 1 reply; 7+ messages in thread
From: Michal Hocko @ 2020-07-09  9:58 UTC
  To: Yafang Shao
  Cc: Jonathan Corbet, Andrew Morton, David Rientjes, Linux MM, LKML

On Thu 09-07-20 17:01:06, Yafang Shao wrote:
> On Thu, Jul 9, 2020 at 4:18 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 09-07-20 15:41:11, Yafang Shao wrote:
> > > On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > >
> > > > From: Michal Hocko <mhocko@suse.com>
> > > >
> > > > The exported value includes oom_score_adj so the range is not [0, 1000]
> > > > as described in the previous section but rather [0, 2000]. Mention that
> > > > fact explicitly.
> > > >
> > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > > ---
> > > >  Documentation/filesystems/proc.rst | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > > > index 8e3b5dffcfa8..78a0dec323a3 100644
> > > > --- a/Documentation/filesystems/proc.rst
> > > > +++ b/Documentation/filesystems/proc.rst
> > > > @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
> > > >  3.2 /proc/<pid>/oom_score - Display current oom-killer score
> > > >  -------------------------------------------------------------
> > > >
> > > > +Please note that the exported value includes oom_score_adj so it is effectively
> > > > +in range [0,2000].
> > > > +
> > >
> > > [0, 2000] may be not a proper range, see my reply in another thread.[1]
> > > As this value hasn't been documented before and nobody notices that, I
> > > think there might be no user really care about it before.
> > > So we should discuss the proper range if we really think the user will
> > > care about this value.
> >
> > Even if we decide the range should change, I do not really assume this
> > will happen, it is good to have the existing behavior clarified.
> >
> 
> But the existing behavior is not defined in the kernel documentation
> before, so I don't think that the user has a clear understanding of
> the existing behavior.

Well, documentation is by no means authoritative, especially when it is
outdated or incomplete. What really matters is the observed behavior, and
a lot of userspace depends on it or is based on the specific
implementation.

> The way to use the result of proc_oom_score is to compare which
> processes will be killed first by the OOM killer, IOW, the user should
> always use it to compare different processes. For example,
> 
> if proc_oom_score(process_a) > proc_oom_score(process_b)
> then
>      process_a will be killed before process_b
> fi
> 
> And then  the user will "Use it together with
> /proc/<pid>/oom_score_adj to tune which
>  process should be killed in an out-of-memory situation."
> 
> That means what the user really cares about is the relative value, and
> they will not care about the range or the absolute value.

In an ideal world, yes. But real life tells a different story. Many
times userspace (ab)uses certain undocumented/unintended (mis)features
and the hard rule is that we never break userspace. We've learned that
through many painful historical experiences. Vaguely defined
functionality especially suffers from this problem.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2/2] doc, mm: clarify /proc/<pid>/oom_score value range
  2020-07-09  9:58         ` Michal Hocko
@ 2020-07-09 11:20           ` Yafang Shao
  0 siblings, 0 replies; 7+ messages in thread
From: Yafang Shao @ 2020-07-09 11:20 UTC
  To: Michal Hocko
  Cc: Jonathan Corbet, Andrew Morton, David Rientjes, Linux MM, LKML

On Thu, Jul 9, 2020 at 5:58 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 09-07-20 17:01:06, Yafang Shao wrote:
> > On Thu, Jul 9, 2020 at 4:18 PM Michal Hocko <mhocko@kernel.org> wrote:
> > >
> > > On Thu 09-07-20 15:41:11, Yafang Shao wrote:
> > > > On Thu, Jul 9, 2020 at 2:26 PM Michal Hocko <mhocko@kernel.org> wrote:
> > > > >
> > > > > From: Michal Hocko <mhocko@suse.com>
> > > > >
> > > > > The exported value includes oom_score_adj so the range is not [0, 1000]
> > > > > as described in the previous section but rather [0, 2000]. Mention that
> > > > > fact explicitly.
> > > > >
> > > > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > > > > ---
> > > > >  Documentation/filesystems/proc.rst | 3 +++
> > > > >  1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
> > > > > index 8e3b5dffcfa8..78a0dec323a3 100644
> > > > > --- a/Documentation/filesystems/proc.rst
> > > > > +++ b/Documentation/filesystems/proc.rst
> > > > > @@ -1673,6 +1673,9 @@ requires CAP_SYS_RESOURCE.
> > > > >  3.2 /proc/<pid>/oom_score - Display current oom-killer score
> > > > >  -------------------------------------------------------------
> > > > >
> > > > > +Please note that the exported value includes oom_score_adj so it is effectively
> > > > > +in range [0,2000].
> > > > > +
> > > >
> > > > [0, 2000] may be not a proper range, see my reply in another thread.[1]
> > > > As this value hasn't been documented before and nobody notices that, I
> > > > think there might be no user really care about it before.
> > > > So we should discuss the proper range if we really think the user will
> > > > care about this value.
> > >
> > > Even if we decide the range should change, I do not really assume this
> > > will happen, it is good to have the existing behavior clarified.
> > >
> >
> > But the existing behavior is not defined in the kernel documentation
> > before, so I don't think that the user has a clear understanding of
> > the existing behavior.
>
> Well, documentation is by no means authoritative, especially when it is
> outdated or incomplete. What really matters is the observed behavior and
> a lot of userspace depends on that or based on the specific
> implementation.
>
> > The way to use the result of proc_oom_score is to compare which
> > processes will be killed first by the OOM killer, IOW, the user should
> > always use it to compare different processes. For example,
> >
> > if proc_oom_score(process_a) > proc_oom_score(process_b)
> > then
> >      process_a will be killed before process_b
> > fi
> >
> > And then  the user will "Use it together with
> > /proc/<pid>/oom_score_adj to tune which
> >  process should be killed in an out-of-memory situation."
> >
> > That means what the user really cares about is the relative value, and
> > they will not care about the range or the absolute value.
>
> In an ideal world yes. But the real life tells a different story. Many
> times userspace (ab)uses certain undocumented/unintended (mis)features
> and the hard rule is that we never break userspace. We've learned that
> through many painful historical experiences. Especially vaguely defined
> functionality suffers from the problem.
> --

All right. I won't insist if we think changing the range may break
userspace.

-- 
Thanks
Yafang
