All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks
@ 2021-07-30 16:20 Aaron Tomlin
  2021-08-01 20:01 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Aaron Tomlin @ 2021-07-30 16:20 UTC (permalink / raw)
  To: linux-mm
  Cc: akpm, mhocko, penguin-kernel, rientjes, llong, neelx, linux-kernel

Changes since v2:
 - Use single character (e.g. 'R' for MMF_OOM_SKIP) as suggested
   by Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
 - Add new header to oom_dump_tasks documentation
 - Provide further justification


The output generated by dump_tasks() can be helpful to determine why
there was an OOM condition and which rogue task potentially caused it.
Please note that this is only provided when sysctl oom_dump_tasks is
enabled.

At the present time, when showing potential OOM victims, we do not
exclude any task that are not OOM eligible e.g. those that have
MMF_OOM_SKIP set; it is possible that the last OOM killable victim was
already OOM killed, yet the OOM reaper failed to reclaim memory and set
MMF_OOM_SKIP. This can be confusing (or perhaps even be misleading) to the
viewer. Now, we already unconditionally display a task's oom_score_adj_min
value that can be set to OOM_SCORE_ADJ_MIN which is indicative of an
"unkillable" task.

This patch provides a clear indication with regard to the OOM ineligibility
(and why) of each displayed task with the addition of a new column namely
"oom_skipped". An example is provided below:

    [ 5084.524970] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj oom_skipped name
    [ 5084.526397] [660417]     0 660417    35869      683   167936        0         -1000 M conmon
    [ 5084.526400] [660452]     0 660452   175834      472    86016        0          -998  pod
    [ 5084.527460] [752415]     0 752415    35869      650   172032        0         -1000 M conmon
    [ 5084.527462] [752575] 1001050000 752575   184205    11158   700416        0           999  npm
    [ 5084.527467] [753606] 1001050000 753606   183380    46843  2134016        0           999  node
    [ 5084.527581] Memory cgroup out of memory: Killed process 753606 (node) total-vm:733520kB, anon-rss:161228kB, file-rss:26144kB, shmem-rss:0kB, UID:1001050000

So, a single character 'M' is for OOM_SCORE_ADJ_MIN, 'R' MMF_OOM_SKIP and
'V' for in_vfork().


Signed-off-by: Aaron Tomlin <atomlin@redhat.com>
---
 Documentation/admin-guide/sysctl/vm.rst |  5 ++--
 mm/oom_kill.c                           | 31 +++++++++++++++++++++----
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index 003d5cc3751b..4c79fa00ddb3 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -650,8 +650,9 @@ oom_dump_tasks
 Enables a system-wide task dump (excluding kernel threads) to be produced
 when the kernel performs an OOM-killing and includes such information as
 pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
-score, and name.  This is helpful to determine why the OOM killer was
-invoked, to identify the rogue task that caused it, and to determine why
+score, oom eligibility status and name.  This is helpful to determine why
+the OOM killer was invoked, to identify the rogue task that caused it, and
+to determine why
 the OOM killer chose the task it did to kill.
 
 If this is set to zero, this information is suppressed.  On very
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index c729a4c4a1ac..36daa6917b62 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -160,6 +160,27 @@ static inline bool is_sysrq_oom(struct oom_control *oc)
 	return oc->order == -1;
 }
 
+/**
+ * is_task_eligible_oom - determine if and why a task cannot be OOM killed
+ * @tsk: task to check
+ *
+ * Needs to be called with task_lock().
+ */
+static const char * const is_task_oom_eligible(struct task_struct *p)
+{
+	long adj;
+
+	adj = (long)p->signal->oom_score_adj;
+	if (adj == OOM_SCORE_ADJ_MIN)
+		return "M";
+	else if (test_bit(MMF_OOM_SKIP, &p->mm->flags)
+		return "R";
+	else if (in_vfork(p))
+		return "V";
+	else
+		return "";
+}
+
 /* return true if the task is not adequate as candidate victim task. */
 static bool oom_unkillable_task(struct task_struct *p)
 {
@@ -401,12 +422,13 @@ static int dump_task(struct task_struct *p, void *arg)
 		return 0;
 	}
 
-	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %s\n",
+	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %1s %s\n",
 		task->pid, from_kuid(&init_user_ns, task_uid(task)),
 		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
 		mm_pgtables_bytes(task->mm),
 		get_mm_counter(task->mm, MM_SWAPENTS),
-		task->signal->oom_score_adj, task->comm);
+		task->signal->oom_score_adj, is_task_oom_eligible(task),
+		task->comm);
 	task_unlock(task);
 
 	return 0;
@@ -420,12 +442,13 @@ static int dump_task(struct task_struct *p, void *arg)
  * memcg, not in the same cpuset, or bound to a disjoint set of mempolicy nodes
  * are not shown.
  * State information includes task's pid, uid, tgid, vm size, rss,
- * pgtables_bytes, swapents, oom_score_adj value, and name.
+ * pgtables_bytes, swapents, oom_score_adj value, oom eligibility status
+ * and name.
  */
 static void dump_tasks(struct oom_control *oc)
 {
 	pr_info("Tasks state (memory values in pages):\n");
-	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
+	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj oom_skipped name\n");
 
 	if (is_memcg_oom(oc))
 		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks
  2021-07-30 16:20 [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks Aaron Tomlin
@ 2021-08-01 20:01 ` Andrew Morton
  2021-08-02  3:49   ` David Rientjes
  2021-08-02  6:34 ` Michal Hocko
  2 siblings, 0 replies; 9+ messages in thread
From: Andrew Morton @ 2021-08-01 20:01 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: linux-mm, mhocko, penguin-kernel, rientjes, llong, neelx, linux-kernel

On Fri, 30 Jul 2021 17:20:02 +0100 Aaron Tomlin <atomlin@redhat.com> wrote:

> Changes since v2:
>  - Use single character (e.g. 'R' for MMF_OOM_SKIP) as suggested
>    by Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
>  - Add new header to oom_dump_tasks documentation
>  - Provide further justification
> 
> 
> The output generated by dump_tasks() can be helpful to determine why
> there was an OOM condition and which rogue task potentially caused it.
> Please note that this is only provided when sysctl oom_dump_tasks is
> enabled.
> 
> At the present time, when showing potential OOM victims, we do not
> exclude any task that are not OOM eligible e.g. those that have
> MMF_OOM_SKIP set; it is possible that the last OOM killable victim was
> already OOM killed, yet the OOM reaper failed to reclaim memory and set
> MMF_OOM_SKIP. This can be confusing (or perhaps even be misleading) to the
> viewer. Now, we already unconditionally display a task's oom_score_adj_min
> value that can be set to OOM_SCORE_ADJ_MIN which is indicative of an
> "unkillable" task.
> 
> This patch provides a clear indication with regard to the OOM ineligibility
> (and why) of each displayed task with the addition of a new column namely
> "oom_skipped". An example is provided below:
> 
>     [ 5084.524970] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj oom_skipped name
>     [ 5084.526397] [660417]     0 660417    35869      683   167936        0         -1000 M conmon
>     [ 5084.526400] [660452]     0 660452   175834      472    86016        0          -998  pod
>     [ 5084.527460] [752415]     0 752415    35869      650   172032        0         -1000 M conmon
>     [ 5084.527462] [752575] 1001050000 752575   184205    11158   700416        0           999  npm
>     [ 5084.527467] [753606] 1001050000 753606   183380    46843  2134016        0           999  node
>     [ 5084.527581] Memory cgroup out of memory: Killed process 753606 (node) total-vm:733520kB, anon-rss:161228kB, file-rss:26144kB, shmem-rss:0kB, UID:1001050000
> 
> So, a single character 'M' is for OOM_SCORE_ADJ_MIN, 'R' MMF_OOM_SKIP and
> 'V' for in_vfork().
> 
> index 003d5cc3751b..4c79fa00ddb3 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -650,8 +650,9 @@ oom_dump_tasks
>  Enables a system-wide task dump (excluding kernel threads) to be produced
>  when the kernel performs an OOM-killing and includes such information as
>  pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
> -score, and name.  This is helpful to determine why the OOM killer was
> -invoked, to identify the rogue task that caused it, and to determine why
> +score, oom eligibility status and name.  This is helpful to determine why
> +the OOM killer was invoked, to identify the rogue task that caused it, and
> +to determine why

It would be better if the meaning of 'M', 'R' and 'V' were described here.

>  the OOM killer chose the task it did to kill.
>  
> +/**
> + * is_task_eligible_oom - determine if and why a task cannot be OOM killed
> + * @tsk: task to check
> + *
> + * Needs to be called with task_lock().
> + */
> +static const char * const is_task_oom_eligible(struct task_struct *p)

Name seems inappropriate.  task_oom_eligibility()?

> +{
> +	long adj;
> +
> +	adj = (long)p->signal->oom_score_adj;
> +	if (adj == OOM_SCORE_ADJ_MIN)
> +		return "M";
> +	else if (test_bit(MMF_OOM_SKIP, &p->mm->flags)
> +		return "R";
> +	else if (in_vfork(p))
> +		return "V";
> +	else
> +		return "";
> +}


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks
  2021-07-30 16:20 [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks Aaron Tomlin
@ 2021-08-02  3:49   ` David Rientjes
  2021-08-02  3:49   ` David Rientjes
  2021-08-02  6:34 ` Michal Hocko
  2 siblings, 0 replies; 9+ messages in thread
From: David Rientjes @ 2021-08-02  3:49 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: linux-mm, akpm, mhocko, penguin-kernel, llong, neelx, linux-kernel

On Fri, 30 Jul 2021, Aaron Tomlin wrote:

>  Documentation/admin-guide/sysctl/vm.rst |  5 ++--
>  mm/oom_kill.c                           | 31 +++++++++++++++++++++----
>  2 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 003d5cc3751b..4c79fa00ddb3 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -650,8 +650,9 @@ oom_dump_tasks
>  Enables a system-wide task dump (excluding kernel threads) to be produced
>  when the kernel performs an OOM-killing and includes such information as
>  pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
> -score, and name.  This is helpful to determine why the OOM killer was
> -invoked, to identify the rogue task that caused it, and to determine why
> +score, oom eligibility status and name.  This is helpful to determine why
> +the OOM killer was invoked, to identify the rogue task that caused it, and
> +to determine why
>  the OOM killer chose the task it did to kill.
>  
>  If this is set to zero, this information is suppressed.  On very
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index c729a4c4a1ac..36daa6917b62 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -160,6 +160,27 @@ static inline bool is_sysrq_oom(struct oom_control *oc)
>  	return oc->order == -1;
>  }
>  
> +/**
> + * is_task_eligible_oom - determine if and why a task cannot be OOM killed
> + * @tsk: task to check
> + *
> + * Needs to be called with task_lock().
> + */
> +static const char * const is_task_oom_eligible(struct task_struct *p)
> +{
> +	long adj;
> +
> +	adj = (long)p->signal->oom_score_adj;
> +	if (adj == OOM_SCORE_ADJ_MIN)
> +		return "M";

oom_score_adj is shown already in the tasklist dump, I'm not sure what 
value this adds.

> +	else if (test_bit(MMF_OOM_SKIP, &p->mm->flags)
> +		return "R";
> +	else if (in_vfork(p))
> +		return "V";

This is going to be racy, we can't show that a task that is emitted as 
part of the tasklist dump was did not have in_vfork() == true at the time 
oom_badness() was called.

Wouldn't it be better to simply print the output of oom_badness() to the 
tasklist dump instead so we get complete information?

We could simply special case a LONG_MIN return value as -1000 or "min".

> +	else
> +		return "";
> +}
> +
>  /* return true if the task is not adequate as candidate victim task. */
>  static bool oom_unkillable_task(struct task_struct *p)
>  {
> @@ -401,12 +422,13 @@ static int dump_task(struct task_struct *p, void *arg)
>  		return 0;
>  	}
>  
> -	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %s\n",
> +	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %1s %s\n",
>  		task->pid, from_kuid(&init_user_ns, task_uid(task)),
>  		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
>  		mm_pgtables_bytes(task->mm),
>  		get_mm_counter(task->mm, MM_SWAPENTS),
> -		task->signal->oom_score_adj, task->comm);
> +		task->signal->oom_score_adj, is_task_oom_eligible(task),
> +		task->comm);
>  	task_unlock(task);
>  
>  	return 0;
> @@ -420,12 +442,13 @@ static int dump_task(struct task_struct *p, void *arg)
>   * memcg, not in the same cpuset, or bound to a disjoint set of mempolicy nodes
>   * are not shown.
>   * State information includes task's pid, uid, tgid, vm size, rss,
> - * pgtables_bytes, swapents, oom_score_adj value, and name.
> + * pgtables_bytes, swapents, oom_score_adj value, oom eligibility status
> + * and name.
>   */
>  static void dump_tasks(struct oom_control *oc)
>  {
>  	pr_info("Tasks state (memory values in pages):\n");
> -	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
> +	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj oom_skipped name\n");
>  
>  	if (is_memcg_oom(oc))
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> -- 
> 2.31.1
> 
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks
@ 2021-08-02  3:49   ` David Rientjes
  0 siblings, 0 replies; 9+ messages in thread
From: David Rientjes @ 2021-08-02  3:49 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: linux-mm, akpm, mhocko, penguin-kernel, llong, neelx, linux-kernel

On Fri, 30 Jul 2021, Aaron Tomlin wrote:

>  Documentation/admin-guide/sysctl/vm.rst |  5 ++--
>  mm/oom_kill.c                           | 31 +++++++++++++++++++++----
>  2 files changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
> index 003d5cc3751b..4c79fa00ddb3 100644
> --- a/Documentation/admin-guide/sysctl/vm.rst
> +++ b/Documentation/admin-guide/sysctl/vm.rst
> @@ -650,8 +650,9 @@ oom_dump_tasks
>  Enables a system-wide task dump (excluding kernel threads) to be produced
>  when the kernel performs an OOM-killing and includes such information as
>  pid, uid, tgid, vm size, rss, pgtables_bytes, swapents, oom_score_adj
> -score, and name.  This is helpful to determine why the OOM killer was
> -invoked, to identify the rogue task that caused it, and to determine why
> +score, oom eligibility status and name.  This is helpful to determine why
> +the OOM killer was invoked, to identify the rogue task that caused it, and
> +to determine why
>  the OOM killer chose the task it did to kill.
>  
>  If this is set to zero, this information is suppressed.  On very
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index c729a4c4a1ac..36daa6917b62 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -160,6 +160,27 @@ static inline bool is_sysrq_oom(struct oom_control *oc)
>  	return oc->order == -1;
>  }
>  
> +/**
> + * is_task_eligible_oom - determine if and why a task cannot be OOM killed
> + * @tsk: task to check
> + *
> + * Needs to be called with task_lock().
> + */
> +static const char * const is_task_oom_eligible(struct task_struct *p)
> +{
> +	long adj;
> +
> +	adj = (long)p->signal->oom_score_adj;
> +	if (adj == OOM_SCORE_ADJ_MIN)
> +		return "M";

oom_score_adj is shown already in the tasklist dump, I'm not sure what 
value this adds.

> +	else if (test_bit(MMF_OOM_SKIP, &p->mm->flags)
> +		return "R";
> +	else if (in_vfork(p))
> +		return "V";

This is going to be racy, we can't show that a task that is emitted as 
part of the tasklist dump was did not have in_vfork() == true at the time 
oom_badness() was called.

Wouldn't it be better to simply print the output of oom_badness() to the 
tasklist dump instead so we get complete information?

We could simply special case a LONG_MIN return value as -1000 or "min".

> +	else
> +		return "";
> +}
> +
>  /* return true if the task is not adequate as candidate victim task. */
>  static bool oom_unkillable_task(struct task_struct *p)
>  {
> @@ -401,12 +422,13 @@ static int dump_task(struct task_struct *p, void *arg)
>  		return 0;
>  	}
>  
> -	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %s\n",
> +	pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %1s %s\n",
>  		task->pid, from_kuid(&init_user_ns, task_uid(task)),
>  		task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
>  		mm_pgtables_bytes(task->mm),
>  		get_mm_counter(task->mm, MM_SWAPENTS),
> -		task->signal->oom_score_adj, task->comm);
> +		task->signal->oom_score_adj, is_task_oom_eligible(task),
> +		task->comm);
>  	task_unlock(task);
>  
>  	return 0;
> @@ -420,12 +442,13 @@ static int dump_task(struct task_struct *p, void *arg)
>   * memcg, not in the same cpuset, or bound to a disjoint set of mempolicy nodes
>   * are not shown.
>   * State information includes task's pid, uid, tgid, vm size, rss,
> - * pgtables_bytes, swapents, oom_score_adj value, and name.
> + * pgtables_bytes, swapents, oom_score_adj value, oom eligibility status
> + * and name.
>   */
>  static void dump_tasks(struct oom_control *oc)
>  {
>  	pr_info("Tasks state (memory values in pages):\n");
> -	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name\n");
> +	pr_info("[  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj oom_skipped name\n");
>  
>  	if (is_memcg_oom(oc))
>  		mem_cgroup_scan_tasks(oc->memcg, dump_task, oc);
> -- 
> 2.31.1
> 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks
  2021-07-30 16:20 [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks Aaron Tomlin
  2021-08-01 20:01 ` Andrew Morton
  2021-08-02  3:49   ` David Rientjes
@ 2021-08-02  6:34 ` Michal Hocko
  2021-08-02 15:12   ` Aaron Tomlin
  2 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2021-08-02  6:34 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: linux-mm, akpm, penguin-kernel, rientjes, llong, neelx, linux-kernel

On Fri 30-07-21 17:20:02, Aaron Tomlin wrote:
> Changes since v2:
>  - Use single character (e.g. 'R' for MMF_OOM_SKIP) as suggested
>    by Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
>  - Add new header to oom_dump_tasks documentation
>  - Provide further justification
> 
> 
> The output generated by dump_tasks() can be helpful to determine why
> there was an OOM condition and which rogue task potentially caused it.
> Please note that this is only provided when sysctl oom_dump_tasks is
> enabled.
> 
> At the present time, when showing potential OOM victims, we do not
> exclude any task that are not OOM eligible e.g.

Well, this is not precise. We do exclude ineligible. Consider tasks that
are outside of the OOM domain for example. You are right that we are not
excluding all of them though.

> those that have
> MMF_OOM_SKIP set; it is possible that the last OOM killable victim was
> already OOM killed, yet the OOM reaper failed to reclaim memory and set
> MMF_OOM_SKIP. This can be confusing (or perhaps even be misleading) to the
> viewer. Now, we already unconditionally display a task's oom_score_adj_min
> value that can be set to OOM_SCORE_ADJ_MIN which is indicative of an
> "unkillable" task.
> 
> This patch provides a clear indication with regard to the OOM ineligibility
> (and why) of each displayed task with the addition of a new column namely
> "oom_skipped". An example is provided below:
> 
>     [ 5084.524970] [ pid ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj oom_skipped name
>     [ 5084.526397] [660417]     0 660417    35869      683   167936        0         -1000 M conmon
>     [ 5084.526400] [660452]     0 660452   175834      472    86016        0          -998  pod
>     [ 5084.527460] [752415]     0 752415    35869      650   172032        0         -1000 M conmon
>     [ 5084.527462] [752575] 1001050000 752575   184205    11158   700416        0           999  npm
>     [ 5084.527467] [753606] 1001050000 753606   183380    46843  2134016        0           999  node
>     [ 5084.527581] Memory cgroup out of memory: Killed process 753606 (node) total-vm:733520kB, anon-rss:161228kB, file-rss:26144kB, shmem-rss:0kB, UID:1001050000
> 
> So, a single character 'M' is for OOM_SCORE_ADJ_MIN, 'R' MMF_OOM_SKIP and
> 'V' for in_vfork().

I have to say I dislike this for two reasons. First and foremost it
makes parsing unnecessarily more complex. Now you have a potentially
an empty column to special case. Secondly MMF_OOM_SKIP is an internal
state that shouldn't really leak to userspace IMO. in_vfork is a racy
check and M is already expressed via oom_score_adj so it duplicates an
existing information.

If you really want/need to make any change here then I would propose to
either add E(eligible)/I(ligible) column without any specifics or
consistently skip over all tasks which are not eligible.
> 2.31.1

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks
  2021-08-02  3:49   ` David Rientjes
  (?)
@ 2021-08-02 14:50   ` Aaron Tomlin
  -1 siblings, 0 replies; 9+ messages in thread
From: Aaron Tomlin @ 2021-08-02 14:50 UTC (permalink / raw)
  To: David Rientjes
  Cc: linux-mm, akpm, mhocko, penguin-kernel, llong, neelx, linux-kernel

On Sun 2021-08-01 20:49 -0700, David Rientjes wrote:
> oom_score_adj is shown already in the tasklist dump, I'm not sure what 
> value this adds.

Fair enough.

> 
> > +	else if (test_bit(MMF_OOM_SKIP, &p->mm->flags)
> > +		return "R";
> > +	else if (in_vfork(p))
> > +		return "V";
> 
> This is going to be racy, we can't show that a task that is emitted as 
> part of the tasklist dump was did not have in_vfork() == true at the time 
> oom_badness() was called.

Yes, this is true.

> Wouldn't it be better to simply print the output of oom_badness() to the 
> tasklist dump instead so we get complete information?

I think this would be acceptable.

> We could simply special case a LONG_MIN return value as -1000 or "min".

Agreed.





Kind regards,

-- 
Aaron Tomlin


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks
  2021-08-02  6:34 ` Michal Hocko
@ 2021-08-02 15:12   ` Aaron Tomlin
  2021-08-03  7:05     ` Michal Hocko
  0 siblings, 1 reply; 9+ messages in thread
From: Aaron Tomlin @ 2021-08-02 15:12 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, akpm, penguin-kernel, rientjes, llong, neelx, linux-kernel

On Mon 2021-08-02 08:34 +0200, Michal Hocko wrote:
> If you really want/need to make any change here then I would propose to
> either add E(eligible)/I(ligible) column without any specifics or
> consistently skip over all tasks which are not eligible.

How about the suggestion made by David i.e. exposing the value returned by
oom_badness(), as if I understand correctly, this would provide a more
complete picture with regard to the rationale used?


Kind regards,

-- 
Aaron Tomlin


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks
  2021-08-02 15:12   ` Aaron Tomlin
@ 2021-08-03  7:05     ` Michal Hocko
  2021-08-03 10:32       ` Aaron Tomlin
  0 siblings, 1 reply; 9+ messages in thread
From: Michal Hocko @ 2021-08-03  7:05 UTC (permalink / raw)
  To: Aaron Tomlin
  Cc: linux-mm, akpm, penguin-kernel, rientjes, llong, neelx, linux-kernel

On Mon 02-08-21 16:12:50, Aaron Tomlin wrote:
> On Mon 2021-08-02 08:34 +0200, Michal Hocko wrote:
> > If you really want/need to make any change here then I would propose to
> > either add E(eligible)/I(ligible) column without any specifics or
> > consistently skip over all tasks which are not eligible.
> 
> How about the suggestion made by David i.e. exposing the value returned by
> oom_badness(), as if I understand correctly, this would provide a more
> complete picture with regard to the rationale used?

There were some attempts to print oom_score during OOM. E.g.
http://lkml.kernel.org/r/20190808183247.28206-1-echron@arista.com. That
one was rejected on the grounds that the number on its own doesn't
really present any real value. It is really only valuable when comparing
to other potential oom victims. I have to say I am still worried about
printing this internal scoring as it should have really been an
implementation detail but with /proc/<pid>/oom_score this is likely a
lost battle and I am willing to give up on that front.

I am still not entirely convinced this is worth doing though.
oom_badness is not a cheap operation. task_lock has to be taken again
during dump_tasks for each task so the already quite expensive operation
will be more so. Is this really something we cannot live without?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks
  2021-08-03  7:05     ` Michal Hocko
@ 2021-08-03 10:32       ` Aaron Tomlin
  0 siblings, 0 replies; 9+ messages in thread
From: Aaron Tomlin @ 2021-08-03 10:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-mm, akpm, penguin-kernel, rientjes, llong, neelx, linux-kernel

On Tue 2021-08-03 09:05 +0200, Michal Hocko wrote:
> There were some attempts to print oom_score during OOM. E.g.
> http://lkml.kernel.org/r/20190808183247.28206-1-echron@arista.com. That
> one was rejected on the grounds that the number on its own doesn't
> really present any real value. It is really only valuable when comparing
> to other potential oom victims. I have to say I am still worried about
> printing this internal scoring as it should have really been an
> implementation detail but with /proc/<pid>/oom_score this is likely a
> lost battle and I am willing to give up on that front.

Understood.

> I am still not entirely convinced this is worth doing though.
> oom_badness is not a cheap operation. task_lock has to be taken again
> during dump_tasks for each task so the already quite expensive operation
> will be more so. Is this really something we cannot live without?

Fair enough and I now agree, it is unquestionably not worth it.



Kind regards,

-- 
Aaron Tomlin


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2021-08-03 10:32 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-30 16:20 [PATCH v3] mm/oom_kill: show oom eligibility when displaying the current memory state of all tasks Aaron Tomlin
2021-08-01 20:01 ` Andrew Morton
2021-08-02  3:49 ` David Rientjes
2021-08-02  3:49   ` David Rientjes
2021-08-02 14:50   ` Aaron Tomlin
2021-08-02  6:34 ` Michal Hocko
2021-08-02 15:12   ` Aaron Tomlin
2021-08-03  7:05     ` Michal Hocko
2021-08-03 10:32       ` Aaron Tomlin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.