linux-kernel.vger.kernel.org archive mirror
* [PATCH] mm, oom: OOM killer use rss size without shmem
@ 2019-02-22  4:37 Junil Lee
  2019-02-22  7:10 ` Michal Hocko
  0 siblings, 1 reply; 3+ messages in thread
From: Junil Lee @ 2019-02-22  4:37 UTC (permalink / raw)
  To: linux-mm, linux-kernel
  Cc: akpm, willy, mhocko, pasha.tatashin, kirill.shutemov, jrdr.linux,
	dan.j.williams, alexander.h.duyck, andreyknvl, arunks,
	keith.busch, guro, hannes, rientjes, penguin-kernel, shakeelb,
	yuzhoujian, Junil Lee

The OOM killer uses get_mm_rss() to estimate how much memory will be
freed when it selects a victim task.

However, the rss size returned by get_mm_rss() changed with commit
"mm, shmem: add internal shmem resident memory accounting": since that
commit, get_mm_rss() returns a size that includes SHMEM pages.

The OOM killer cannot reclaim memory from SHMEM pages directly after
killing the victim process, which leads to miscalculated victim points.

Therefore, add a new API, get_mm_rss_wo_shmem(), which returns the rss
value excluding MM_SHMEMPAGES.

Signed-off-by: Junil Lee <junil0814.lee@lge.com>
---
 include/linux/mm.h | 6 ++++++
 mm/oom_kill.c      | 4 ++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2d483db..bca3acc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1701,6 +1701,12 @@ static inline int mm_counter(struct page *page)
 	return mm_counter_file(page);
 }
 
+static inline unsigned long get_mm_rss_wo_shmem(struct mm_struct *mm)
+{
+	return get_mm_counter(mm, MM_FILEPAGES) +
+		get_mm_counter(mm, MM_ANONPAGES);
+}
+
 static inline unsigned long get_mm_rss(struct mm_struct *mm)
 {
 	return get_mm_counter(mm, MM_FILEPAGES) +
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 3a24848..e569737 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -230,7 +230,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
 	 * The baseline for the badness score is the proportion of RAM that each
 	 * task's rss, pagetable and swap space use.
 	 */
-	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
+	points = get_mm_rss_wo_shmem(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
 		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
 	task_unlock(p);
 
@@ -419,7 +419,7 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
 
 		pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %s\n",
 			task->pid, from_kuid(&init_user_ns, task_uid(task)),
-			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
+			task->tgid, task->mm->total_vm, get_mm_rss_wo_shmem(task->mm),
 			mm_pgtables_bytes(task->mm),
 			get_mm_counter(task->mm, MM_SWAPENTS),
 			task->signal->oom_score_adj, task->comm);
-- 
2.6.2



* Re: [PATCH] mm, oom: OOM killer use rss size without shmem
  2019-02-22  4:37 [PATCH] mm, oom: OOM killer use rss size without shmem Junil Lee
@ 2019-02-22  7:10 ` Michal Hocko
  2019-02-22 16:19   ` Johannes Weiner
  0 siblings, 1 reply; 3+ messages in thread
From: Michal Hocko @ 2019-02-22  7:10 UTC (permalink / raw)
  To: Junil Lee
  Cc: linux-mm, linux-kernel, akpm, willy, pasha.tatashin,
	kirill.shutemov, jrdr.linux, dan.j.williams, alexander.h.duyck,
	andreyknvl, arunks, keith.busch, guro, hannes, rientjes,
	penguin-kernel, shakeelb, yuzhoujian

On Fri 22-02-19 13:37:33, Junil Lee wrote:
> The OOM killer uses get_mm_rss() to estimate how much memory will be
> freed when it selects a victim task.
> 
> However, the rss size returned by get_mm_rss() changed with commit
> "mm, shmem: add internal shmem resident memory accounting": since that
> commit, get_mm_rss() returns a size that includes SHMEM pages.

This was actually the case even before eca56ff906bdd, because SHMEM was
simply accounted to MM_FILEPAGES, so that commit hasn't really changed
much.

Besides that, we cannot simply rule out SHMEM pages. They back
MAP_ANON|MAP_SHARED mappings, which might be unmapped and freed during
the oom victim's exit. Moreover, this is essentially the same situation
as with file-backed or even MAP_PRIVATE|MAP_ANON pages. Both can be
pinned by other processes: private pages via CoW mappings, file pages
by the filesystem, or either simply mlocked by another process. So this
admittedly crude evaluation will never be perfect. We would basically
have to do an exact calculation of the freeable memory of each process,
and that is just not feasible.

That being said, I do not think the patch is an improvement in that
direction. It just replaces one fuzzy evaluation with another that
potentially misses a lot of memory.

> The OOM killer cannot reclaim memory from SHMEM pages directly after
> killing the victim process, which leads to miscalculated victim points.
> 
> Therefore, add a new API, get_mm_rss_wo_shmem(), which returns the rss
> value excluding MM_SHMEMPAGES.
> 
> Signed-off-by: Junil Lee <junil0814.lee@lge.com>
> ---
>  include/linux/mm.h | 6 ++++++
>  mm/oom_kill.c      | 4 ++--
>  2 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 2d483db..bca3acc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1701,6 +1701,12 @@ static inline int mm_counter(struct page *page)
>  	return mm_counter_file(page);
>  }
>  
> +static inline unsigned long get_mm_rss_wo_shmem(struct mm_struct *mm)
> +{
> +	return get_mm_counter(mm, MM_FILEPAGES) +
> +		get_mm_counter(mm, MM_ANONPAGES);
> +}
> +
>  static inline unsigned long get_mm_rss(struct mm_struct *mm)
>  {
>  	return get_mm_counter(mm, MM_FILEPAGES) +
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 3a24848..e569737 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -230,7 +230,7 @@ unsigned long oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
>  	 * The baseline for the badness score is the proportion of RAM that each
>  	 * task's rss, pagetable and swap space use.
>  	 */
> -	points = get_mm_rss(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
> +	points = get_mm_rss_wo_shmem(p->mm) + get_mm_counter(p->mm, MM_SWAPENTS) +
>  		mm_pgtables_bytes(p->mm) / PAGE_SIZE;
>  	task_unlock(p);
>  
> @@ -419,7 +419,7 @@ static void dump_tasks(struct mem_cgroup *memcg, const nodemask_t *nodemask)
>  
>  		pr_info("[%7d] %5d %5d %8lu %8lu %8ld %8lu         %5hd %s\n",
>  			task->pid, from_kuid(&init_user_ns, task_uid(task)),
> -			task->tgid, task->mm->total_vm, get_mm_rss(task->mm),
> +			task->tgid, task->mm->total_vm, get_mm_rss_wo_shmem(task->mm),
>  			mm_pgtables_bytes(task->mm),
>  			get_mm_counter(task->mm, MM_SWAPENTS),
>  			task->signal->oom_score_adj, task->comm);
> -- 
> 2.6.2
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm, oom: OOM killer use rss size without shmem
  2019-02-22  7:10 ` Michal Hocko
@ 2019-02-22 16:19   ` Johannes Weiner
  0 siblings, 0 replies; 3+ messages in thread
From: Johannes Weiner @ 2019-02-22 16:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Junil Lee, linux-mm, linux-kernel, akpm, willy, pasha.tatashin,
	kirill.shutemov, jrdr.linux, dan.j.williams, alexander.h.duyck,
	andreyknvl, arunks, keith.busch, guro, rientjes, penguin-kernel,
	shakeelb, yuzhoujian

On Fri, Feb 22, 2019 at 08:10:01AM +0100, Michal Hocko wrote:
> On Fri 22-02-19 13:37:33, Junil Lee wrote:
> > The OOM killer uses get_mm_rss() to estimate how much memory will be
> > freed when it selects a victim task.
> > 
> > However, the rss size returned by get_mm_rss() changed with commit
> > "mm, shmem: add internal shmem resident memory accounting": since that
> > commit, get_mm_rss() returns a size that includes SHMEM pages.
> 
> This was actually the case even before eca56ff906bdd, because SHMEM was
> simply accounted to MM_FILEPAGES, so that commit hasn't really changed
> much.
> 
> Besides that, we cannot simply rule out SHMEM pages. They back
> MAP_ANON|MAP_SHARED mappings, which might be unmapped and freed during
> the oom victim's exit. Moreover, this is essentially the same situation
> as with file-backed or even MAP_PRIVATE|MAP_ANON pages. Both can be
> pinned by other processes: private pages via CoW mappings, file pages
> by the filesystem, or either simply mlocked by another process. So this
> admittedly crude evaluation will never be perfect. We would basically
> have to do an exact calculation of the freeable memory of each process,
> and that is just not feasible.
> 
> That being said, I do not think the patch is an improvement in that
> direction. It just replaces one fuzzy evaluation with another that
> potentially misses a lot of memory.

You make good points.

I think it's also worth noting that while the OOM killer is ultimately
about freeing memory, the victim selection algorithm is not about
finding the *optimal* amount of memory to free, but about killing the
thing most likely to have put the system into trouble. We're not going
for killing the smallest tasks until we're barely back over the line
and operational again; instead, we're finding the biggest offender to
stop the most likely source of unsustainable allocations. That's why
our metric is called "badness score", and not "freeable" or similar.

So even if a good chunk of the biggest task's memory is tmpfs pages
that aren't necessarily freed upon kill, from a heuristics POV it's
still the best candidate to kill.
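For reference, the baseline that oom_badness() computes in the hunk
above is just the sum being discussed. A userspace restatement of that
arithmetic (illustrative only, not kernel code; the function name is
made up):

```c
/* Restates the oom_badness() baseline from the patch: resident pages
 * plus swap entries plus page-table pages. With the patch applied,
 * rss_pages would come from get_mm_rss_wo_shmem() instead of
 * get_mm_rss(), i.e. it would exclude the shmem counter. */
static unsigned long badness_baseline(unsigned long rss_pages,
				      unsigned long swap_entries,
				      unsigned long pgtables_bytes,
				      unsigned long page_size)
{
	return rss_pages + swap_entries + pgtables_bytes / page_size;
}
```

For example, a task with 1000 resident pages, 200 swap entries and ten
4 KiB page-table pages scores 1210 baseline points.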
