From: Suren Baghdasaryan <surenb@google.com>
To: surenb@google.com
Cc: akpm@linux-foundation.org, mhocko@kernel.org, mhocko@suse.com,
	rientjes@google.com, willy@infradead.org, hannes@cmpxchg.org,
	guro@fb.com, riel@surriel.com, minchan@kernel.org,
	christian@brauner.io, oleg@redhat.com, timmurray@google.com,
	linux-api@vger.kernel.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: [PATCH 2/2] mm/madvise: add process_madvise MADV_DONTNEED support
Date: Mon, 23 Nov 2020 21:39:43 -0800
Message-ID: <20201124053943.1684874-3-surenb@google.com>
In-Reply-To: <20201124053943.1684874-1-surenb@google.com>

In modern systems it's not unusual to have a system component that
monitors memory conditions and is tasked with keeping system memory
pressure under control. One way to accomplish that is to kill
non-essential processes to free up memory for more important ones.
Examples of this are Facebook's OOM killer daemon called oomd and
Android's low memory killer daemon called lmkd.
For such a system component it's important to be able to free memory
quickly and efficiently. Unfortunately, the time a process takes to free
up its memory after receiving a SIGKILL might vary based on the state
of the process (e.g. uninterruptible sleep), the size of the process,
and the OPP (operating performance point) level of the core the process
is running on.
In such a situation it is desirable to be able to free up the memory of
the process being killed in a more controlled way.
Enable MADV_DONTNEED to be used with process_madvise when applied to a
dying process to reclaim its memory. This would allow userspace system
components like oomd and lmkd to free memory of the target process in
a more predictable way.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
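The checks added below can be read as a small decision table. The
following is only a userspace model of that table, written to make the
intended return values easy to follow; it is not kernel code. All
model_* names and the two structs are hypothetical stand-ins for the
task, mm and VMA state the real code inspects, and the MODEL_MADV_*
values are assumed to match asm-generic/mman-common.h.

```c
/* Userspace model of the checks in this patch; NOT kernel code. */
#include <errno.h>
#include <stdbool.h>

/* Assumed to match asm-generic/mman-common.h on most architectures. */
#define MODEL_MADV_DONTNEED 4
#define MODEL_MADV_FREE     8
#define MODEL_MADV_COLD     20
#define MODEL_MADV_PAGEOUT  21

/* Hypothetical stand-in for the target state the syscall looks at. */
struct model_target {
	bool group_exiting; /* models signal_group_exit(task->signal) */
	bool oom_victim;    /* models mm_is_oom_victim(mm) */
	bool oom_skip;      /* models MMF_OOM_SKIP in mm->flags */
	bool unstable;      /* models MMF_UNSTABLE in mm->flags */
};

/* Hypothetical stand-in for the VMA properties the filter looks at. */
struct model_vma {
	bool lru_ok;        /* models can_madv_lru_vma(vma) */
	bool anonymous;     /* models vma_is_anonymous(vma) */
	bool shared;        /* models vma->vm_flags & VM_SHARED */
};

enum model_action { MODEL_PROCEED, MODEL_DONE };

/* Mirrors madvise_destructive() from the diff. */
static bool model_destructive(int behavior)
{
	return behavior == MODEL_MADV_DONTNEED ||
	       behavior == MODEL_MADV_FREE;
}

/*
 * Mirrors the gate added to the syscall body. Returns -EINVAL when a
 * destructive advice targets a process that is not dying. Otherwise
 * returns 0 and reports in *next whether the caller should go on to
 * walk the address space or is already done.
 */
static int model_process_madvise_gate(struct model_target *t, int behavior,
				      enum model_action *next)
{
	*next = MODEL_PROCEED;
	if (!model_destructive(behavior))
		return 0;
	if (!t->group_exiting)
		return -EINVAL;
	if (t->oom_victim || t->oom_skip) {
		/* Already being reclaimed by the OOM path. */
		*next = MODEL_DONE;
		return 0;
	}
	t->unstable = true;
	return 0;
}

/* Mirrors the filter added to can_range_madv_lru_vma(). */
static bool model_can_range_madv(const struct model_vma *v, int behavior)
{
	if (!v->lru_ok)
		return false;
	if (model_destructive(behavior))
		return v->anonymous || !v->shared;
	return true;
}
```

In this model a destructive advice on a live process is rejected, a
process already handled by the OOM path yields success without further
work, and shared file-backed VMAs are skipped only for destructive
advice.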
 mm/madvise.c | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/mm/madvise.c b/mm/madvise.c
index 1aa074a46524..11306534369e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -29,6 +29,7 @@
 #include <linux/swapops.h>
 #include <linux/shmem_fs.h>
 #include <linux/mmu_notifier.h>
+#include <linux/oom.h>
 
 #include <asm/tlb.h>
 
@@ -995,6 +996,18 @@ process_madvise_behavior_valid(int behavior)
 	switch (behavior) {
 	case MADV_COLD:
 	case MADV_PAGEOUT:
+	case MADV_DONTNEED:
+		return true;
+	default:
+		return false;
+	}
+}
+
+static bool madvise_destructive(int behavior)
+{
+	switch (behavior) {
+	case MADV_DONTNEED:
+	case MADV_FREE:
 		return true;
 	default:
 		return false;
@@ -1006,6 +1019,10 @@ static bool can_range_madv_lru_vma(struct vm_area_struct *vma, int behavior)
 	if (!can_madv_lru_vma(vma))
 		return false;
 
+	/* For destructive madvise skip shared file-backed VMAs */
+	if (madvise_destructive(behavior))
+		return vma_is_anonymous(vma) || !(vma->vm_flags & VM_SHARED);
+
 	return true;
 }
 
@@ -1239,6 +1256,23 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		goto release_task;
 	}
 
+	if (madvise_destructive(behavior)) {
+		/* Allow destructive madvise only on a dying process */
+		if (!signal_group_exit(task->signal)) {
+			ret = -EINVAL;
+			goto release_mm;
+		}
+		/* Ensure no competition with OOM-killer to avoid contention */
+		if (unlikely(mm_is_oom_victim(mm)) ||
+		    unlikely(test_bit(MMF_OOM_SKIP, &mm->flags))) {
+			/* Already being reclaimed */
+			ret = 0;
+			goto release_mm;
+		}
+		/* Mark mm as unstable */
+		set_bit(MMF_UNSTABLE, &mm->flags);
+	}
+
 	/*
 	 * For range madvise only the entire address space is supported for now
 	 * and input iovec is ignored.
-- 
2.29.2.454.gaff20da3a2-goog

