mm-commits.vger.kernel.org archive mirror
From: Andrew Morton <akpm@linux-foundation.org>
To: arjunroy@google.com, bgeffon@google.com, dancol@google.com,
	hannes@cmpxchg.org, joaodias@google.com, joel@joelfernandes.org,
	mhocko@suse.com, minchan@kernel.org, mm-commits@vger.kernel.org,
	oleksandr@redhat.com, rientjes@google.com, shakeelb@google.com,
	sj38.park@gmail.com, sonnyrao@google.com, sspatil@google.com,
	surenb@google.com, timmurray@google.com, vbabka@suse.cz
Subject: + mm-support-vector-address-ranges-for-process_madvise.patch added to -mm tree
Date: Thu, 23 Apr 2020 15:44:20 -0700
Message-ID: <20200423224420.KZLkNkOTO%akpm@linux-foundation.org>
In-Reply-To: <20200420181310.c18b3c0aa4dc5b3e5ec1be10@linux-foundation.org>


The patch titled
     Subject: mm: support vector address ranges for process_madvise
has been added to the -mm tree.  Its filename is
     mm-support-vector-address-ranges-for-process_madvise.patch

This patch should soon appear at
    http://ozlabs.org/~akpm/mmots/broken-out/mm-support-vector-address-ranges-for-process_madvise.patch
and later at
    http://ozlabs.org/~akpm/mmotm/broken-out/mm-support-vector-address-ranges-for-process_madvise.patch

Before you just go and hit "reply", please:
   a) Consider who else should be cc'ed
   b) Prefer to cc a suitable mailing list as well
   c) Ideally: find the original patch on the mailing list and do a
      reply-to-all to that, adding suitable additional cc's

*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***

The -mm tree is included into linux-next and is updated
there every 3-4 working days

------------------------------------------------------
From: Minchan Kim <minchan@kernel.org>
Subject: mm: support vector address ranges for process_madvise

This patch extends process_madvise(2) to a) support vector address ranges
in a single system call and b) apply those vector address ranges to the
local process as well as to an external process.

An Android app has thousands of vmas due to zygote, so it is a waste of
CPU and power to call the syscall once for each vma.  (Testing 2000
per-vma syscalls against a single vector syscall showed a 15% performance
improvement.  I think the gain would be bigger in practice because the
test ran in a very cache-friendly environment.)

Another potential use case for the vector range is to amortize the cost of
TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could
benefit users like TCP receive zerocopy and malloc implementations.  In
the future, we could find more use cases for other advice values, so
let's make this part of the API now, while we are introducing a new
syscall.  With that, an existing madvise(2) user could switch to
process_madvise(2) with its own pid to get batched address range
support.

So finally, the API is as follows:

  ssize_t process_madvise(idtype_t idtype, id_t id,
                const struct iovec *iovec, unsigned long vlen,
                int advice, unsigned long flags);

DESCRIPTION
  The process_madvise() system call is used to give advice or directions
  to the kernel about address ranges of an external process as well as of
  the local process. It applies the advice to the address ranges described
  by iovec and vlen in the target process. The goal of such advice is to
  improve system or application performance.

  The idtype and id arguments select the target process to be advised as
  follows:

    idtype == P_PID
      select the process whose process ID matches id

    idtype == P_PIDFD
      select the process referred to by the PID file descriptor
      specified in id. (See pidfd_open(2) for further information.)

  The pointer iovec points to an array of iovec structures, defined in
  <sys/uio.h> as:

    struct iovec {
        void  *iov_base;    /* starting address */
        size_t iov_len;     /* number of bytes to be advised */
    };

  Each iovec element describes an address range beginning at iov_base
  and spanning iov_len bytes.

  vlen is the number of elements in iovec.
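
  As a minimal sketch of how a caller might fill in such a vector, the
  helpers below build iovec entries and total the bytes they describe.
  make_range() and total_requested() are hypothetical names used only
  for illustration; they are not part of this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Hypothetical helper: describe one address range as an iovec entry. */
static struct iovec make_range(void *start, size_t len)
{
	struct iovec v = { .iov_base = start, .iov_len = len };

	assert(start != NULL || len == 0);	/* a non-empty range needs a base */
	return v;
}

/* Hypothetical helper: total number of bytes described by the vector. */
static size_t total_requested(const struct iovec *iov, size_t vlen)
{
	size_t total = 0;

	for (size_t i = 0; i < vlen; i++)
		total += iov[i].iov_len;
	return total;
}
```

  A caller could keep the value of total_requested() around and compare
  it with the byte count that process_madvise() returns.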

  The advice is given in the advice argument. If the target process
  specified by idtype and id is external, it must currently be one of
  the following:

    MADV_COLD
    MADV_PAGEOUT
    MADV_MERGEABLE
    MADV_UNMERGEABLE

  Permission to provide a hint to an external process is governed by a
  ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

  process_madvise() supports every advice value that madvise(2) supports
  if the target process is in the same thread group as the calling
  process, so a user can use process_madvise(2) as an extension of
  madvise(2) that supports vector address ranges.

RETURN VALUE
  On success, process_madvise() returns the number of bytes advised.
  This value may be less than the total number of requested bytes if an
  error occurred. The caller should check the return value to determine
  whether the advice was only partially applied.
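
  One way a caller might interpret a short byte count is sketched below.
  first_unadvised() is a hypothetical helper, not part of this patch, and
  it assumes that ranges are advised in array order, as the loop in
  do_process_madvise() does.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: given the byte count returned by the call, return
 * the index of the first range that was not fully advised, or vlen if
 * every range was covered.  Assumes ranges are processed in array order.
 */
static size_t first_unadvised(const struct iovec *iov, size_t vlen,
			      ssize_t advised)
{
	size_t done = 0;
	size_t i;

	assert(advised >= 0);	/* negative values are errors, not counts */
	for (i = 0; i < vlen; i++) {
		if (done + iov[i].iov_len > (size_t)advised)
			break;
		done += iov[i].iov_len;
	}
	return i;
}
```

  A result equal to vlen means the whole vector was advised; any smaller
  index tells the caller where to resume or which range to inspect.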

Link: http://lkml.kernel.org/r/20200423145215.72666-2-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Arjun Roy <arjunroy@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: John Dias <joaodias@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/madvise.c |   47 ++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 40 insertions(+), 7 deletions(-)

--- a/mm/madvise.c~mm-support-vector-address-ranges-for-process_madvise
+++ a/mm/madvise.c
@@ -1195,20 +1195,39 @@ SYSCALL_DEFINE3(madvise, unsigned long,
 	return do_madvise(current, current->mm, start, len_in, behavior);
 }
 
-SYSCALL_DEFINE6(process_madvise, int, which, pid_t, upid, unsigned long, start,
-		size_t, len_in, int, behavior, unsigned long, flags)
+static int do_process_madvise(struct task_struct *target_task,
+		struct mm_struct *mm, struct iov_iter *iter, int behavior)
 {
-	int ret;
+	struct iovec iovec;
+	int ret = 0;
+
+	while (iov_iter_count(iter)) {
+		iovec = iov_iter_iovec(iter);
+		ret = do_madvise(target_task, mm, (unsigned long)iovec.iov_base,
+					iovec.iov_len, behavior);
+		if (ret < 0)
+			break;
+		iov_iter_advance(iter, iovec.iov_len);
+	}
+
+	return ret;
+}
+
+SYSCALL_DEFINE6(process_madvise, int, which, pid_t, upid,
+		const struct iovec __user *, vec, unsigned long, vlen,
+		int, behavior, unsigned long, flags)
+{
+	ssize_t ret;
 	struct pid *pid;
 	struct task_struct *task;
 	struct mm_struct *mm;
+	struct iovec iovstack[UIO_FASTIOV];
+	struct iovec *iov = iovstack;
+	struct iov_iter iter;
 
 	if (flags != 0)
 		return -EINVAL;
 
-	if (!process_madvise_behavior_valid(behavior))
-		return -EINVAL;
-
 	switch (which) {
 	case P_PID:
 		if (upid <= 0)
@@ -1236,13 +1255,27 @@ SYSCALL_DEFINE6(process_madvise, int, wh
 		goto put_pid;
 	}
 
+	if (task->mm != current->mm &&
+			!process_madvise_behavior_valid(behavior)) {
+		ret = -EINVAL;
+		goto release_task;
+	}
+
 	mm = mm_access(task, PTRACE_MODE_ATTACH_FSCREDS);
 	if (IS_ERR_OR_NULL(mm)) {
 		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
 		goto release_task;
 	}
 
-	ret = do_madvise(task, mm, start, len_in, behavior);
+	ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
+	if (ret >= 0) {
+		size_t total_len = iov_iter_count(&iter);
+
+		ret = do_process_madvise(task, mm, &iter, behavior);
+		if (ret >= 0)
+			ret = total_len - iov_iter_count(&iter);
+		kfree(iov);
+	}
 	mmput(mm);
 release_task:
 	put_task_struct(task);
_
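
The accounting in the hunk above can be modelled in userspace.  The sketch
below is a simplification, not the kernel code: advise_fn stands in for
do_madvise(), reject_unaligned() is a hypothetical stand-in that only
checks start-address alignment against an assumed 4096-byte page size, and
advise_vector() mirrors the loop in do_process_madvise() together with the
byte count the syscall derives from the iterator.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/uio.h>

#define ASSUMED_PAGE_SIZE 4096u	/* assumption made for this sketch only */

/* Stand-in for do_madvise(): 0 on success, negative errno on failure. */
typedef int (*advise_fn)(void *start, size_t len, int behavior);

/* Hypothetical stand-in: accept only page-aligned start addresses. */
static int reject_unaligned(void *start, size_t len, int behavior)
{
	(void)len;
	(void)behavior;
	return ((uintptr_t)start % ASSUMED_PAGE_SIZE) ? -EINVAL : 0;
}

/*
 * Model of the patch's flow: advise each range in order and stop at the
 * first failure.  On success return the number of bytes advised; on
 * failure return the error, as the hunk above does.
 */
static ssize_t advise_vector(const struct iovec *iov, size_t vlen,
			     int behavior, advise_fn advise)
{
	size_t advised = 0;

	assert(advise != NULL);
	for (size_t i = 0; i < vlen; i++) {
		int ret = advise(iov[i].iov_base, iov[i].iov_len, behavior);

		if (ret < 0)
			return ret;
		advised += iov[i].iov_len;
	}
	return (ssize_t)advised;
}
```

In this model, as in the hunk, a failing range makes the whole call return
the error code rather than a partial byte count.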

Patches currently in -mm which might be from minchan@kernel.org are

mm-pass-task-and-mm-to-do_madvise.patch
mm-pass-task-and-mm-to-do_madvise-fix.patch
mm-introduce-external-memory-hinting-api.patch
mm-introduce-external-memory-hinting-api-fix.patch
mm-check-fatal-signal-pending-of-target-process.patch
pid-move-pidfd_get_pid-function-to-pidc.patch
mm-support-both-pid-and-pidfd-for-process_madvise.patch
mm-support-vector-address-ranges-for-process_madvise.patch
mm-support-vector-address-ranges-for-process_madvise-fix.patch
