Linux-mm Archive on lore.kernel.org
From: Minchan Kim <minchan@kernel.org>
To: SeongJae Park <sj38.park@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	linux-api@vger.kernel.org, oleksandr@redhat.com,
	Suren Baghdasaryan <surenb@google.com>,
	Tim Murray <timmurray@google.com>,
	Daniel Colascione <dancol@google.com>,
	Sandeep Patil <sspatil@google.com>,
	Sonny Rao <sonnyrao@google.com>,
	Brian Geffon <bgeffon@google.com>, Michal Hocko <mhocko@suse.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Shakeel Butt <shakeelb@google.com>,
	John Dias <joaodias@google.com>
Subject: Re: [PATCH 2/4] mm: introduce external memory hinting API
Date: Mon, 13 Jan 2020 10:02:30 -0800
Message-ID: <20200113180230.GA110363@google.com>
In-Reply-To: <20200111073452.25182-1-sj38.park@gmail.com>

On Sat, Jan 11, 2020 at 08:34:52AM +0100, SeongJae Park wrote:
> On Fri, 10 Jan 2020 13:34:31 -0800 Minchan Kim <minchan@kernel.org> wrote:
> 
> > There are use cases where System Management Software (SMS) wants to
> > give a memory hint to other processes, because the application itself
> > does not know which hint is appropriate. In the case of Android, the
> > ActivityManagerService daemon manages apps' life cycles, and that daemon
> > must be able to initiate reclaim on its own without any app involvement.
> > 
> > To solve the issue, this patch introduces a new syscall, process_madvise(2).
> > It uses a pidfd of the external process to give the hint.
> > 
> >  int process_madvise(int pidfd, void *addr, size_t length, int advise,
> > 			unsigned long flag);
> > 
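For illustration, the pidfd argument can be obtained with pidfd_open(2)
(Linux 5.3+). This is a minimal, hedged sketch: open_pidfd is a
hypothetical helper name, and the fallback number 434 is an assumption
that holds for architectures using the common syscall numbering (alpha
differs). It deliberately does not call process_madvise itself, because
the numbers proposed in this patch are not final and may belong to a
different syscall on a given kernel.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: 434 is pidfd_open's number on most architectures
 * (alpha uses a different one); newer headers define SYS_pidfd_open. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/*
 * Hypothetical helper: returns a file descriptor referring to process
 * 'pid', or -1 with errno set (e.g. EINVAL for pid <= 0, ENOSYS on
 * kernels without pidfd_open). No flags are passed.
 */
static int open_pidfd(pid_t pid)
{
	return (int)syscall(SYS_pidfd_open, pid, 0u);
}
```

A caller would pass the returned fd as the first argument of the hinting
syscall and close(2) it when done.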
> > Since it could affect another process's address range, only a process
> > that has the right to ptrace the target, either through privilege
> > (CAP_SYS_PTRACE) or by other means (e.g., having the same UID), can use
> > it successfully. The flag argument is reserved for future use in case we
> > need to extend the API.
> > 
> > Supporting in process_madvise every hint that madvise supports now, or
> > will support in the future, is rather risky, because we are not sure all
> > hints make sense from an external process, and the implementation of a
> > hint may rely on the caller being the current context, which could be
> > error-prone. Thus, I limited the hints to MADV_[COLD|PAGEOUT] in this
> > patch.
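A minimal sketch of the argument checks described above, assuming that
any non-zero flag and any hint outside the MADV_COLD/MADV_PAGEOUT
whitelist are rejected with -EINVAL. The function names are
hypothetical and this is not the patch's mm/madvise.c code; the ptrace
permission check happens in the kernel and is omitted here.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>

/* MADV_COLD and MADV_PAGEOUT appeared in Linux 5.4; fall back to their
 * uapi values if the libc headers predate them. */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif
#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21
#endif

/* Hypothetical whitelist: hints allowed from an external process. */
static bool process_madvise_hint_allowed(int advice)
{
	switch (advice) {
	case MADV_COLD:
	case MADV_PAGEOUT:
		return true;
	default:
		return false;
	}
}

/* Hypothetical check: the flag argument is reserved, so it must be 0,
 * and only whitelisted hints pass. Returns 0 or a negative errno. */
static int process_madvise_check_args(int advice, unsigned long flags)
{
	if (flags != 0)
		return -EINVAL;
	if (!process_madvise_hint_allowed(advice))
		return -EINVAL;
	return 0;
}
```

Keeping the whitelist in one switch means adding a hint later, after its
use case has been reviewed, is a one-line change.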
> > 
> > If someone wants to add other hints, we can hear the use case and review
> > each hint individually. That is safer for maintenance than introducing a
> > buggy syscall that is hard to fix later.
> > 
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  arch/alpha/kernel/syscalls/syscall.tbl      |  1 +
> >  arch/arm/tools/syscall.tbl                  |  1 +
> >  arch/arm64/include/asm/unistd.h             |  2 +-
> >  arch/arm64/include/asm/unistd32.h           |  2 +
> >  arch/ia64/kernel/syscalls/syscall.tbl       |  1 +
> >  arch/m68k/kernel/syscalls/syscall.tbl       |  1 +
> >  arch/microblaze/kernel/syscalls/syscall.tbl |  1 +
> >  arch/mips/kernel/syscalls/syscall_n32.tbl   |  1 +
> >  arch/mips/kernel/syscalls/syscall_n64.tbl   |  1 +
> >  arch/parisc/kernel/syscalls/syscall.tbl     |  1 +
> >  arch/powerpc/kernel/syscalls/syscall.tbl    |  1 +
> >  arch/s390/kernel/syscalls/syscall.tbl       |  1 +
> >  arch/sh/kernel/syscalls/syscall.tbl         |  1 +
> >  arch/sparc/kernel/syscalls/syscall.tbl      |  1 +
> >  arch/x86/entry/syscalls/syscall_32.tbl      |  1 +
> >  arch/x86/entry/syscalls/syscall_64.tbl      |  1 +
> >  arch/xtensa/kernel/syscalls/syscall.tbl     |  1 +
> >  include/linux/syscalls.h                    |  2 +
> >  include/uapi/asm-generic/unistd.h           |  5 +-
> >  kernel/sys_ni.c                             |  1 +
> >  mm/madvise.c                                | 64 +++++++++++++++++++++
> >  21 files changed, 89 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
> > index e56950f23b49..776c61803315 100644
> > --- a/arch/alpha/kernel/syscalls/syscall.tbl
> > +++ b/arch/alpha/kernel/syscalls/syscall.tbl
> > @@ -477,3 +477,4 @@
> >  # 545 reserved for clone3
> >  546	common	watch_devices			sys_watch_devices
> >  547	common	openat2				sys_openat2
> > +548	common	process_madvise			sys_process_madvise
> > diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
> > index 7fb2f4d59210..a43381542276 100644
> > --- a/arch/arm/tools/syscall.tbl
> > +++ b/arch/arm/tools/syscall.tbl
> > @@ -451,3 +451,4 @@
> >  435	common	clone3				sys_clone3
> >  436	common	watch_devices			sys_watch_devices
> >  437	common	openat2				sys_openat2
> > +438	common	process_madvise			sys_process_madvise
> > diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
> > index 8aa00ccb0b96..b722e47377a5 100644
> > --- a/arch/arm64/include/asm/unistd.h
> > +++ b/arch/arm64/include/asm/unistd.h
> > @@ -38,7 +38,7 @@
> >  #define __ARM_NR_compat_set_tls		(__ARM_NR_COMPAT_BASE + 5)
> >  #define __ARM_NR_COMPAT_END		(__ARM_NR_COMPAT_BASE + 0x800)
> >  
> > -#define __NR_compat_syscalls		438
> > +#define __NR_compat_syscalls		439
> >  #endif
> >  
> >  #define __ARCH_WANT_SYS_CLONE
> > diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
> > index 31f0ce25719e..5c82557d408f 100644
> > --- a/arch/arm64/include/asm/unistd32.h
> > +++ b/arch/arm64/include/asm/unistd32.h
> > @@ -883,6 +883,8 @@ __SYSCALL(__NR_clone3, sys_clone3)
> >  __SYSCALL(__NR_watch_devices, sys_watch_devices)
> >  #define __NR_openat2 437
> >  __SYSCALL(__NR_openat2, sys_openat2)
> > +#define __NR_openat2 438
> 
> Shouldn't this be '#define __NR_process_madvise 438'?
> 
> > +__SYSCALL(__NR_process_madvise, process_madvise)
> >  
> >  /*
> >   * Please add new compat syscalls above this comment and update
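To see why both the new macro name and the __NR_compat_syscalls bump
matter, here is a toy version of the X-macro pattern that unistd32.h
feeds: each __SYSCALL(nr, sym) line becomes a designated initializer in
a table sized by __NR_compat_syscalls. This is a sketch under
assumptions: it stores symbol names rather than function pointers, and
compat_syscall_name is a hypothetical lookup helper. With the size left
at 438, index 438 would fall outside the array and the compiler would
reject the initializer; with a second "#define __NR_openat2",
__NR_process_madvise would be undefined. The neighbouring entries pass
sys_-prefixed symbols, so the sketch does the same.

```c
#include <stddef.h>

/* Numbers as in the quoted arm64 hunks; names mirror the kernel headers. */
#define __NR_compat_syscalls	439	/* bumped from 438 so entry 438 fits */
#define __NR_openat2		437
#define __NR_process_madvise	438	/* a new macro, not a second __NR_openat2 */

/*
 * X-macro: each __SYSCALL(nr, sym) expands to "[nr] = "sym",".
 * The real table holds function pointers; this toy holds names.
 */
#define __SYSCALL(nr, sym)	[nr] = #sym,

static const char *const compat_syscall_names[__NR_compat_syscalls] = {
	__SYSCALL(__NR_openat2, sys_openat2)
	__SYSCALL(__NR_process_madvise, sys_process_madvise)
};

/* Returns the registered symbol name for 'nr', or NULL if none. */
static const char *compat_syscall_name(int nr)
{
	if (nr < 0 || nr >= __NR_compat_syscalls)
		return NULL;
	return compat_syscall_names[nr];
}
```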
> > diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
> > index b9aa59931905..c156abc9a298 100644
> > --- a/arch/ia64/kernel/syscalls/syscall.tbl
> > +++ b/arch/ia64/kernel/syscalls/syscall.tbl
> > @@ -358,3 +358,4 @@
> >  # 435 reserved for clone3
> >  436	common	watch_devices			sys_watch_devices
> >  437	common	openat2				sys_openat2
> > +438	common	process_madvise			sys_process_madvise
> > diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
> > index 868c1ef89d35..5b6034b6650f 100644
> > --- a/arch/m68k/kernel/syscalls/syscall.tbl
> > +++ b/arch/m68k/kernel/syscalls/syscall.tbl
> > @@ -437,3 +437,4 @@
> >  # 435 reserved for clone3
> >  436	common	watch_devices			sys_watch_devices
> >  437	common	openat2				sys_openat2
> > +438	common	process_madvise			sys_process_madvise
> > diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
> > index 544b4cef18b3..4bef584af09c 100644
> > --- a/arch/microblaze/kernel/syscalls/syscall.tbl
> > +++ b/arch/microblaze/kernel/syscalls/syscall.tbl
> > @@ -443,3 +443,4 @@
> >  435	common	clone3				sys_clone3
> >  436	common	watch_devices			sys_watch_devices
> >  437	common	openat2				sys_openat2
> > +438	common	process_madvise			sys_process_madvise
> > diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
> > index 05e8aee5dae7..94fbd0fcccce 100644
> > --- a/arch/mips/kernel/syscalls/syscall_n32.tbl
> > +++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
> > @@ -376,3 +376,4 @@
> >  435	n32	clone3				__sys_clone3
> >  436	n32	watch_devices			sys_watch_devices
> >  437	n32	openat2				sys_openat2
> > +437	n32	process_madivse			sys_process_madvise
> 
> Shouldn't the number for the 'process_madvise' be '438' instead of '437'?
> 
> > diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
> > index 24d6c01328fb..4e6982c429d5 100644
> > --- a/arch/mips/kernel/syscalls/syscall_n64.tbl
> > +++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
> > @@ -352,3 +352,4 @@
> >  435	n64	clone3				__sys_clone3
> >  436	n64	watch_devices			sys_watch_devices
> >  437	n64	openat2				sys_openat2
> > +437	n64	process_madvise			sys_process_madvise
> 
> Shouldn't this be 438 as well? The same applies to the five changes below.
> 
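The duplicated 437 entries are the kind of mistake a simple ordering
check catches. A hedged sketch, assuming syscall.tbl-style lines that
begin with the syscall number and treating lines starting with '#' as
comments; first_misnumbered_entry is a hypothetical lint, not an
existing kernel script. It only requires strictly increasing numbers,
since real tables contain reserved gaps.

```c
#include <stdbool.h>
#include <stdlib.h>

/*
 * Returns the index of the first entry whose number is not strictly
 * greater than the previous entry's (a duplicate or out-of-order slot),
 * or of an entry with no leading number; returns -1 if all are fine.
 */
static int first_misnumbered_entry(const char *const lines[], int n)
{
	bool have_prev = false;
	long prev = 0;

	for (int i = 0; i < n; i++) {
		const char *s = lines[i];
		char *end;
		long nr;

		if (s[0] == '#' || s[0] == '\0')
			continue;	/* comment such as "# 435 reserved" */
		nr = strtol(s, &end, 10);
		if (end == s)
			return i;	/* no leading syscall number */
		if (have_prev && nr <= prev)
			return i;	/* duplicate or out-of-order number */
		prev = nr;
		have_prev = true;
	}
	return -1;
}
```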
> > diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
> > index 4b5f77a4e1a2..3aa990caf9dc 100644
> > --- a/arch/parisc/kernel/syscalls/syscall.tbl
> > +++ b/arch/parisc/kernel/syscalls/syscall.tbl
> > @@ -435,3 +435,4 @@
> >  435	common	clone3				sys_clone3_wrapper
> >  436	common	watch_devices			sys_watch_devices
> >  437	common	openat2				sys_openat2
> > +437	common	process_madvise			sys_process_madvise
> > diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
> > index 9716dc85a517..30e727a23f33 100644
> > --- a/arch/powerpc/kernel/syscalls/syscall.tbl
> > +++ b/arch/powerpc/kernel/syscalls/syscall.tbl
> > @@ -519,3 +519,4 @@
> >  435	nospu	clone3				ppc_clone3
> >  436	common	watch_devices			sys_watch_devices
> >  437	common	openat2				sys_openat2
> > +437	common	process_madvise			sys_process_madvise
> > diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
> > index 7da330f8b03e..75722e5ff496 100644
> > --- a/arch/s390/kernel/syscalls/syscall.tbl
> > +++ b/arch/s390/kernel/syscalls/syscall.tbl
> > @@ -440,3 +440,4 @@
> >  435  common	clone3			sys_clone3			sys_clone3
> >  436  common	watch_devices		sys_watch_devices		sys_watch_devices
> >  437  common	openat2			sys_openat2			sys_openat2
> > +437  common	process_madvise		sys_process_madvise		sys_process_madvise
> > diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
> > index bb7e68e25337..7d7bc7befad3 100644
> > --- a/arch/sh/kernel/syscalls/syscall.tbl
> > +++ b/arch/sh/kernel/syscalls/syscall.tbl
> > @@ -440,3 +440,4 @@
> >  # 435 reserved for clone3
> >  436	common	watch_devices			sys_watch_devices
> >  437	common	openat2				sys_openat2
> > +437	common	process_madvise			sys_process_madvise
> > diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
> > index 646a1fad7218..581d331ff62f 100644
> > --- a/arch/sparc/kernel/syscalls/syscall.tbl
> > +++ b/arch/sparc/kernel/syscalls/syscall.tbl
> > @@ -483,3 +483,4 @@
> >  # 435 reserved for clone3
> >  436	common	watch_devices			sys_watch_devices
> >  437	common	openat2			sys_openat2
> > +437	common	process_madvise		sys_process_madvise
> > diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> > index 57c53acee290..76a2c266fe7e 100644
> > --- a/arch/x86/entry/syscalls/syscall_32.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> > @@ -442,3 +442,4 @@
> >  435	i386	clone3			sys_clone3			__ia32_sys_clone3
> >  436	i386	watch_devices		sys_watch_devices		__ia32_sys_watch_devices
> >  437	i386	openat2			sys_openat2			__ia32_sys_openat2
> > +438	i386	process_madvise		sys_process_madvise		__ia32_sys_process_madvise
> > diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
> > index 1dd8d21f6500..b697cd8620cb 100644
> > --- a/arch/x86/entry/syscalls/syscall_64.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> > @@ -359,6 +359,7 @@
> >  435	common	clone3			__x64_sys_clone3/ptregs
> >  436	common	watch_devices		__x64_sys_watch_devices
> >  437	common	openat2			__x64_sys_openat2
> > +438	common	process_madvise		__x64_sys_process_madvise
> >  
> >  #
> >  # x32-specific system call numbers start at 512 to avoid cache impact
> > diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
> > index 0f48ab7bd75b..2e9813ecfd7d 100644
> > --- a/arch/xtensa/kernel/syscalls/syscall.tbl
> > +++ b/arch/xtensa/kernel/syscalls/syscall.tbl
> > @@ -408,3 +408,4 @@
> >  435	common	clone3				sys_clone3
> >  436	common	watch_devices			sys_watch_devices
> >  437	common	openat2				sys_openat2
> > +438	common	process_madvise			sys_process_madvise
> > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> > index 433c8c85636e..1b58a11ff49f 100644
> > --- a/include/linux/syscalls.h
> > +++ b/include/linux/syscalls.h
> > @@ -877,6 +877,8 @@ asmlinkage long sys_munlockall(void);
> >  asmlinkage long sys_mincore(unsigned long start, size_t len,
> >  				unsigned char __user * vec);
> >  asmlinkage long sys_madvise(unsigned long start, size_t len, int behavior);
> > +asmlinkage long sys_process_madvise(int pidfd, unsigned long start,
> > +			size_t len, int behavior, unsigned long flags);
> >  asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
> >  			unsigned long prot, unsigned long pgoff,
> >  			unsigned long flags);
> > diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
> > index 33f3856a9c3c..4bcd8d366f38 100644
> > --- a/include/uapi/asm-generic/unistd.h
> > +++ b/include/uapi/asm-generic/unistd.h
> > @@ -856,8 +856,11 @@ __SYSCALL(__NR_watch_devices, sys_watch_devices)
> >  #define __NR_openat2 437
> >  __SYSCALL(__NR_openat2, sys_openat2)
> >  
> > +#define __NR_openat2 438
> 
> Shouldn't this be '#define __NR_process_madvise 438'?
> 

Hi SeongJae,

I fixed everything you pointed out.

Thanks for the review.


Thread overview: 26+ messages
2020-01-10 21:34 [PATCH 0/4] introduce memory hinting API for external process Minchan Kim
2020-01-10 21:34 ` [PATCH 1/4] mm: factor out madvise's core functionality Minchan Kim
2020-01-11  7:37   ` SeongJae Park
2020-01-13 18:11     ` Minchan Kim
2020-01-13 18:22       ` SeongJae Park
2020-01-10 21:34 ` [PATCH 2/4] mm: introduce external memory hinting API Minchan Kim
2020-01-11  7:34   ` SeongJae Park
2020-01-13 18:02     ` Minchan Kim [this message]
2020-01-13  8:47   ` Kirill Tkhai
2020-01-13 10:42     ` Christian Brauner
2020-01-13 18:44       ` Minchan Kim
2020-01-13 19:10         ` Christian Brauner
2020-01-13 19:27           ` Daniel Colascione
2020-01-13 20:42             ` Christian Brauner
2020-01-13 21:04               ` Daniel Colascione
2020-01-14 19:20                 ` Christian Brauner
2020-01-14 18:59           ` Minchan Kim
2020-01-14 19:22             ` Christian Brauner
2020-01-13 18:39     ` Minchan Kim
2020-01-13 19:18     ` Daniel Colascione
2020-01-14  8:39       ` Kirill Tkhai
2020-01-14 19:12         ` Minchan Kim
2020-01-15  9:38           ` Kirill Tkhai
2020-01-10 21:34 ` [PATCH 3/4] mm/madvise: employ mmget_still_valid for write lock Minchan Kim
2020-01-10 21:34 ` [PATCH 4/4] mm/madvise: allow KSM hints for remote API Minchan Kim
2020-01-11  7:42   ` SeongJae Park
