From mboxrd@z Thu Jan 1 00:00:00 1970
From: Minchan Kim
To: Andrew Morton
Cc: LKML, linux-mm, Michal Hocko, Johannes Weiner, Tim Murray,
 Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt,
 Sonny Rao, Brian Geffon, Minchan Kim
Subject: [RFC 5/7] mm: introduce external memory hinting API
Date: Mon, 20 May 2019 12:52:52 +0900
Message-Id: <20190520035254.57579-6-minchan@kernel.org>
X-Mailer: git-send-email 2.21.0.1020.gf2820cf01a-goog
In-Reply-To: <20190520035254.57579-1-minchan@kernel.org>
References: <20190520035254.57579-1-minchan@kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org

There are use cases where a centralized userspace daemon wants to give
a memory hint such as MADV_[COOL|COLD] to another process. Android's
ActivityManagerService is one of them.

This is similar in spirit to madvise(MADV_DONTNEED), but the information
required to make the reclaim decision is not known to the app. Instead,
it is known to the centralized userspace daemon (ActivityManagerService),
and that daemon must be able to initiate reclaim on its own without any
app involvement.

To solve the issue, this patch introduces a new syscall,
process_madvise(2), which works on a pidfd so that it can give a hint to
an external process.

	int process_madvise(int pidfd, void *addr, size_t length, int advise);

All advice values that madvise provides can be supported in
process_madvise, too. Since the call can affect another process's
address range, only a privileged process (CAP_SYS_PTRACE), or one that
is otherwise allowed to ptrace the target (e.g., by having the same
UID), can use it successfully.

Please suggest a better idea if you have one about the permission.
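A minimal user-space sketch of how a daemon might invoke the call
described above, assuming the four-argument prototype proposed in this
RFC. The wrapper name rfc_process_madvise(), the self-check helper, and
the explicit syscall-number parameter are illustrative assumptions and
not part of the patch. The number 428 is only proposed here and may name
a different syscall on a running kernel, so the sketch never hard-codes
it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical wrapper for the syscall as proposed in this RFC:
 *     int process_madvise(int pidfd, void *addr, size_t length, int advise);
 * The caller passes the syscall number (assumption: it is looked up for
 * the kernel actually being tested, not taken from this mail).
 * Returns 0 on success, or -1 with errno set on failure.
 */
static int rfc_process_madvise(long nr, int pidfd, void *addr,
			       size_t length, int advice)
{
	return (int)syscall(nr, pidfd, addr, length, advice);
}

/*
 * Sanity check that does not depend on kernel support: an invalid
 * syscall number must make the wrapper fail with -1.
 */
static void rfc_process_madvise_selfcheck(void)
{
	assert(rfc_process_madvise(-1, -1, NULL, 0, MADV_NORMAL) == -1);
}
```

On a kernel carrying this series, a caller would pass the pidfd of the
target process, an address range inside that process, and one of the
advice values; on failure errno distinguishes a bad descriptor from a
permission problem.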
* from v1r1
  * use ptrace capability - surenb, dancol

Signed-off-by: Minchan Kim
---
 arch/x86/entry/syscalls/syscall_32.tbl |  1 +
 arch/x86/entry/syscalls/syscall_64.tbl |  1 +
 include/linux/proc_fs.h                |  1 +
 include/linux/syscalls.h               |  2 ++
 include/uapi/asm-generic/unistd.h      |  2 ++
 kernel/signal.c                        |  2 +-
 kernel/sys_ni.c                        |  1 +
 mm/madvise.c                           | 45 ++++++++++++++++++++++++++
 8 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index 4cd5f982b1e5..5b9dd55d6b57 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -438,3 +438,4 @@
 425	i386	io_uring_setup		sys_io_uring_setup		__ia32_sys_io_uring_setup
 426	i386	io_uring_enter		sys_io_uring_enter		__ia32_sys_io_uring_enter
 427	i386	io_uring_register	sys_io_uring_register		__ia32_sys_io_uring_register
+428	i386	process_madvise		sys_process_madvise		__ia32_sys_process_madvise
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index 64ca0d06259a..0e5ee78161c9 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -355,6 +355,7 @@
 425	common	io_uring_setup		__x64_sys_io_uring_setup
 426	common	io_uring_enter		__x64_sys_io_uring_enter
 427	common	io_uring_register	__x64_sys_io_uring_register
+428	common	process_madvise		__x64_sys_process_madvise
 
 #
 # x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/include/linux/proc_fs.h b/include/linux/proc_fs.h
index 52a283ba0465..f8545d7c5218 100644
--- a/include/linux/proc_fs.h
+++ b/include/linux/proc_fs.h
@@ -122,6 +122,7 @@ static inline struct pid *tgid_pidfd_to_pid(const struct file *file)
 
 #endif /* CONFIG_PROC_FS */
 
+extern struct pid *pidfd_to_pid(const struct file *file);
 struct net;
 
 static inline struct proc_dir_entry *proc_net_mkdir(
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e2870fe1be5b..21c6c9a62006 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -872,6 +872,8 @@ asmlinkage long sys_munlockall(void);
 asmlinkage long sys_mincore(unsigned long start, size_t len,
 				unsigned char __user * vec);
 asmlinkage long sys_madvise(unsigned long start, size_t len, int behavior);
+asmlinkage long sys_process_madvise(int pid_fd, unsigned long start,
+				size_t len, int behavior);
 asmlinkage long sys_remap_file_pages(unsigned long start, unsigned long size,
 			unsigned long prot, unsigned long pgoff,
 			unsigned long flags);
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index dee7292e1df6..7ee82ce04620 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -832,6 +832,8 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
 __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
 #define __NR_io_uring_register 427
 __SYSCALL(__NR_io_uring_register, sys_io_uring_register)
+#define __NR_process_madvise 428
+__SYSCALL(__NR_process_madvise, sys_process_madvise)
 
 #undef __NR_syscalls
 #define __NR_syscalls 428
diff --git a/kernel/signal.c b/kernel/signal.c
index 1c86b78a7597..04e75daab1f8 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -3620,7 +3620,7 @@ static int copy_siginfo_from_user_any(kernel_siginfo_t *kinfo, siginfo_t *info)
 	return copy_siginfo_from_user(kinfo, info);
 }
 
-static struct pid *pidfd_to_pid(const struct file *file)
+struct pid *pidfd_to_pid(const struct file *file)
 {
 	if (file->f_op == &pidfd_fops)
 		return file->private_data;
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 4d9ae5ea6caf..5277421795ab 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -278,6 +278,7 @@ COND_SYSCALL(mlockall);
 COND_SYSCALL(munlockall);
 COND_SYSCALL(mincore);
 COND_SYSCALL(madvise);
+COND_SYSCALL(process_madvise);
 COND_SYSCALL(remap_file_pages);
 COND_SYSCALL(mbind);
 COND_SYSCALL_COMPAT(mbind);
diff --git a/mm/madvise.c b/mm/madvise.c
index 119e82e1f065..af02aa17e5c1 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -9,6 +9,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -16,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1140,3 +1142,46 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 {
 	return madvise_core(current, start, len_in, behavior);
 }
+
+SYSCALL_DEFINE4(process_madvise, int, pidfd, unsigned long, start,
+		size_t, len_in, int, behavior)
+{
+	int ret;
+	struct fd f;
+	struct pid *pid;
+	struct task_struct *tsk;
+	struct mm_struct *mm;
+
+	f = fdget(pidfd);
+	if (!f.file)
+		return -EBADF;
+
+	pid = pidfd_to_pid(f.file);
+	if (IS_ERR(pid)) {
+		ret = PTR_ERR(pid);
+		goto err;
+	}
+
+	ret = -EINVAL;
+	rcu_read_lock();
+	tsk = pid_task(pid, PIDTYPE_PID);
+	if (!tsk) {
+		rcu_read_unlock();
+		goto err;
+	}
+	get_task_struct(tsk);
+	rcu_read_unlock();
+	mm = mm_access(tsk, PTRACE_MODE_ATTACH_REALCREDS);
+	if (!mm || IS_ERR(mm)) {
+		ret = IS_ERR(mm) ? PTR_ERR(mm) : -ESRCH;
+		if (ret == -EACCES)
+			ret = -EPERM;
+		goto err;
+	}
+	ret = madvise_core(tsk, start, len_in, behavior);
+	mmput(mm);
+	put_task_struct(tsk);
+err:
+	fdput(f);
+	return ret;
+}
-- 
2.21.0.1020.gf2820cf01a-goog

From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hillf Danton
To: Minchan Kim
Cc: Andrew Morton, LKML, linux-mm, Michal Hocko, Johannes Weiner,
 Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione,
 Shakeel Butt, Sonny Rao, Brian Geffon
Subject: Re: [RFC 5/7] mm: introduce external memory hinting API
Date: Wed, 29 May 2019 11:41:23 +0800
Message-Id: <20190520035254.57579-6-minchan@kernel.org>
In-Reply-To: <20190520035254.57579-1-minchan@kernel.org>
References: <20190520035254.57579-1-minchan@kernel.org>
X-Mailer: git-send-email 2.21.0.1020.gf2820cf01a-goog
MIME-Version: 1.0
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
Precedence: bulk
Content-Type: text/plain; charset="UTF-8"
Message-ID: <20190529034123.Ydws0HZGButPCKc2uBNRJHio08XPB3nB1CUeGF-D3fs@z>

On Mon, 20 May 2019 12:52:52 +0900 Minchan Kim wrote:
> --- a/arch/x86/entry/syscalls/syscall_64.tbl
> +++ b/arch/x86/entry/syscalls/syscall_64.tbl
> @@ -355,6 +355,7 @@
>  425	common	io_uring_setup		__x64_sys_io_uring_setup
>  426	common	io_uring_enter		__x64_sys_io_uring_enter
>  427	common	io_uring_register	__x64_sys_io_uring_register
> +428	common	process_madvise		__x64_sys_process_madvise
>
It would be much better if something similar were added for arm64.

>  #
>  # x32-specific system call numbers start at 512 to avoid cache impact
> --- a/include/uapi/asm-generic/unistd.h
> +++ b/include/uapi/asm-generic/unistd.h
> @@ -832,6 +832,8 @@ __SYSCALL(__NR_io_uring_setup, sys_io_uring_setup)
>  __SYSCALL(__NR_io_uring_enter, sys_io_uring_enter)
>  #define __NR_io_uring_register 427
>  __SYSCALL(__NR_io_uring_register, sys_io_uring_register)
> +#define __NR_process_madvise 428
> +__SYSCALL(__NR_process_madvise, sys_process_madvise)
>
>  #undef __NR_syscalls
>  #define __NR_syscalls 428

It seems __NR_syscalls needs to be incremented by one.

BR
Hillf
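
The last review comment can be illustrated with a small model. This is
a simplified sketch, assuming a table-driven dispatcher that accepts a
syscall number only when it is smaller than the table size; the macro
and function names are hypothetical, and only the values 428 come from
the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as posted in the asm-generic/unistd.h hunk quoted above. */
#define NR_PROCESS_MADVISE	428
#define NR_SYSCALLS_POSTED	428

/* Hypothetical corrected value, following the review comment. */
#define NR_SYSCALLS_FIXED	(NR_PROCESS_MADVISE + 1)

/* The table size must be one more than the highest syscall number. */
static_assert(NR_SYSCALLS_FIXED == 429, "table must cover entry 428");

/*
 * Simplified model of the bounds check in a table-driven dispatcher:
 * a number is dispatched only if it indexes inside a table holding
 * nr_syscalls entries.
 */
static bool syscall_nr_in_table(unsigned int nr, unsigned int nr_syscalls)
{
	return nr < nr_syscalls;
}
```

Under this model, syscall_nr_in_table(428, NR_SYSCALLS_POSTED) is false,
so the new entry would be out of range, while the same lookup with
NR_SYSCALLS_FIXED succeeds.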