From: Minchan Kim
To: Andrew Morton
Cc: LKML, Christian Brauner, linux-mm, linux-api@vger.kernel.org,
    oleksandr@redhat.com, Suren Baghdasaryan, Tim Murray,
    Daniel Colascione, Sandeep Patil, Sonny Rao, Brian Geffon,
    Michal Hocko, Johannes Weiner, Shakeel Butt, John Dias,
    Joel Fernandes, Jann Horn, alexander.h.duyck@linux.intel.com,
    SeongJae Park, David Rientjes, Arjun Roy, Kirill Tkhai, Minchan Kim
Subject: [PATCH] mm: use only pidfd for process_madvise syscall
Date: Fri, 15 May 2020 18:20:55 -0700
Message-Id: <20200516012055.126205-1-minchan@kernel.org>

Based on the discussion [1], people did not feel we need to support both
pid and pidfd in every new API [2], so this patch keeps only pidfd.
It also changes the type of flags to "unsigned int".
So finally, the API is as follows:

      ssize_t process_madvise(int pidfd, const struct iovec *iovec,
                unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
      The process_madvise() system call is used to give advice or
      directions to the kernel about address ranges of an external
      process as well as of the local process. It applies the advice to
      the address ranges of the process described by iovec and vlen. The
      goal of such advice is to improve system or application
      performance.

      The pidfd argument selects the process referred to by the PID file
      descriptor specified in pidfd. (See pidfd_open(2) for further
      information.)

      The pointer iovec points to an array of iovec structures, defined
      in <sys/uio.h> as:

        struct iovec {
            void *iov_base;         /* starting address */
            size_t iov_len;         /* number of bytes to be advised */
        };

      Each iovec describes an address range beginning at iov_base and
      extending for iov_len bytes.
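      As a rough illustration of the five-argument interface described
      here, the following C sketch wraps the raw system call and checks
      for partial advice by comparing the returned byte count with the
      sum of the iov_len fields. Several things are assumptions and not
      part of this patch: the helper names advise_ranges(),
      total_iov_len() and advise_all() are hypothetical, the fallback
      syscall number 440 is the one the call received on x86-64 and
      asm-generic when it was later merged, and the fallback value 20 for
      MADV_COLD is taken from the uapi headers. No libc wrapper is
      assumed.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Assumption: this patch does not assign a syscall number. 440 is the
 * number used on x86-64 and asm-generic once the call was merged. */
#ifndef SYS_process_madvise
#define SYS_process_madvise 440
#endif

/* Assumption: fallback for older libc headers (uapi value). */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/* Hypothetical thin wrapper around the raw syscall.
 * flags is passed as 0 because any other value is rejected with EINVAL. */
static ssize_t advise_ranges(int pidfd, const struct iovec *iov,
			     unsigned long vlen, int advice)
{
	return syscall(SYS_process_madvise, pidfd, iov, vlen, advice, 0U);
}

/* Total number of bytes requested across all iovec entries. */
static size_t total_iov_len(const struct iovec *iov, unsigned long vlen)
{
	size_t total = 0;

	for (unsigned long i = 0; i < vlen; i++)
		total += iov[i].iov_len;
	return total;
}

/* Returns 1 if every requested byte was advised, 0 if the advice was
 * only partially applied, and -1 if the syscall failed (errno is set). */
static int advise_all(int pidfd, const struct iovec *iov,
		      unsigned long vlen, int advice)
{
	ssize_t done = advise_ranges(pidfd, iov, vlen, advice);

	if (done < 0)
		return -1;
	return (size_t)done == total_iov_len(iov, vlen) ? 1 : 0;
}
```

      A caller would typically obtain pidfd from pidfd_open(2) and then
      call advise_all(pidfd, iov, vlen, MADV_COLD); a result of 0 means
      only part of the request was applied.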
      The vlen argument is the number of elements in iovec.

      The advice is indicated in the advice argument, which at this
      moment is one of the following if the target process specified by
      pidfd is external:

        MADV_COLD
        MADV_PAGEOUT
        MADV_MERGEABLE
        MADV_UNMERGEABLE

      Permission to provide a hint to an external process is governed by
      a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see
      ptrace(2).

      process_madvise() supports every advice that madvise(2) has if the
      target process is in the same thread group as the calling process,
      so users can use process_madvise(2) to extend existing madvise(2)
      to support vectored address ranges.

    RETURN VALUE
      On success, process_madvise() returns the number of bytes advised.
      This return value may be less than the total number of requested
      bytes if an error occurred. The caller should check the return
      value to determine whether a partial advice occurred.

[1] https://lore.kernel.org/linux-mm/20200509124817.xmrvsrq3mla6b76k@wittgenstein/
[2] https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@virtuozzo.com/

Signed-off-by: Minchan Kim
---
 mm/madvise.c | 42 +++++++++++++-----------------------------
 1 file changed, 13 insertions(+), 29 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index d3fbbe52d230..35c9b220146a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1229,8 +1229,8 @@ static int process_madvise_vec(struct task_struct *target_task,
 	return ret;
 }
 
-static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter,
-					int behavior, unsigned long flags)
+static ssize_t do_process_madvise(int pidfd, struct iov_iter *iter,
+				int behavior, unsigned int flags)
 {
 	ssize_t ret;
 	struct pid *pid;
@@ -1241,26 +1241,12 @@ static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter,
 	if (flags != 0)
 		return -EINVAL;
 
-	switch (which) {
-	case P_PID:
-		if (upid <= 0)
-			return -EINVAL;
-
-		pid = find_get_pid(upid);
-		if (!pid)
-			return -ESRCH;
-		break;
-	case P_PIDFD:
-		if (upid < 0)
-			return -EINVAL;
-
-		pid = pidfd_get_pid(upid);
-		if (IS_ERR(pid))
-			return PTR_ERR(pid);
-		break;
-	default:
+	if (pidfd < 0)
 		return -EINVAL;
-	}
+
+	pid = pidfd_get_pid(pidfd);
+	if (IS_ERR(pid))
+		return PTR_ERR(pid);
 
 	task = get_pid_task(pid, PIDTYPE_PID);
 	if (!task) {
@@ -1292,9 +1278,8 @@ static ssize_t do_process_madvise(int which, pid_t upid, struct iov_iter *iter,
 	return ret;
 }
 
-SYSCALL_DEFINE6(process_madvise, int, which, pid_t, upid,
-		const struct iovec __user *, vec, unsigned long, vlen,
-		int, behavior, unsigned long, flags)
+SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
+		unsigned long, vlen, int, behavior, unsigned int, flags)
 {
 	ssize_t ret;
 	struct iovec iovstack[UIO_FASTIOV];
@@ -1303,19 +1288,18 @@ SYSCALL_DEFINE6(process_madvise, int, which, pid_t, upid,
 
 	ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
 	if (ret >= 0) {
-		ret = do_process_madvise(which, upid, &iter, behavior, flags);
+		ret = do_process_madvise(pidfd, &iter, behavior, flags);
 		kfree(iov);
 	}
 	return ret;
 }
 
 #ifdef CONFIG_COMPAT
-COMPAT_SYSCALL_DEFINE6(process_madvise, compat_int_t, which,
-			compat_pid_t, upid,
+COMPAT_SYSCALL_DEFINE5(process_madvise, compat_int_t, pidfd,
 			const struct compat_iovec __user *, vec,
 			compat_ulong_t, vlen,
 			compat_int_t, behavior,
-			compat_ulong_t, flags)
+			compat_int_t, flags)
 
 {
 	ssize_t ret;
@@ -1326,7 +1310,7 @@ COMPAT_SYSCALL_DEFINE6(process_madvise, compat_int_t, which,
 	ret = compat_import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack),
 				&iov, &iter);
 	if (ret >= 0) {
-		ret = do_process_madvise(which, upid, &iter, behavior, flags);
+		ret = do_process_madvise(pidfd, &iter, behavior, flags);
 		kfree(iov);
 	}
 	return ret;
-- 
2.26.2.761.g0e0b3e54be-goog