Date: Mon, 18 May 2020 22:54:51 -0700
From: Minchan Kim
To: Andrew Morton
Cc: Suren Baghdasaryan, LKML, Christian Brauner, linux-mm,
 linux-api@vger.kernel.org, Oleksandr Natalenko, Tim Murray,
 Daniel Colascione, Sandeep Patil, Sonny Rao, Brian Geffon,
 Michal Hocko, Johannes Weiner, Shakeel Butt, John Dias,
 Joel Fernandes, Jann Horn, alexander.h.duyck@linux.intel.com,
 SeongJae Park, David Rientjes, Arjun Roy, Kirill Tkhai
Subject: Re: [PATCH] mm: use only pidfd for process_madvise syscall
Message-ID: <20200519055451.GA255907@google.com>
References: <20200516012055.126205-1-minchan@kernel.org>
 <20200518211350.GA50295@google.com>
 <20200518160656.b9651ef7393db8e0614a1175@linux-foundation.org>
In-Reply-To: <20200518160656.b9651ef7393db8e0614a1175@linux-foundation.org>

Hi Andrew,

On Mon, May 18, 2020 at 04:06:56PM -0700, Andrew Morton wrote:
> On Mon, 18 May 2020 14:13:50 -0700 Minchan Kim wrote:
>
> > Andrew, I sent this patch without folding into previous syscall introducing
> > patches because it could be arguable. If you want to fold it into each
> > patchset(i.e., introducing process_madvise syscall and introducing
> > compat_syscall), let me know it. I will send partial diff to each
> > patchset.
>
> It doesn't seem necessary - I believe we'll get a clean result if I
> squish all of these:
>
> mm-support-vector-address-ranges-for-process_madvise-fix.patch
> mm-support-vector-address-ranges-for-process_madvise-fix-fix.patch
> mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix.patch
> mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix.patch
> mm-support-vector-address-ranges-for-process_madvise-fix-fix-fix-fix-fix.patch
> mm-use-only-pidfd-for-process_madvise-syscall.patch
>
> into mm-support-vector-address-ranges-for-process_madvise.patch and
> make the appropriate changelog adjustments?
>

If you want to fold them all, please use the description below for
mm-support-vector-address-ranges-for-process_madvise.patch.

Thanks.

============== &< ===================

Subject: [PATCH] mm: support vector address ranges for process_madvise

This patch changes the process_madvise interface:

  a) support vector address ranges in a single system call
  b) support vector address ranges for the local process as well as
     external processes
  c) remove pid and keep only pidfd in the arguments - [1][2]
  d) change the type of flags to unsigned int

An Android app has thousands of vmas due to zygote, so it is a waste of
CPU and power to call the syscall one by one for each vma. (Testing
2000-vma syscall vs 1-vector syscall showed a 15% performance
improvement. I think the gain would be bigger in real practice because
the test ran in a very cache-friendly environment.)

Another potential use case for the vector range is to amortize the cost
of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this
could benefit users like TCP receive zerocopy and malloc implementations.
In the future, we may find more use cases for other advice values, so
let's make it part of the API now, since we are introducing a new syscall
at this moment. With that, an existing madvise(2) user could replace it
with process_madvise(2) on their own process if they want batched
address range support.

So finally, the API is as follows:

      ssize_t process_madvise(int pidfd, const struct iovec *iovec,
                unsigned long vlen, int advice, unsigned int flags);

    DESCRIPTION
      The process_madvise() system call is used to give advice or directions
      to the kernel about address ranges of an external process as well as
      the local process. It applies the advice to the address ranges of the
      process described by iovec and vlen. The goal of such advice is to
      improve system or application performance.

      The pidfd argument is a PID file descriptor that selects the target
      process (see pidfd_open(2) for further information).

      The pointer iovec points to an array of iovec structures, defined in
      <sys/uio.h> as:

        struct iovec {
            void *iov_base;         /* starting address */
            size_t iov_len;         /* number of bytes to be advised */
        };

      Each iovec element describes an address range beginning at iov_base
      and spanning iov_len bytes.

      The vlen argument is the number of elements in iovec.

      The advice is indicated in the advice argument, which at this moment
      must be one of the following if the target process specified by pidfd
      is external:

        MADV_COLD
        MADV_PAGEOUT
        MADV_MERGEABLE
        MADV_UNMERGEABLE

      Permission to provide a hint to an external process is governed by a
      ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2).

      process_madvise() supports every advice that madvise(2) has if the
      target process is in the same thread group as the calling process, so
      a user could use process_madvise(2) to extend existing madvise(2)
      usage with vector address ranges.

    RETURN VALUE
      On success, process_madvise() returns the number of bytes advised.
      This return value may be less than the total number of requested
      bytes if an error occurred partway through. The caller should check
      the return value to determine whether a partial advice occurred.

[1] https://lore.kernel.org/linux-mm/20200509124817.xmrvsrq3mla6b76k@wittgenstein/
[2] https://lore.kernel.org/linux-mm/9d849087-3359-c4ab-fbec-859e8186c509@virtuozzo.com/

Reviewed-by: Suren Baghdasaryan
Signed-off-by: Minchan Kim
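
For illustration only, here is a rough userspace sketch of how a caller might
use the interface described above and detect a partial advice. It is not part
of the patch. The helper names (iov_total_bytes, process_madvise_raw,
advise_ranges) are made up for this example, and it assumes the installed
headers define SYS_process_madvise; where they do not, the wrapper simply
fails with ENOSYS. The return-value handling follows the RETURN VALUE text
above (bytes advised, possibly fewer than requested).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Total number of bytes described by an iovec array. */
static size_t iov_total_bytes(const struct iovec *iov, size_t vlen)
{
    size_t total = 0;

    for (size_t i = 0; i < vlen; i++)
        total += iov[i].iov_len;
    return total;
}

/*
 * Hypothetical wrapper (assumption: no libc wrapper is available).
 * Issues the raw syscall when the headers define SYS_process_madvise,
 * otherwise fails with ENOSYS.
 */
static ssize_t process_madvise_raw(int pidfd, const struct iovec *iov,
                                   unsigned long vlen, int advice,
                                   unsigned int flags)
{
#ifdef SYS_process_madvise
    return syscall(SYS_process_madvise, pidfd, iov, vlen, advice, flags);
#else
    (void)pidfd; (void)iov; (void)vlen; (void)advice; (void)flags;
    errno = ENOSYS;
    return -1;
#endif
}

/*
 * Hypothetical helper: advise all ranges with one call.
 * Returns 0 if every requested byte was advised, 1 if the kernel
 * reported a partial advice, and -1 on error (errno is set).
 */
static int advise_ranges(int pidfd, const struct iovec *iov, size_t vlen,
                         int advice)
{
    ssize_t done;

    assert(iov != NULL || vlen == 0);

    done = process_madvise_raw(pidfd, iov, vlen, advice, 0);
    if (done < 0)
        return -1;
    return (size_t)done == iov_total_bytes(iov, vlen) ? 0 : 1;
}
```

A caller would fill an iovec array with one entry per range, obtain pidfd
from pidfd_open(2), and call advise_ranges(pidfd, iov, vlen, MADV_COLD),
retrying or logging when it returns 1.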