In-Reply-To: <20201125231322.GF1484898@google.com>
From: Jann Horn
Date: Fri, 11 Dec 2020 21:27:46 +0100
Subject: Re: [PATCH 1/2] mm/madvise: allow process_madvise operations on entire memory range
To: Minchan Kim, Christoph Hellwig
Cc: Suren Baghdasaryan, Andrew Morton, Michal Hocko, Michal Hocko, David Rientjes, Matthew Wilcox, Johannes Weiner, Roman Gushchin, Rik van Riel, Christian Brauner, Oleg Nesterov, Tim Murray, Linux API, Linux-MM, kernel list, kernel-team
Content-Type: text/plain; charset="UTF-8"

+CC Christoph Hellwig for opinions on compat

On Thu, Nov 26, 2020 at 12:22 AM Minchan Kim wrote:
> On Mon, Nov 23, 2020 at 09:39:42PM -0800, Suren Baghdasaryan wrote:
> > process_madvise requires a vector of address ranges to be provided for
> > its operations. When an advice should be applied to the entire process,
> > the caller process has to obtain the list of VMAs of the target process
> > by reading the /proc/pid/maps or some other way. The cost of this
> > operation grows linearly with increasing number of VMAs in the target
> > process. Even constructing the input vector can be non-trivial when
> > target process has several thousands of VMAs and the syscall is being
> > issued during high memory pressure period when new allocations for such
> > a vector would only worsen the situation.
> > In the case when advice is being applied to the entire memory space of
> > the target process, this creates an extra overhead.
> > Add PMADV_FLAG_RANGE flag for process_madvise enabling the caller to
> > advise a memory range of the target process. For now, to keep it simple,
> > only the entire process memory range is supported, vec and vlen inputs
> > in this mode are ignored and can be NULL and 0.
> > Instead of returning the number of bytes that advice was successfully
> > applied to, the syscall in this mode returns 0 on success. This is due
> > to the fact that the number of bytes would not be useful for the caller
> > that does not know the amount of memory the call is supposed to affect.
> > Besides, the ssize_t return type can be too small to hold the number of
> > bytes affected when the operation is applied to a large memory range.
>
> Can we just use one element in iovec to indicate entire address rather
> than using up the reserved flags?
>
> struct iovec {
>     .iov_base = NULL,
>     .iov_len = (~(size_t)0),
> };

In addition to Suren's objections, I think it's also worth considering
how this looks in terms of compat API. If a compat process does
process_madvise() on another compat process, it would be specifying
the maximum 32-bit number, rather than the maximum 64-bit number, so
you'd need special code to catch that case, which would be ugly.

And when a compat process uses this API on a non-compat process, it
semantically gets really weird: The actual address range covered would
be larger than the address range specified.

And if we want different access checks for the two flavors in the
future, gating that different behavior on special values in the iovec
would feel too magical to me.

And the length value SIZE_MAX doesn't really make sense anyway because
the length of the whole address space would be SIZE_MAX+1, which you
can't express.

So I'm in favor of a new flag, and strongly against using SIZE_MAX as
a magic number here.