References: <20201124053943.1684874-1-surenb@google.com> <20201124053943.1684874-2-surenb@google.com> <20201125231322.GF1484898@google.com> <20201125234322.GG1484898@google.com>
In-Reply-To: <20201125234322.GG1484898@google.com>
From: Suren Baghdasaryan
Date: Mon, 30 Nov 2020 11:01:15 -0800
Subject: Re: [PATCH 1/2] mm/madvise: allow process_madvise operations on entire memory range
To: Minchan Kim
Cc: Andrew Morton, Michal Hocko, Michal Hocko, David Rientjes, Matthew Wilcox, Johannes Weiner, Roman Gushchin, Rik van Riel, Christian Brauner, Oleg Nesterov, Tim Murray, linux-api@vger.kernel.org, linux-mm, LKML, kernel-team

On Wed, Nov 25, 2020 at 3:43 PM Minchan Kim wrote:
>
> On Wed, Nov 25, 2020 at 03:23:40PM -0800, Suren Baghdasaryan wrote:
> > On Wed, Nov 25, 2020 at 3:13 PM Minchan Kim wrote:
> > >
> > > On Mon, Nov 23, 2020 at 09:39:42PM -0800, Suren Baghdasaryan wrote:
> > > > process_madvise requires a vector of address ranges to be provided for
> > > > its operations. When an advice should be applied to the entire process,
> > > > the caller process has to obtain the list of VMAs of the target process
> > > > by reading the /proc/pid/maps or some other way. The cost of this
> > > > operation grows linearly with increasing number of VMAs in the target
> > > > process. Even constructing the input vector can be non-trivial when
> > > > target process has several thousands of VMAs and the syscall is being
> > > > issued during high memory pressure period when new allocations for such
> > > > a vector would only worsen the situation.
> > > > In the case when advice is being applied to the entire memory space of
> > > > the target process, this creates an extra overhead.
> > > > Add PMADV_FLAG_RANGE flag for process_madvise enabling the caller to
> > > > advise a memory range of the target process. For now, to keep it simple,
> > > > only the entire process memory range is supported, vec and vlen inputs
> > > > in this mode are ignored and can be NULL and 0.
> > > > Instead of returning the number of bytes that advice was successfully
> > > > applied to, the syscall in this mode returns 0 on success. This is due
> > > > to the fact that the number of bytes would not be useful for the caller
> > > > that does not know the amount of memory the call is supposed to affect.
> > > > Besides, the ssize_t return type can be too small to hold the number of
> > > > bytes affected when the operation is applied to a large memory range.
> > >
> > > Can we just use one element in iovec to indicate entire address rather
> > > than using up the reserved flags?
> > >
> > > struct iovec {
> > >         .iov_base = NULL,
> > >         .iov_len = (~(size_t)0),
> > > };
> > >
> > > Furthermore, it would be applied for other syscalls where have support
> > > iovec if we agree on it.
> > >
> >
> > The flag also changes the return value semantics. If we follow your
> > suggestion we should also agree that in this mode the return value
> > will be 0 on success and negative otherwise instead of the number of
> > bytes madvise was applied to.
>
> Well, return value will depends on the each API. If the operation is
> desruptive, it should return the right size affected by the API but
> would be okay with 0 or error, otherwise.

I'm fine with dropping the flag. I just thought that with the flag it
would be more explicit that this is a special mode operating on ranges.
Dropping it also makes the patch simpler.

Andrew, Michal, Christian, what do you think about such an API? Should I
change the API this way, keep the flag, or change it in some other way?
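
For reference, the single-element convention Minchan suggested could be
sketched roughly as below. This is only an illustration under assumptions:
the sentinel (iov_base == NULL, iov_len == ~(size_t)0) is a proposal from
this thread and not an existing kernel ABI, and the helper names
whole_range_iovec() and is_whole_range() are made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/uio.h>

/*
 * Hypothetical sentinel: a single iovec element that would stand for
 * "the entire address space of the target process".
 */
static struct iovec whole_range_iovec(void)
{
	struct iovec v = {
		.iov_base = NULL,
		.iov_len = ~(size_t)0,
	};
	return v;
}

/*
 * Hypothetical check for the sentinel: exactly one element, NULL base,
 * maximum length. Anything else is treated as an ordinary vector.
 */
static bool is_whole_range(const struct iovec *vec, size_t vlen)
{
	/* A NULL vector is only acceptable when it is declared empty. */
	assert(vec != NULL || vlen == 0);
	return vlen == 1 && vec[0].iov_base == NULL &&
	       vec[0].iov_len == ~(size_t)0;
}
```

A caller would then pass the element with vlen 1, e.g.
process_madvise(pidfd, &v, 1, advice, 0), and the return value in that
mode would presumably be 0 on success rather than a byte count, as
discussed above. Whether the kernel accepts such a vector depends on how
this discussion is resolved.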