From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20201124053943.1684874-1-surenb@google.com>
 <20201124053943.1684874-2-surenb@google.com>
 <20201125231322.GF1484898@google.com>
 <20201125234322.GG1484898@google.com>
In-Reply-To:
From: Suren Baghdasaryan
Date: Mon, 7 Dec 2020 23:23:45 -0800
Message-ID:
Subject: Re: [PATCH 1/2] mm/madvise: allow process_madvise operations on entire memory range
To: Minchan Kim
Cc: Andrew Morton, Michal Hocko, Michal Hocko, David Rientjes,
 Matthew Wilcox, Johannes Weiner, Roman Gushchin, Rik van Riel,
 Christian Brauner, Oleg Nesterov, Tim Murray,
 linux-api@vger.kernel.org, linux-mm, LKML, kernel-team
Content-Type: text/plain; charset="UTF-8"

On Mon, Nov 30, 2020 at 11:01 AM Suren Baghdasaryan wrote:
>
> On Wed, Nov 25, 2020 at 3:43 PM Minchan Kim wrote:
> >
> > On Wed, Nov 25, 2020 at 03:23:40PM -0800, Suren Baghdasaryan wrote:
> > > On Wed, Nov 25, 2020 at 3:13 PM Minchan Kim wrote:
> > > >
> > > > On Mon, Nov 23, 2020 at 09:39:42PM -0800, Suren Baghdasaryan wrote:
> > > > > process_madvise requires a vector of address ranges to be provided for
> > > > > its operations. When an advice should be applied to the entire process,
> > > > > the caller process has to obtain the list of VMAs of the target process
> > > > > by reading the /proc/pid/maps or some other way. The cost of this
> > > > > operation grows linearly with increasing number of VMAs in the target
> > > > > process. Even constructing the input vector can be non-trivial when
> > > > > target process has several thousands of VMAs and the syscall is being
> > > > > issued during high memory pressure period when new allocations for such
> > > > > a vector would only worsen the situation.
> > > > > In the case when advice is being applied to the entire memory space of
> > > > > the target process, this creates an extra overhead.
> > > > > Add PMADV_FLAG_RANGE flag for process_madvise enabling the caller to
> > > > > advise a memory range of the target process. For now, to keep it simple,
> > > > > only the entire process memory range is supported, vec and vlen inputs
> > > > > in this mode are ignored and can be NULL and 0.
> > > > > Instead of returning the number of bytes that advice was successfully
> > > > > applied to, the syscall in this mode returns 0 on success. This is due
> > > > > to the fact that the number of bytes would not be useful for the caller
> > > > > that does not know the amount of memory the call is supposed to affect.
> > > > > Besides, the ssize_t return type can be too small to hold the number of
> > > > > bytes affected when the operation is applied to a large memory range.
> > > >
> > > > Can we just use one element in iovec to indicate entire address rather
> > > > than using up the reserved flags?
> > > >
> > > > struct iovec {
> > > >         .iov_base = NULL,
> > > >         .iov_len = (~(size_t)0),
> > > > };
> > > >
> > > > Furthermore, it would be applied for other syscalls where have support
> > > > iovec if we agree on it.
> > > >
> > >
> > > The flag also changes the return value semantics. If we follow your
> > > suggestion we should also agree that in this mode the return value
> > > will be 0 on success and negative otherwise instead of the number of
> > > bytes madvise was applied to.
> >
> > Well, return value will depends on the each API. If the operation is
> > desruptive, it should return the right size affected by the API but
> > would be okay with 0 or error, otherwise.
>
> I'm fine with dropping the flag, I just thought with the flag it would
> be more explicit that this is a special mode operating on ranges. This
> way the patch also becomes simpler.
> Andrew, Michal, Christian, what do you think about such API? Should I
> change the API this way / keep the flag / change it in some other way?

Friendly ping to get some feedback on the proposed API please.
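
For reference, a minimal userspace sketch of the single-iovec convention
suggested in the quoted text above (iov_base = NULL, iov_len = ~(size_t)0
meaning "the entire address space"). The helper names whole_range_iov()
and is_whole_range() are hypothetical and exist only for illustration.
The sentinel is just the proposal under discussion, not an existing
kernel ABI, and no syscall is issued here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/*
 * Hypothetical sentinel from the discussion: a single element with
 * iov_base == NULL and iov_len == ~(size_t)0 stands for "the entire
 * address space". This is an assumption, not an existing kernel ABI.
 */
static inline struct iovec whole_range_iov(void)
{
	struct iovec iov = {
		.iov_base = NULL,
		.iov_len = ~(size_t)0,
	};
	return iov;
}

/*
 * Returns 1 if the vector is exactly one sentinel element, 0 otherwise.
 * A NULL vector is only accepted together with vlen == 0.
 */
static inline int is_whole_range(const struct iovec *vec, size_t vlen)
{
	assert(vec != NULL || vlen == 0);
	return vlen == 1 && vec[0].iov_base == NULL &&
	       vec[0].iov_len == SIZE_MAX;
}
```

Under this convention a caller would pass the address of the element
returned by whole_range_iov() with vlen = 1 instead of setting a flag, and
the receiving side would test the vector with something like
is_whole_range() before walking it. The return-value question raised in
the thread (0 on success versus bytes advised) is left open by this sketch.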