From: "Zach O'Keefe"
Date: Wed, 11 May 2022 08:34:33 -0700
Subject: Re: [PATCH v5 10/13] mm/madvise: add MADV_COLLAPSE to process_madvise()
To: Rongwei Wang
Cc: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
 Michal Hocko, Pasha Tatashin, Peter Xu, SeongJae Park, Song Liu,
 Vlastimil Babka, Yang Shi, Zi Yan, linux-mm@kvack.org,
 Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
 Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
 Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
 "Kirill A. Shutemov", Matt Turner, Max Filippov, Miaohe Lin,
 Minchan Kim, Patrick Xia, Pavel Begunkov, Thomas Bogendoerfer,
 calling@linux.alibaba.com
References: <20220504214437.2850685-1-zokeefe@google.com>
 <20220504214437.2850685-11-zokeefe@google.com>
 <502a3ced-f3c6-7117-3b24-d80d204d66ee@linux.alibaba.com>
In-Reply-To: <502a3ced-f3c6-7117-3b24-d80d204d66ee@linux.alibaba.com>

Hey Rongwei,

Thanks for taking the time to review!

On Tue, May 10, 2022 at 5:49 PM Rongwei Wang wrote:
>
> Hi, Zach
>
> Thanks for your great patchset!
> Recently, We also try to collapse THP in this way, likes performance
> degradation due to using too much hugepages in our scenes.
>
> And there is a doubt about process_madvise(MADV_COLLAPSE) when we test
> this patchset:. It seems that process_madvise(MADV_COLLAPSE) rely on
> madvise(MADV_HUGEPAGE)? If the vma wasn't marked with 'hg',
> process_madvise(MADV_COLLAPSE) will fail to collapse. And if I miss
> something, please let me know.
>

I tried to have MADV_COLLAPSE follow the same THP eligibility semantics
as khugepaged and at-fault: either THP=always, or THP=madvise and the
vma is marked with MADV_HUGEPAGE, as you point out.

If I understand you correctly, the usefulness of
process_madvise(MADV_COLLAPSE) is limited in the case where THP=madvise
and a CAP_SYS_ADMIN user is requesting a collapse on behalf of another
process, since they don't have a way to mark the target memory as
eligible (which requires VM_HUGEPAGE).
If so, I think that's a valid point, and your suggestion below of
supporting MADV_[NO]HUGEPAGE for process_madvise(2) makes sense. For
the sake of exploring all options, I'll mention that there was also a
previous idea suggested by Yang Shi where MADV_COLLAPSE could also set
VM_HUGEPAGE[1].

Since it's possible that supporting MADV_[NO]HUGEPAGE for
process_madvise(2) has applications outside a subsequent MADV_COLLAPSE,
and since I don't see process_madvise(MADV_COLLAPSE) as being in a hot
path, I'd vote in favor of your suggestion and include
process_madvise(MADV_[NO]HUGEPAGE) support in v6 unless others object.

Thanks again for your review and your suggestion!
Zach

[1] https://lore.kernel.org/linux-mm/CAHbLzkqLRBd6u3qn=KqpOhRcPZtpGXbTXLUjK1z=4d_dQ06Pvw@mail.gmail.com/

> If so, how about introducing process_madvise(MADV_HUGEPAGE) or
> process_madvise(MADV_NOHUGEPAGE)? The former helps to mark the target
> vma with 'hg', and the collapse process can be finished completely with
> the help of other processes. the latter could let some special vma avoid
> collapsing when setting 'THP=always'.
>
> Best regards,
> -wrw
>
> On 5/5/22 5:44 AM, Zach O'Keefe wrote:
> > Allow MADV_COLLAPSE behavior for process_madvise(2) if caller has
> > CAP_SYS_ADMIN or is requesting collapse of it's own memory.
> >
> > Signed-off-by: Zach O'Keefe
> > ---
> >  mm/madvise.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 638517952bd2..08c11217025a 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -1168,13 +1168,15 @@ madvise_behavior_valid(int behavior)
> >  }
> >
> >  static bool
> > -process_madvise_behavior_valid(int behavior)
> > +process_madvise_behavior_valid(int behavior, struct task_struct *task)
> >  {
> >  	switch (behavior) {
> >  	case MADV_COLD:
> >  	case MADV_PAGEOUT:
> >  	case MADV_WILLNEED:
> >  		return true;
> > +	case MADV_COLLAPSE:
> > +		return task == current || capable(CAP_SYS_ADMIN);
> >  	default:
> >  		return false;
> >  	}
> > @@ -1452,7 +1454,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> >  		goto free_iov;
> >  	}
> >
> > -	if (!process_madvise_behavior_valid(behavior)) {
> > +	if (!process_madvise_behavior_valid(behavior, task)) {
> >  		ret = -EINVAL;
> >  		goto release_task;
> >  	}
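
For readers following the thread, the two rules discussed above can be
sketched as a small user-space model. This is only an illustration under
stated assumptions, not kernel code: the enum and function names
(thp_mode, vma_hint, thp_eligible, collapse_allowed) are hypothetical,
and the real checks operate on vma flags and task credentials rather
than on these simplified arguments. The first function models the
eligibility rule described in the reply (THP=always, or THP=madvise with
the vma marked by MADV_HUGEPAGE, with MADV_NOHUGEPAGE opting a vma out);
the second models the permission check added by the quoted patch.

```c
#include <stdbool.h>

/* Hypothetical model of the system-wide THP setting. */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS };

/* Hypothetical model of the per-vma hint set through madvise(2). */
enum vma_hint { HINT_NONE, HINT_HUGEPAGE, HINT_NOHUGEPAGE };

/*
 * Eligibility as described in the thread: a vma qualifies when
 * THP=always, or when THP=madvise and the vma was marked with
 * MADV_HUGEPAGE. A vma marked with MADV_NOHUGEPAGE never qualifies.
 */
static bool thp_eligible(enum thp_mode mode, enum vma_hint hint)
{
	if (hint == HINT_NOHUGEPAGE)
		return false;
	if (mode == THP_ALWAYS)
		return true;
	return mode == THP_MADVISE && hint == HINT_HUGEPAGE;
}

/*
 * Permission rule from the quoted patch, reduced to plain values:
 * the caller may collapse its own memory, or any process's memory
 * if it holds CAP_SYS_ADMIN (modelled here as a boolean).
 */
static bool collapse_allowed(int caller_pid, int target_pid,
			     bool caller_has_cap_sys_admin)
{
	return caller_pid == target_pid || caller_has_cap_sys_admin;
}
```

In this model, the case raised in the review is
thp_eligible(THP_MADVISE, HINT_NONE) returning false even when
collapse_allowed() returns true for a privileged caller, which is the
gap that process_madvise(MADV_[NO]HUGEPAGE) support would close.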