All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
To: Zach O'Keefe <zokeefe@google.com>
Cc: Alex Shi <alex.shi@linux.alibaba.com>,
	David Hildenbrand <david@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Matthew Wilcox <willy@infradead.org>,
	Michal Hocko <mhocko@suse.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Peter Xu <peterx@redhat.com>, SeongJae Park <sj@kernel.org>,
	Song Liu <songliubraving@fb.com>,
	Vlastimil Babka <vbabka@suse.cz>, Yang Shi <shy828301@gmail.com>,
	Zi Yan <ziy@nvidia.com>,
	linux-mm@kvack.org, Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Chris Kennelly <ckennelly@google.com>,
	Chris Zankel <chris@zankel.net>, Helge Deller <deller@gmx.de>,
	Hugh Dickins <hughd@google.com>,
	Ivan Kokshaysky <ink@jurassic.park.msu.ru>,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Jens Axboe <axboe@kernel.dk>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	Matt Turner <mattst88@gmail.com>,
	Max Filippov <jcmvbkbc@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	Patrick Xia <patrickx@google.com>,
	Pavel Begunkov <asml.silence@gmail.com>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	calling@linux.alibaba.com
Subject: Re: [PATCH v5 10/13] mm/madvise: add MADV_COLLAPSE to process_madvise()
Date: Thu, 12 May 2022 23:53:16 +0800	[thread overview]
Message-ID: <f79ddd99-a556-8484-f39d-f64f3bf9d5f2@linux.alibaba.com> (raw)
In-Reply-To: <CAAa6QmSYu5+3Uoe2ObkjEP9x76aY2DC3Nnu62o9Xc8tfB7J+bQ@mail.gmail.com>



On 5/11/22 11:34 PM, Zach O'Keefe wrote:
> Hey Rongwei,
> 
> Thanks for taking the time to review!
> 
> On Tue, May 10, 2022 at 5:49 PM Rongwei Wang
> <rongwei.wang@linux.alibaba.com> wrote:
>>
>> Hi, Zach
>>
>> Thanks for your great patchset!
>> Recently, We also try to collapse THP in this way, likes performance
>> degradation due to using too much hugepages in our scenes.
>>
>> And there is a doubt about process_madvise(MADV_COLLAPSE) when we test
>> this patchset:. It seems that process_madvise(MADV_COLLAPSE) rely on
>> madvise(MADV_HUGEPAGE)? If the vma wasn't marked with 'hg',
>> process_madvise(MADV_COLLAPSE) will fail to collapse. And if I miss
>> something, please let me know.
>>
> 
> I tried to have MADV_COLLAPSE follow the same THP eligibility
> semantics as khugepaged and at-fault: either THP=always, or
> THP=madvise and the vma is marked with MADV_HUGEPAGE, as you point
> out.
> 
> If I understand you correctly, the usefulness of
> process_madvise(MADV_COLLAPSE) is limited in the case where
> THP=madvise and a CAP_SYS_ADMIN user is requesting a collapse of
> behalf of another process since they don't have a way to mark the
> target memory as eligible (which requires VM_HUGEPAGE).
Hi Zach

Yes, that's my means. It seems that MADV_[NO]HUGEPAGE for 
process_madivse(2) just needs little codes.

Anyway, looking forward to your next patchset.

Thanks
-wrw

> 
> If so, I think that's a valid point, and your suggestion below of a
> supporting MADV_[NO]HUGEPAGE for process_madvise(2) makes sense. For
> the sake of exploring all options, I'll mention that there was also a
> previous idea suggested by Yang Shi where MADV_COLLAPSE could also set
> VM_HUGEPAGE[1].
> 
> Since it's possible supporting MADV_[NO]HUGEPAGE for
> process_madivse(2) has applications outside a subsequent
> MADV_COLLAPSE, and since I don't see process_madvise(MADV_COLLAPSE) to
> be in a hot path, I'd vote in favor of your suggestion and include
> process_madvise(MADV_[NO]HUGEPAGE) support in v6 unless others object.
> 
> Thanks again for your review and your suggestion!
> Zach
> 
> [1] https://lore.kernel.org/linux-mm/CAHbLzkqLRBd6u3qn=KqpOhRcPZtpGXbTXLUjK1z=4d_dQ06Pvw@mail.gmail.com/
> 
>> If so, how about introducing process_madvise(MADV_HUGEPAGE) or
>> process_madvise(MADV_NOHUGEPAGE)? The former helps to mark the target
>> vma with 'hg', and the collapse process can be finished completely with
>> the help of other processes. the latter could let some special vma avoid
>> collapsing when setting 'THP=always'.
>>
>> Best regards,
>> -wrw
>>
>> On 5/5/22 5:44 AM, Zach O'Keefe wrote:
>>> Allow MADV_COLLAPSE behavior for process_madvise(2) if caller has
>>> CAP_SYS_ADMIN or is requesting collapse of it's own memory.
>>>
>>> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
>>> ---
>>>    mm/madvise.c | 6 ++++--
>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/madvise.c b/mm/madvise.c
>>> index 638517952bd2..08c11217025a 100644
>>> --- a/mm/madvise.c
>>> +++ b/mm/madvise.c
>>> @@ -1168,13 +1168,15 @@ madvise_behavior_valid(int behavior)
>>>    }
>>>
>>>    static bool
>>> -process_madvise_behavior_valid(int behavior)
>>> +process_madvise_behavior_valid(int behavior, struct task_struct *task)
>>>    {
>>>        switch (behavior) {
>>>        case MADV_COLD:
>>>        case MADV_PAGEOUT:
>>>        case MADV_WILLNEED:
>>>                return true;
>>> +     case MADV_COLLAPSE:
>>> +             return task == current || capable(CAP_SYS_ADMIN);
>>>        default:
>>>                return false;
>>>        }
>>> @@ -1452,7 +1454,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>>>                goto free_iov;
>>>        }
>>>
>>> -     if (!process_madvise_behavior_valid(behavior)) {
>>> +     if (!process_madvise_behavior_valid(behavior, task)) {
>>>                ret = -EINVAL;
>>>                goto release_task;
>>>        }


  reply	other threads:[~2022-05-12 15:53 UTC|newest]

Thread overview: 44+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-04 21:44 [PATCH v5 00/12] mm: userspace hugepage collapse Zach O'Keefe
2022-05-04 21:44 ` [PATCH v5 01/13] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd() finds THP Zach O'Keefe
2022-05-18 18:41   ` Peter Xu
2022-05-19 21:06     ` Zach O'Keefe
2022-05-20  1:12       ` Peter Xu
2022-05-04 21:44 ` [PATCH v5 02/13] mm/khugepaged: add struct collapse_control Zach O'Keefe
2022-05-12 20:02   ` David Rientjes
2022-05-18 20:03   ` Peter Xu
2022-05-18 20:11     ` Zach O'Keefe
2022-05-04 21:44 ` [PATCH v5 03/13] mm/khugepaged: dedup and simplify hugepage alloc and charging Zach O'Keefe
2022-05-12 20:02   ` David Rientjes
2022-05-13 18:26     ` Zach O'Keefe
2022-05-04 21:44 ` [PATCH v5 04/13] mm/khugepaged: make hugepage allocation context-specific Zach O'Keefe
2022-05-12 20:02   ` David Rientjes
2022-05-13 23:04     ` Zach O'Keefe
2022-05-13 23:17       ` Yang Shi
2022-05-13 23:55         ` Zach O'Keefe
2022-05-17 17:18           ` Yang Shi
2022-05-17 22:35             ` Zach O'Keefe
2022-05-25 17:58               ` Yang Shi
2022-05-25 18:27                 ` Zach O'Keefe
2022-05-04 21:44 ` [PATCH v5 05/13] mm/khugepaged: pipe enum scan_result codes back to callers Zach O'Keefe
2022-05-12 20:02   ` David Rientjes
2022-05-04 21:44 ` [PATCH v5 06/13] mm/khugepaged: add flag to ignore khugepaged_max_ptes_* Zach O'Keefe
2022-05-12 20:03   ` David Rientjes
2022-05-04 21:44 ` [PATCH v5 07/13] mm/khugepaged: add flag to ignore page young/referenced requirement Zach O'Keefe
2022-05-12 20:03   ` David Rientjes
2022-05-13 18:17     ` Zach O'Keefe
2022-05-04 21:44 ` [PATCH v5 08/13] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse Zach O'Keefe
2022-05-05 18:50   ` Zach O'Keefe
2022-05-05 18:58     ` Zach O'Keefe
2022-05-04 21:44 ` [PATCH v5 09/13] mm/khugepaged: rename prefix of shared collapse functions Zach O'Keefe
2022-05-12 20:03   ` David Rientjes
2022-05-04 21:44 ` [PATCH v5 10/13] mm/madvise: add MADV_COLLAPSE to process_madvise() Zach O'Keefe
2022-05-11  0:49   ` Rongwei Wang
2022-05-11 15:34     ` Zach O'Keefe
2022-05-12 15:53       ` Rongwei Wang [this message]
2022-05-12 20:03       ` David Rientjes
2022-05-13 21:06         ` Zach O'Keefe
2022-05-16  3:56         ` Rongwei Wang
2022-05-12 20:03   ` David Rientjes
2022-05-04 21:44 ` [PATCH v5 11/13] selftests/vm: modularize collapse selftests Zach O'Keefe
2022-05-04 21:44 ` [PATCH v5 12/13] selftests/vm: add MADV_COLLAPSE collapse context to selftests Zach O'Keefe
2022-05-04 21:44 ` [PATCH v5 13/13] selftests/vm: add test to verify recollapse of THPs Zach O'Keefe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f79ddd99-a556-8484-f39d-f64f3bf9d5f2@linux.alibaba.com \
    --to=rongwei.wang@linux.alibaba.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alex.shi@linux.alibaba.com \
    --cc=arnd@arndb.de \
    --cc=asml.silence@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=axelrasmussen@google.com \
    --cc=calling@linux.alibaba.com \
    --cc=chris@zankel.net \
    --cc=ckennelly@google.com \
    --cc=david@redhat.com \
    --cc=deller@gmx.de \
    --cc=hughd@google.com \
    --cc=ink@jurassic.park.msu.ru \
    --cc=jcmvbkbc@gmail.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-mm@kvack.org \
    --cc=mattst88@gmail.com \
    --cc=mhocko@suse.com \
    --cc=minchan@kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=patrickx@google.com \
    --cc=peterx@redhat.com \
    --cc=rientjes@google.com \
    --cc=shy828301@gmail.com \
    --cc=sj@kernel.org \
    --cc=songliubraving@fb.com \
    --cc=tsbogend@alpha.franken.de \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    --cc=zokeefe@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.