linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mike Kravetz <mike.kravetz@oracle.com>
To: David Hildenbrand <david@redhat.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Michal Hocko <mhocko@suse.com>, Peter Xu <peterx@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Shuah Khan <shuah@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 1/3] mm: enable MADV_DONTNEED for hugetlb mappings
Date: Thu, 10 Feb 2022 14:11:14 -0800	[thread overview]
Message-ID: <55b6e78f-9594-8d5b-6ce2-bb699b9549fd@oracle.com> (raw)
In-Reply-To: <b7ebafed-7dc3-b08d-15d4-859c7bc0fe2e@redhat.com>

On 2/10/22 05:09, David Hildenbrand wrote:
> On 08.02.22 00:47, Mike Kravetz wrote:
>> On 2/4/22 00:35, David Hildenbrand wrote:
>>>> I thought this was simple. :)
>>>
>>> It really bugs me that it's under-specified what's supposed to happen
>>> when the length is not aligned.
>>>
>>> BUT: in the posix world, "calling posix_madvise() shall not affect the
>>> semantics of access to memory in the specified range". So we don't care
>>> too much about if we align up/down, because it wouldn't affect the
>>> semantics. Especially for MADV_DONTNEED/MADV_REMOVE as implemented by
>>> Linux this is certainly different and the alignment handling matters.
>>>
>>> So I guess especially for MADV_DONTNEED/MADV_REMOVE we need a clear
>>> specification what's supposed to happen if the length falls into the
>>> middle of a huge page. We should document alignment handling for
>>> madvise() in general I assume.
>>>
>>> IMHO we should have bailed out right from the start whenever something
>>> is not properly aligned, but that ship has sailed. So I agree, maybe we
>>> can make at least hugetlb MADV_DONTNEED obey the same (weird) rules as
>>> ordinary pages.
>>>
>>> So b) would mean, requiring start to be hugepage aligned and aligning-up
>>> the end. Still feels wrong but at least matches existing semantics.
>>>
>>> Hugetlb MADV_REMOVE semantics are unfortunate and we should document the
>>> exception.
>>
>> Thank you for all your comments David!
>>
>> So, my plan was to make MADV_DONTNEED behave as described above:
>> - EINVAL if start address not huge page size aligned
>> - Align end/length up to huge page size.
>>
>> The code I had for this was very specific to MADV_DONTNEED.  I then thought,
>> why not do the same for MADV_REMOVE as well?  Or even more general, add this
>> check and alignment to the vma parsing code in madvise.
>>
>> It was then that I realized there are several madvise behaviors that take
>> non-huge page size aligned addresses for hugetlb mappings today.  Making
>> huge page size alignment a requirement for all madvise behaviors could break
>> existing code.  So, it seems like it could only be added to MADV_DONTNEED as
>> this functionality does not exist today.  We then end up with MADV_DONTNEED
>> as the only behavior requiring huge page size alignment for hugetlb mappings.
>> Sigh!!!
> 
> :/
> 
>>
>> I am now rethinking the decision to proceed with b) as described above.
>>
>> With the exception of MADV_REMOVE (which we may be able to change for
>> hugetlb),  madvise operations operate on huge page size pages for hugetlb
>> mappings.  If start address is in the middle of a hugetlb page, we essentially
>> align down to the beginning of the hugetlb page.  If length lands in the
>> middle of a hugetlb page, we essentially round up.
> 
> Which MADV calls would be affected?

Not sure I understand the question.  I was saying that madvise calls which
operate on hugetlb mappings today only operate on huge pages.  So, this is
essentially align down starting address and align up end address.
For example consider the MADV_POPULATE calls you recently added.  They will
only fault in huge pages in a hugetlb vma.

> The "bad" thing about MADV_DONTNEED and MADV_REMOVE are that they
> destroy data, which is why they heavily diverge from the original posix
> madvise odea.

Hmmm.  That may be a good argument for strict alignment.  We do not want
to remove (or unmap) more than the user intended.  The unmap system call
has such alignment requirements.

Darn!  I'm thinking that I should go back to requiring alignment for
MADV_DONTNEED.

There really is no 'right' answer.

>>
>> When adding MADV_REMOVE perhaps we should go with this align down start and
>> align up end strategy that is used everywhere else?  I really wish we could
>> go back and change things, but as you know it is too late for that.
> 
> I assume whatever we do, we should document it at least cleanly in the
> man page. Unfortunately, hugetlb is a gift that keeps on giving. Making
> it at least somehow consistent, even if it's "hugtlb being consistent in
> its own mess", that would be preferable I guess.

Yes, more than anything we need to document behavior.
-- 
Mike Kravetz


  reply	other threads:[~2022-02-10 22:11 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-02  1:40 [PATCH v2 0/3] Add hugetlb MADV_DONTNEED support Mike Kravetz
2022-02-02  1:40 ` [PATCH v2 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
2022-02-02  8:14   ` David Hildenbrand
2022-02-02 19:32     ` Mike Kravetz
2022-02-04  8:35       ` David Hildenbrand
2022-02-07 23:47         ` Mike Kravetz
2022-02-10 13:09           ` David Hildenbrand
2022-02-10 22:11             ` Mike Kravetz [this message]
2022-02-11  8:43               ` David Hildenbrand
2022-02-10  3:21   ` Peter Xu
2022-02-10 21:36     ` Mike Kravetz
2022-02-11  2:28       ` Peter Xu
2022-02-11 19:08         ` Axel Rasmussen
2022-02-11 19:18           ` Mike Kravetz
2022-02-02  1:40 ` [PATCH v2 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test Mike Kravetz
2022-02-02  1:40 ` [PATCH v2 3/3] userfaultfd/selftests: enable huegtlb remap and remove event testing Mike Kravetz
2022-02-02  6:11   ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=55b6e78f-9594-8d5b-6ce2-bb699b9549fd@oracle.com \
    --to=mike.kravetz@oracle.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=axelrasmussen@google.com \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=peterx@redhat.com \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).