linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Mina Almasry <almasrymina@google.com>,
	Michal Hocko <mhocko@suse.com>, Peter Xu <peterx@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Shuah Khan <shuah@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v2 1/3] mm: enable MADV_DONTNEED for hugetlb mappings
Date: Wed, 2 Feb 2022 09:14:19 +0100	[thread overview]
Message-ID: <20571829-9d3d-0b48-817c-b6b15565f651@redhat.com> (raw)
In-Reply-To: <20220202014034.182008-2-mike.kravetz@oracle.com>

On 02.02.22 02:40, Mike Kravetz wrote:
> MADV_DONTNEED is currently disabled for hugetlb mappings.  This
> certainly makes sense in shared file mappings as the pagecache maintains
> a reference to the page and it will never be freed.  However, it could
> be useful to unmap and free pages in private mappings.
> 
> The only thing preventing MADV_DONTNEED from working on hugetlb mappings
> is a check in can_madv_lru_vma().  To allow support for hugetlb mappings
> create and use a new routine madvise_dontneed_free_valid_vma() that will
> allow hugetlb mappings.  Also, before calling zap_page_range in the
> DONTNEED case align start and size to huge page size for hugetlb vmas.
> madvise only requires PAGE_SIZE alignment, but the hugetlb unmap routine
> requires huge page size alignment.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  mm/madvise.c | 24 ++++++++++++++++++++++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 5604064df464..7ae891e030a4 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -796,10 +796,30 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
>  static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
>  					unsigned long start, unsigned long end)
>  {
> +	/*
> +	 * start and size (end - start) must be huge page size aligned
> +	 * for hugetlb vmas.
> +	 */
> +	if (is_vm_hugetlb_page(vma)) {
> +		struct hstate *h = hstate_vma(vma);
> +
> +		start = ALIGN_DOWN(start, huge_page_size(h));
> +		end = ALIGN(end, huge_page_size(h));

So you effectively extend the range silently. IIUC, if someone would zap
a 4k range you would implicitly zap a whole 2M page and effectively zero
out more data than requested.


Looking at do_madvise(), we:
(1) reject start addresses that are not page-aligned
(2) shrink lengths that are not page-aligned and refuse if it turns 0

The man page documents (1) but doesn't really document (2).

Naturally I'd have assume that we apply the same logic to huge page
sizes and documenting it in the man page accordingly.


Why did you decide to extend the range? I'd assume MADV_REMOVE behaves
like FALLOC_FL_PUNCH_HOLE:
  "Within the specified range, partial filesystem blocks are zeroed, and
   whole filesystem blocks are removed from the file.  After a
   successful call, subsequent reads from  this  range will return
   zeros."
So we don't "discard more than requested".


I see the following possible alternatives:
(a) Fail if the range is not aligned
-> Clear semantics
(b) Fail if the start is not aligned, shrink the end if required
-> Same rules as for PAGE_SIZE
(c) Zero out the requested part
-> Same semantics as FALLOC_FL_PUNCH_HOLE.

My preference would be a), properly documenting it in the man page.

-- 
Thanks,

David / dhildenb



  reply	other threads:[~2022-02-02  8:14 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-02  1:40 [PATCH v2 0/3] Add hugetlb MADV_DONTNEED support Mike Kravetz
2022-02-02  1:40 ` [PATCH v2 1/3] mm: enable MADV_DONTNEED for hugetlb mappings Mike Kravetz
2022-02-02  8:14   ` David Hildenbrand [this message]
2022-02-02 19:32     ` Mike Kravetz
2022-02-04  8:35       ` David Hildenbrand
2022-02-07 23:47         ` Mike Kravetz
2022-02-10 13:09           ` David Hildenbrand
2022-02-10 22:11             ` Mike Kravetz
2022-02-11  8:43               ` David Hildenbrand
2022-02-10  3:21   ` Peter Xu
2022-02-10 21:36     ` Mike Kravetz
2022-02-11  2:28       ` Peter Xu
2022-02-11 19:08         ` Axel Rasmussen
2022-02-11 19:18           ` Mike Kravetz
2022-02-02  1:40 ` [PATCH v2 2/3] selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test Mike Kravetz
2022-02-02  1:40 ` [PATCH v2 3/3] userfaultfd/selftests: enable huegtlb remap and remove event testing Mike Kravetz
2022-02-02  6:11   ` Mike Rapoport

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20571829-9d3d-0b48-817c-b6b15565f651@redhat.com \
    --to=david@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=axelrasmussen@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=naoya.horiguchi@linux.dev \
    --cc=peterx@redhat.com \
    --cc=shuah@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).