From: Nadav Amit <nadav.amit@gmail.com>
To: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Xu <peterx@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Minchan Kim <minchan@kernel.org>, Colin Cross <ccross@google.com>,
	Suren Baghdasarya <surenb@google.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH 1/8] mm/madvise: propagate vma->vm_end changes
Date: Mon, 27 Sep 2021 05:33:39 -0700
Message-ID: <4211F6D4-A282-4AB4-8D96-E273C5ABE0DF@gmail.com>
In-Reply-To: <20210927115507.6xfpugeg3swookbh@box>



> On Sep 27, 2021, at 4:55 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> 
> On Mon, Sep 27, 2021 at 03:11:20AM -0700, Nadav Amit wrote:
>> 
>>> On Sep 27, 2021, at 2:08 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
>>> 
>>> On Sun, Sep 26, 2021 at 09:12:52AM -0700, Nadav Amit wrote:
>>>> From: Nadav Amit <namit@vmware.com>
>>>> 
>>>> The comment in madvise_dontneed_free() says that vma splits that occur
>>>> while the mmap-lock is dropped, during userfaultfd_remove(), should be
>>>> handled correctly, but nothing in the code indicates that it is so: prev
>>>> is invalidated, and do_madvise() will therefore continue to update VMAs
>>>> from the "obsolete" end (i.e., the one before the split).
>>>> 
>>>> Propagate the changes to end from madvise_dontneed_free() back to
>>>> do_madvise() and continue the updates from the new end accordingly.
>>> 
>>> Could you describe in details a race that would lead to wrong behaviour?
>> 
>> Thanks for the quick response.
>> 
>> For instance, madvise(MADV_DONTNEED) can race with mprotect() and cause
>> the VMA to split.
>> 
>> Something like:
>> 
>>  CPU0				CPU1
>>  ----				----
>>  madvise(0x10000, 0x2000, MADV_DONTNEED)
>>  -> userfaultfd_remove()
>>   [ mmap-lock dropped ]
>> 				mprotect(0x11000, 0x1000, PROT_READ)
>> 				[splitting the VMA]
>> 
>> 				read(uffd)
>> 				[unblocking userfaultfd_remove()]
>> 
>>   [ resuming ]
>>   end = vma->vm_end
>>   [end == 0x11000]
>> 
>>   madvise_dontneed_single_vma(vma, 0x10000, 0x11000)
>> 
>>  Following this operation, 0x11000-0x12000 would not be zapped.
> 
> Okay, fair enough.
> 
> Wouldn't something like this work too:
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 0734db8d53a7..0898120c5c04 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -796,6 +796,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> 			 */
> 			return -ENOMEM;
> 		}
> +		*prev = vma;
> 		if (!can_madv_lru_vma(vma))
> 			return -EINVAL;
> 		if (end > vma->vm_end) {

Admittedly (embarrassingly?) I didn’t even consider it since all the
comments say that once the lock is dropped prev should be invalidated.

Let’s see. Consider the aforementioned scenario, assuming there is
initially a single VMA covering 0x10000-0x12000.

Looking at the code from do_madvise():

[ end == 0x12000 ]

                tmp = vma->vm_end;

[ tmp == 0x12000 ]

                if (end < tmp)
                        tmp = end;

                /* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */

                error = madvise_vma(vma, &prev, start, tmp, behavior);

[ prev->vm_end == 0x11000 after the split ]

                if (error)
                        goto out;
                start = tmp;

[ start == 0x12000 ]

                if (prev && start < prev->vm_end)
                        start = prev->vm_end;

[ The condition (start < prev->vm_end) is false, so start is not updated ]

                error = unmapped_error;
                if (start >= end)
                        goto out;

[ start >= end, so we exit without handling the second part of the split ]

So it does not work.
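
To make the trace above easier to check, here is a minimal user-space
sketch of just the "advance start" step of the loop. It is not kernel
code: struct toy_vma, next_start() and tail_skipped() are hypothetical
names made up for illustration, locking and madvise_vma() are not
modelled, and the values are hard-coded from the scenario above. The
always_use_prev flag switches between the existing check and an
unconditional "start = prev->vm_end".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_area_struct: only the range. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Model how do_madvise() picks the next start after madvise_vma()
 * returned. tmp is the end of the range that was just requested.
 * always_use_prev == false: existing check (start < prev->vm_end).
 * always_use_prev == true:  assumed variant that always trusts prev.
 */
static unsigned long next_start(const struct toy_vma *prev,
				unsigned long tmp, bool always_use_prev)
{
	unsigned long start = tmp;

	if (always_use_prev) {
		if (prev)
			start = prev->vm_end;
	} else {
		if (prev && start < prev->vm_end)
			start = prev->vm_end;
	}
	return start;
}

/*
 * Replay the scenario: one VMA at 0x10000-0x12000 that is split at
 * 0x11000 while the lock is dropped, with *prev pointing at the first
 * half afterwards. Returns true if the loop would "goto out" without
 * visiting 0x11000-0x12000.
 */
static bool tail_skipped(bool always_use_prev)
{
	const unsigned long end = 0x12000;
	const struct toy_vma prev = { 0x10000, 0x11000 };
	const unsigned long tmp = 0x12000; /* vm_end sampled before the split */
	unsigned long start;

	assert(prev.vm_end <= end);
	start = next_start(&prev, tmp, always_use_prev);
	return start >= end;
}
```

Calling tail_skipped(false) replays the walk-through above, and
tail_skipped(true) replays it with the unconditional update. The sketch
only covers this one scenario; it says nothing about other callers that
set prev.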

Perhaps add this one on top of yours? I can test it when I wake up.
It is cleaner, but I am not sure whether I am missing something.

diff --git a/mm/madvise.c b/mm/madvise.c
index 0734db8d53a7..548fc106e70b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1203,7 +1203,7 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
                if (error)
                        goto out;
                start = tmp;
-               if (prev && start < prev->vm_end)
+               if (prev)
                        start = prev->vm_end;
                error = unmapped_error;
                if (start >= end)
