From: "Kirill A. Shutemov" <kirill@shutemov.name>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Linux-MM <linux-mm@kvack.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
Peter Xu <peterx@redhat.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Minchan Kim <minchan@kernel.org>, Colin Cross <ccross@google.com>,
Suren Baghdasarya <surenb@google.com>,
Mike Rapoport <rppt@linux.vnet.ibm.com>
Subject: Re: [RFC PATCH 1/8] mm/madvise: propagate vma->vm_end changes
Date: Mon, 27 Sep 2021 15:45:18 +0300
Message-ID: <20210927124518.gjas4itro5c3park@box.shutemov.name>
In-Reply-To: <4211F6D4-A282-4AB4-8D96-E273C5ABE0DF@gmail.com>
On Mon, Sep 27, 2021 at 05:33:39AM -0700, Nadav Amit wrote:
>
>
> > On Sep 27, 2021, at 4:55 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >
> > On Mon, Sep 27, 2021 at 03:11:20AM -0700, Nadav Amit wrote:
> >>
> >>> On Sep 27, 2021, at 2:08 AM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> >>>
> >>> On Sun, Sep 26, 2021 at 09:12:52AM -0700, Nadav Amit wrote:
> >>>> From: Nadav Amit <namit@vmware.com>
> >>>>
> >>>> The comment in madvise_dontneed_free() says that vma splits that occur
> >>>> while the mmap-lock is dropped, during userfaultfd_remove(), should be
> >>>> handled correctly, but nothing in the code indicates that it is so: prev
> >>>> is invalidated, and do_madvise() will therefore continue to update VMAs
> >>>> from the "obsolete" end (i.e., the one before the split).
> >>>>
> >>>> Propagate the changes to end from madvise_dontneed_free() back to
> >>>> do_madvise() and continue the updates from the new end accordingly.
> >>>
> >>> Could you describe in detail a race that would lead to wrong behaviour?
> >>
> >> Thanks for the quick response.
> >>
> >> For instance, madvise(MADV_DONTNEED) can race with mprotect() and cause
> >> the VMA to split.
> >>
> >> Something like:
> >>
> >> CPU0                                     CPU1
> >> ----                                     ----
> >> madvise(0x10000, 0x2000, MADV_DONTNEED)
> >> -> userfaultfd_remove()
> >>    [ mmap-lock dropped ]
> >>                                          mprotect(0x11000, 0x1000, PROT_READ)
> >>                                          [splitting the VMA]
> >>
> >>                                          read(uffd)
> >>                                          [unblocking userfaultfd_remove()]
> >>
> >>    [ resuming ]
> >>    end = vma->vm_end
> >>    [end == 0x11000]
> >>
> >>    madvise_dontneed_single_vma(vma, 0x10000, 0x11000)
> >>
> >> Following this operation, 0x11000-0x12000 would not be zapped.
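The outcome of this interleaving can be replayed in userspace with a toy
model. This is only a sketch under stated assumptions: struct toy_vma,
toy_dontneed() and toy_race_leftover() are hypothetical names, not kernel
code, and the "zap" is just a flag per page. It assumes the request is
clamped to the VMA as it looks after the lock is retaken, as in the
scenario above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 0x1000UL
#define TOY_NR_PAGES  2

/* Hypothetical stand-in for a VMA: only the [vm_start, vm_end) range. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * Toy version of the DONTNEED step that runs after the lock is retaken:
 * clamp the request to the VMA as it looks now, then "zap" by clearing
 * the populated flag of each page in [start, end). Page 0 is the page at
 * base. Returns the end address that was actually used.
 */
unsigned long toy_dontneed(const struct toy_vma *vma, bool *populated,
			   unsigned long base, unsigned long start,
			   unsigned long end)
{
	unsigned long addr;

	assert(vma->vm_start <= start && start < vma->vm_end);
	if (end > vma->vm_end)
		end = vma->vm_end;
	for (addr = start; addr < end; addr += TOY_PAGE_SIZE)
		populated[(addr - base) / TOY_PAGE_SIZE] = false;
	return end;
}

/* Replay the scenario; returns how many pages are still populated. */
size_t toy_race_leftover(void)
{
	bool populated[TOY_NR_PAGES] = { true, true };
	struct toy_vma vma = { 0x10000UL, 0x12000UL };
	size_t i, left = 0;

	/* CPU1: mprotect() splits the VMA while the lock is dropped. */
	vma.vm_end = 0x11000UL;

	/* CPU0: resumes with the original request 0x10000-0x12000. */
	toy_dontneed(&vma, populated, 0x10000UL, 0x10000UL, 0x12000UL);

	for (i = 0; i < TOY_NR_PAGES; i++)
		left += populated[i] ? 1 : 0;
	return left;
}
```

In this model toy_race_leftover() reports one page still populated, which
corresponds to 0x11000-0x12000 in the example.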
> >
> > Okay, fair enough.
> >
> > Wouldn't something like this work too:
> >
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 0734db8d53a7..0898120c5c04 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -796,6 +796,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
> >  			 */
> >  			return -ENOMEM;
> >  		}
> > +		*prev = vma;
> >  		if (!can_madv_lru_vma(vma))
> >  			return -EINVAL;
> >  		if (end > vma->vm_end) {
>
> Admittedly (embarrassingly?) I didn’t even consider it since all the
> comments say that once the lock is dropped prev should be invalidated.
>
> Let’s see, considering the aforementioned scenario and that there is
> initially one VMA between 0x10000-0x12000.
>
> Looking at the code from do_madvise():
>
> 	[ end == 0x12000 ]
>
> 	tmp = vma->vm_end;
>
> 	[ tmp == 0x12000 ]
>
> 	if (end < tmp)
> 		tmp = end;
>
> 	/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
>
> 	error = madvise_vma(vma, &prev, start, tmp, behavior);
>
> 	[ prev->vm_end == 0x11000 after the split ]
>
> 	if (error)
> 		goto out;
> 	start = tmp;
>
> 	[ start == 0x12000 ]
>
> 	if (prev && start < prev->vm_end)
> 		start = prev->vm_end;
>
> 	[ The condition (start < prev->vm_end) is false, start not updated ]
>
> 	error = unmapped_error;
> 	if (start >= end)
> 		goto out;
>
> 	[ start >= end; so we end without updating the second part of the split ]
>
> So it does not work.
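The arithmetic in this walk-through can be checked with a small userspace
sketch. It is a simplification under stated assumptions: struct walk_vma,
toy_next_start() and toy_skipped_bytes() are hypothetical names, the call
to madvise_vma() is reduced to its effect on prev, and only the start/tmp
bookkeeping quoted above is modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: only the [vm_start, vm_end) range. */
struct walk_vma {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * One pass of the range walk. vm_end_at_entry is vma->vm_end as read
 * before the advice is applied; prev is what the callee left in *prev
 * (NULL if it was invalidated). Returns the start of the next pass.
 */
unsigned long toy_next_start(unsigned long start, unsigned long end,
			     unsigned long vm_end_at_entry,
			     const struct walk_vma *prev)
{
	unsigned long tmp = vm_end_at_entry;

	assert(start < end && start < vm_end_at_entry);
	if (end < tmp)
		tmp = end;
	/* madvise_vma(vma, &prev, start, tmp, behavior) would run here. */
	start = tmp;
	if (prev && start < prev->vm_end)
		start = prev->vm_end;
	return start;
}

/* Gap between where the zap stopped and where the walk resumes. */
unsigned long toy_skipped_bytes(void)
{
	/* After the split, prev is the first half: 0x10000-0x11000. */
	struct walk_vma prev = { 0x10000UL, 0x11000UL };
	unsigned long end = 0x12000UL;
	unsigned long next = toy_next_start(0x10000UL, end, 0x12000UL, &prev);

	/* The zap stopped at prev.vm_end, but the walk resumes at next. */
	return next - prev.vm_end;
}
```

With the values from the example, toy_next_start() returns 0x12000, which
is already equal to end, so the walk stops and toy_skipped_bytes() comes
out as 0x1000. Passing NULL for prev gives the same next start, so in this
model setting *prev alone does not change the outcome.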
>
> Perhaps adding this one on top of yours? I can test it when I wake up.
> It is cleaner, but I am not sure if I am missing something.
It should work.

BTW, shouldn't we bring madvise_willneed() and madvise_remove() to the
same scheme?

--
Kirill A. Shutemov