From: Yosry Ahmed <yosryahmed@google.com>
To: Yu Zhao <yuzhao@google.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	akpm@linux-foundation.org, willy@infradead.org, david@redhat.com,
	ryan.roberts@arm.com, shy828301@gmail.com,
	Yin Fengwei <fengwei.yin@intel.com>,
	Hugh Dickins <hughd@google.com>
Subject: Re: [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio
Date: Tue, 18 Jul 2023 18:52:09 -0700	[thread overview]
Message-ID: <CAJD7tkZWXdHwpW5AeKqmn6TVCXm1wmKr-2RN2baRJ7c4ciTJng@mail.gmail.com> (raw)
In-Reply-To: <CAJD7tkZtHku-kaK02MAdgaxNzr9hQkPty=cw44R_9HdTS+Pd5w@mail.gmail.com>

On Tue, Jul 18, 2023 at 6:32 PM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Tue, Jul 18, 2023 at 4:47 PM Yin Fengwei <fengwei.yin@intel.com> wrote:
> >
> >
> >
> > On 7/19/23 06:48, Yosry Ahmed wrote:
> > > On Sun, Jul 16, 2023 at 6:58 PM Yin Fengwei <fengwei.yin@intel.com> wrote:
> > >>
> > >>
> > >>
> > >> On 7/17/23 08:35, Yu Zhao wrote:
> > >>> On Sun, Jul 16, 2023 at 6:00 PM Yin, Fengwei <fengwei.yin@intel.com> wrote:
> > >>>>
> > >>>> On 7/15/2023 2:06 PM, Yu Zhao wrote:
> > >>>>> There is a problem here that I didn't have the time to elaborate: we
> > >>>>> can't mlock() a folio that is within the range but not fully mapped
> > >>>>> because this folio can be on the deferred split queue. When the split
> > >>>>> happens, those unmapped folios (not mapped by this vma but are mapped
> > >>>>> into other vmas) will be stranded on the unevictable lru.
> > >>>>
> > >>>> This should be fine unless I missed something. During a large folio split,
> > >>>> unmap_folio() will migrate (anon) or unmap (file) the folio, and the folio
> > >>>> will be munlocked in unmap_folio(). So the head/tail pages will always be evictable.
> > >>>
> > >>> It's close but not entirely accurate: munlock can fail on isolated folios.
> > >> Yes. The munlock just clears the PG_mlocked bit but leaves PG_unevictable set.
> > >>
> > >> Could this also happen with a normal 4K page? I mean, when a user tries to munlock
> > >> a normal 4K page while that page is isolated, does it become an unevictable page?
> > >
> > > Looks like it can be possible. If cpu 1 is in __munlock_folio() and
> > > cpu 2 is isolating the folio for any purpose:
> > >
> > > cpu1                                        cpu2
> > >                                             isolate folio
> > > folio_test_clear_lru() // 0
> > >                                             putback folio // add to unevictable list
> > > folio_test_clear_mlocked()
> > Yes. Yu showed this sequence to me in another email. I thought the putback_lru()
> > could correct the non-mlocked but unevictable folio. But it doesn't because
> > of this race.
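FWIW, to make that interleaving concrete, here is a tiny userspace model of it
(everything below is made up for illustration; only the ordering of the steps
mirrors the kernel code we are discussing):

/*
 * Toy model of the race above. fake_folio and the helpers are invented;
 * the step order follows the cpu1/cpu2 diagram.
 */
#include <stdbool.h>
#include <stdio.h>

struct fake_folio {
        bool lru;               /* models PG_lru */
        bool mlocked;           /* models PG_mlocked */
        bool unevictable;       /* models which LRU list it ends up on */
};

/* cpu2 side: isolate the folio, put it back later */
static void isolate(struct fake_folio *f)
{
        f->lru = false;
}

static void putback(struct fake_folio *f)
{
        /* putback chooses the list based on the flags it sees *now* */
        f->unevictable = f->mlocked;
        f->lru = true;
}

/* cpu1 side: the test-and-clear of PG_lru from the diagram */
static bool test_clear_lru(struct fake_folio *f)
{
        bool was = f->lru;

        f->lru = false;
        return was;
}

int main(void)
{
        struct fake_folio f = { .lru = true, .mlocked = true, .unevictable = true };

        isolate(&f);              /* cpu2 */
        (void)test_clear_lru(&f); /* cpu1: gets 0, so it skips the list fixup */
        putback(&f);              /* cpu2: still sees mlocked -> unevictable list */
        f.mlocked = false;        /* cpu1: folio_test_clear_mlocked() */

        printf("mlocked=%d unevictable=%d stranded=%d\n",
               f.mlocked, f.unevictable, !f.mlocked && f.unevictable);
        return 0;
}

This prints mlocked=0 unevictable=1 stranded=1, which is exactly the stranded
state we are worried about.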
>
> (+Hugh Dickins for vis)
>
> Yu, I am not familiar with the split_folio() case, so I am not sure whether it
> is the exact same race I described above.
>
> Can you confirm whether or not doing folio_test_clear_mlocked() before
> folio_test_clear_lru() would fix the race you are referring to? IIUC,
> in this case, we make sure we clear PG_mlocked before we try to
> clear PG_lru. If we fail to clear PG_lru, then someone else has the folio
> isolated after we cleared PG_mlocked, so we can be sure that when they
> put the folio back it will be correctly made evictable.
>
> Is my understanding correct?
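
To spell out the ordering I am proposing there, a rough sketch (the function
below is hypothetical, and I am glossing over the lruvec locking, the
mlock_count handling, and the stats):

static void munlock_order_idea(struct folio *folio)
{
        /* step 1: clear PG_mlocked first (accounting elided) */
        folio_test_clear_mlocked(folio);

        if (!folio_test_clear_lru(folio)) {
                /*
                 * Step 2: isolation failed, so whoever holds the folio
                 * isolated it after PG_mlocked was already clear; their
                 * putback should then see the folio as evictable.
                 */
                return;
        }

        /* step 3: we isolated the folio, so fix its LRU placement here */
        folio_set_lru(folio);
}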

Hmm, actually this might not be enough. In folio_add_lru() we will
call folio_batch_add_and_move(), which calls lru_add_fn() and *then*
sets PG_lru. Since we check folio_evictable() in lru_add_fn(), the
race can still happen:


cpu1                              cpu2
                                      folio_evictable() //false
folio_test_clear_mlocked()
folio_test_clear_lru() //false
                                      folio_set_lru()
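
i.e. the path looks roughly like this, going from my reading of mm/swap.c
(simplified and from memory, so the details may well be off):

/* rough shape of the folio_batch_move_lru() + lru_add_fn() step */
static void lru_add_sketch(struct folio *folio)
{
        /* lru_add_fn(): picks the target list via folio_evictable() */
        if (!folio_evictable(folio))
                folio_set_unevictable(folio);   /* unevictable list */

        /* ...folio gets added to the chosen list... */

        /* PG_lru is only set after that, back in the caller */
        folio_set_lru(folio);
}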

Relying on PG_lru for synchronization might not be enough with the
current code. We might need to revert 2262ace60713 ("mm/munlock:
delete smp_mb() from __pagevec_lru_add_fn()").

Sorry for going back and forth here; I am thinking out loud.

>
> If yes, I can add this fix to my next version of the RFC series to
> rework mlock_count. It would be a lot more complicated with the
> current implementation (as I stated in a previous email).
>
> >
> > >
> > >
> > > The page would be stranded on the unevictable list in this case, no?
> > > Maybe we should only try to isolate the page (clear PG_lru) after we
> > > possibly clear PG_mlocked? In this case if we fail to isolate we know
> > > for sure that whoever has the page isolated will observe that
> > > PG_mlocked is clear and correctly make the page evictable.
> > >
> > > This probably would be complicated with the current implementation, as
> > > we first need to decrement mlock_count to determine if we want to
> > > clear PG_mlocked, and to do so we need to isolate the page as
> > > mlock_count overlays page->lru. With the proposal in [1] to rework
> > > mlock_count, it might be much simpler as far as I can tell. I intend
> > > to refresh this proposal soon-ish.
> > >
> > > [1] https://lore.kernel.org/lkml/20230618065719.1363271-1-yosryahmed@google.com/
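
For anyone following along, the overlay mentioned above looks roughly like
this (simplified sketch, not the exact struct folio definition):

struct folio_sketch {
        unsigned long flags;
        union {
                struct list_head lru;             /* normal LRU linkage */
                struct {
                        void *__filler;           /* overlaps lru.next */
                        unsigned int mlock_count; /* shares space with lru.prev */
                };
        };
        /* ... many other fields ... */
};

So mlock_count can only be touched after isolating the folio, which is the
complication referred to above.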
> > >
> > >>
> > >>
> > >> Regards
> > >> Yin, Fengwei
> > >>


Thread overview: 42+ messages
2023-07-12  6:01 [RFC PATCH v2 0/3] support large folio for mlock Yin Fengwei
2023-07-12  6:01 ` [RFC PATCH v2 1/3] mm: add functions folio_in_range() and folio_within_vma() Yin Fengwei
2023-07-12  6:11   ` Yu Zhao
2023-07-12  6:01 ` [RFC PATCH v2 2/3] mm: handle large folio when large folio in VM_LOCKED VMA range Yin Fengwei
2023-07-12  6:23   ` Yu Zhao
2023-07-12  6:43     ` Yin Fengwei
2023-07-12 17:03       ` Yu Zhao
2023-07-13  1:55         ` Yin Fengwei
2023-07-14  2:21       ` Hugh Dickins
2023-07-14  2:49         ` Yin, Fengwei
2023-07-14  3:41           ` Hugh Dickins
2023-07-14  5:45             ` Yin, Fengwei
2023-07-12  6:01 ` [RFC PATCH v2 3/3] mm: mlock: update mlock_pte_range to handle large folio Yin Fengwei
2023-07-12  6:31   ` Yu Zhao
2023-07-15  6:06     ` Yu Zhao
2023-07-16 23:59       ` Yin, Fengwei
2023-07-17  0:35         ` Yu Zhao
2023-07-17  1:58           ` Yin Fengwei
2023-07-18 22:48             ` Yosry Ahmed
2023-07-18 23:47               ` Yin Fengwei
2023-07-19  1:32                 ` Yosry Ahmed
2023-07-19  1:52                   ` Yosry Ahmed [this message]
2023-07-19  1:57                     ` Yin Fengwei
2023-07-19  2:00                       ` Yosry Ahmed
2023-07-19  2:09                         ` Yin Fengwei
2023-07-19  2:22                           ` Yosry Ahmed
2023-07-19  2:28                             ` Yin Fengwei
2023-07-19 14:26                               ` Hugh Dickins
2023-07-19 15:44                                 ` Yosry Ahmed
2023-07-20 12:02                                   ` Yin, Fengwei
2023-07-20 20:51                                     ` Yosry Ahmed
2023-07-21  1:12                                       ` Yin, Fengwei
2023-07-21  1:35                                         ` Yosry Ahmed
2023-07-21  3:18                                           ` Yin, Fengwei
2023-07-21  3:39                                             ` Yosry Ahmed
2023-07-20  1:52                                 ` Yin, Fengwei
2023-07-17  8:12           ` Yin Fengwei
2023-07-18  2:06             ` Yin Fengwei
2023-07-18  3:59               ` Yu Zhao
2023-07-26 12:49       ` Yin Fengwei
2023-07-26 16:57         ` Yu Zhao
2023-07-27  0:15           ` Yin Fengwei
