From: Mauricio Faria de Oliveira <mfo@canonical.com>
To: Minchan Kim <minchan@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>,
linux-mm@kvack.org, linux-block@vger.kernel.org,
Huang Ying <ying.huang@intel.com>,
Miaohe Lin <linmiaohe@huawei.com>
Subject: Re: [PATCH] mm: fix race between MADV_FREE reclaim and blkdev direct IO read
Date: Tue, 4 Jan 2022 20:32:36 -0300 [thread overview]
Message-ID: <CAO9xwp3aiXts8-zjJJB=PWFPUyTMe4UqdQgvwCKqWfi9py1FEw@mail.gmail.com> (raw)
In-Reply-To: <YdTTAjfse0Zwl7eB@google.com>
On Tue, Jan 4, 2022 at 8:06 PM Minchan Kim <minchan@kernel.org> wrote:
>
> On Tue, Jan 04, 2022 at 08:46:17AM -0300, Mauricio Faria de Oliveira wrote:
> > On Thu, Dec 16, 2021 at 3:17 PM Minchan Kim <minchan@kernel.org> wrote:
> > ...
> > > Hi Mauricio,
> > >
> > > Thanks for catching the bug. There is some comment before I would
> > > look the problem in more detail. Please see below.
> > >
> >
> > Hey! Thanks for looking into this. Sorry for the delay; I've been out
> > a few weeks.
> >
> > > > diff --git a/mm/rmap.c b/mm/rmap.c
> > > > index 163ac4e6bcee..f04151aae03b 100644
> > > > --- a/mm/rmap.c
> > > > +++ b/mm/rmap.c
> > > > @@ -1570,7 +1570,18 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
> > > >
> > > > /* MADV_FREE page check */
> > > > if (!PageSwapBacked(page)) {
> > > > - if (!PageDirty(page)) {
> > > > + int refcount = page_ref_count(page);
> > > > +
> > > > + /*
> > > > + * The only page refs must be from the isolation
> > > > + * (checked by the caller shrink_page_list() too)
> > > > + * and the (single) rmap (dropped by discard:).
> > > > + *
> > > > + * Check the reference count before dirty flag
> > > > + * with memory barrier; see __remove_mapping().
> > > > + */
> > > > + smp_rmb();
> > > > + if (refcount == 2 && !PageDirty(page)) {
> > >
> > > A madv_free marked page could be mapped at several processes so
> > > it wouldn't be refcount two all the time, I think.
> > > Shouldn't we check it with page_mapcount with page_refcount?
> > >
> > > page_ref_count(page) - 1 > page_mapcount(page)
> > >
> >
> > It's the other way around, isn't it? The madvise(MADV_FREE) call only clears
> > the page dirty flag if page_mapcount() == 1 (ie not mapped by more processes).
> >
> > @ madvise_free_pte_range()
> >
> > /*
> > * If page is shared with others, we couldn't clear
> > * PG_dirty of the page.
> > */
> > if (page_mapcount(page) != 1) {
> > unlock_page(page);
> > continue;
> > }
> > ...
> > ClearPageDirty(page);
> > unlock_page(page);
> >
> >
> > If that's right, the refcount of 2 should be OK (one from the isolation,
> > another one from the single map/one process.)
> >
> > Does that make sense? I might be missing something.
>
> A child process could be forked from parent after madvise so the page
> mapcount of the hinted page could have elevated refcount.
Thanks for clarifying. I was indeed missing something. :)
I revisited your commit 854e9ed09ded ("mm: support madvise(MADV_FREE)"),
and now realized that the restriction for map count == 1 is only for _cleaning_
the dirty flag, not for discarding the (clean) page mapping later on.
"""
To solve the problem, this patch clears PG_dirty if only the page is
owned exclusively by current process when madvise is called because
PG_dirty represents ptes's dirtiness in several processes so we could
clear it only if we own it exclusively.
"""
I'll look at this tomorrow and submit a v2.
Thanks!
--
Mauricio Faria de Oliveira
next prev parent reply other threads:[~2022-01-04 23:32 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-11 2:21 [PATCH] mm: fix race between MADV_FREE reclaim and blkdev direct IO read Mauricio Faria de Oliveira
2021-12-16 18:17 ` Minchan Kim
2021-12-17 2:10 ` Huang, Ying
2022-01-04 11:49 ` Mauricio Faria de Oliveira
2022-01-04 11:46 ` Mauricio Faria de Oliveira
2022-01-04 23:06 ` Minchan Kim
2022-01-04 23:32 ` Mauricio Faria de Oliveira [this message]
2021-12-17 18:51 ` Yang Shi
2022-01-04 11:57 ` Mauricio Faria de Oliveira
2022-01-05 0:32 ` Huang, Ying
2022-01-05 1:20 ` Yang Shi
2022-01-05 1:42 ` Huang, Ying
2022-01-05 2:16 ` Yang Shi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAO9xwp3aiXts8-zjJJB=PWFPUyTMe4UqdQgvwCKqWfi9py1FEw@mail.gmail.com' \
--to=mfo@canonical.com \
--cc=akpm@linux-foundation.org \
--cc=linmiaohe@huawei.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan@kernel.org \
--cc=ying.huang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).