From: Mauricio Faria de Oliveira <mfo@canonical.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Minchan Kim <minchan@kernel.org>, Linux MM <linux-mm@kvack.org>,
linux-block@vger.kernel.org, Huang Ying <ying.huang@intel.com>,
Miaohe Lin <linmiaohe@huawei.com>
Subject: Re: [PATCH] mm: fix race between MADV_FREE reclaim and blkdev direct IO read
Date: Tue, 4 Jan 2022 08:57:32 -0300
Message-ID: <CAO9xwp1zkGRdn1BKoE=Np6BvOQ-G5bzr5URnp2t9_a2PyynYSQ@mail.gmail.com>
In-Reply-To: <CAHbLzkoZXHQ2WuuQGafuo0YV_KOML91g2ZkDjyzw_J7E40yVsA@mail.gmail.com>
On Fri, Dec 17, 2021 at 3:51 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Fri, Dec 10, 2021 at 6:22 PM Mauricio Faria de Oliveira
> <mfo@canonical.com> wrote:
...
> > MADV_FREE'd buffers:
> > ===================
> >
> > So, back to the "if MADV_FREE pages are used as buffers" note.
> > The case is arguable, and subject to multiple interpretations.
> >
> > The madvise(2) manual page on the MADV_FREE advice value says:
> > - 'After a successful MADV_FREE ... data will be lost when
> > the kernel frees the pages.'
> > - 'the free operation will be canceled if the caller writes
> > into the page' / 'subsequent writes ... will succeed and
> > then [the] kernel cannot free those dirtied pages'
> > - 'If there is no subsequent write, the kernel can free the
> > pages at any time.'
> >
> > Thoughts, questions, considerations...
> > - Since the kernel didn't actually free the page (page_ref_freeze()
> > failed), should the data not have been lost? (on userspace read.)
> > - Should writes performed by the direct IO read be able to cancel
> > the free operation?
> > - Should the direct IO read be considered as 'the caller' too,
> > as it's been requested by 'the caller'?
> > - Should the bio technique to dirty pages on return to userspace
> > (bio_check_pages_dirty() is called/used by __blkdev_direct_IO())
> > be considered in another/special way here?
> > - Should an upcoming write from a previously requested direct IO
> > read be considered as a subsequent write, so the kernel should
> > not free the pages? (as it's known at the time of page reclaim.)
> >
> > Technically, the last point would seem a reasonable consideration
> > and balance, as the madvise(2) manual page apparently (and fairly)
> > seems to assume that 'writes' are memory accesses from the userspace
> > process (not explicitly considering writes from the kernel or its
> > corner cases; again, fairly). Plus, the kernel fix implementation
> > for the corner case of the largely 'non-atomic write' encompassed
> > by a direct IO read operation is relatively simple, and it helps.
...
> IIUC, you are expecting to get the old data after MADV_FREE? TBH, you
> should not expect so at all after MADV_FREE since those pages may get
> freed at any time.

Hey, thanks for checking this.

Correct; the discussion behind this is covered in the text above. It is indeed
arguable, but the fix makes the behavior more consistent for the case of a
direct IO read, rather than potentially returning zero-filled pages seemingly
at random.

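For reference, the madvise(2) wording quoted above can be illustrated from
userspace with a minimal sketch. This is not the reproducer for the race and
does not involve direct IO at all; it only shows the documented contract that
a write after MADV_FREE cancels the free for that page, while bytes that were
not rewritten may come back either as the old data or as zeroes. The function
name madv_free_demo() and the 0xAA/0x55 fill patterns are hypothetical, and
the fallback definition of MADV_FREE assumes the generic Linux value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8 /* assumption: generic Linux uapi value, for older libc headers */
#endif

/*
 * Returns 0 if the observed contents match the madvise(2) description,
 * 1 if they do not, and -1 if the mapping could not be set up.
 */
int madv_free_demo(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    unsigned char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        perror("mmap");
        return -1;
    }

    /* Dirty the page with a known pattern. */
    memset(buf, 0xAA, (size_t)page);

    /* Mark the page lazily freeable; the kernel may now drop it at any time. */
    if (madvise(buf, (size_t)page, MADV_FREE) != 0)
        perror("madvise(MADV_FREE)"); /* e.g. unsupported kernel; continue anyway */

    /* A subsequent write from the caller cancels the free for this page. */
    buf[0] = 0x55;

    /* Read through a volatile pointer so the compiler does not fold the loads. */
    volatile unsigned char *p = buf;

    /*
     * The rewritten byte must hold the new value. A byte that was not
     * rewritten holds the old data if the page survived, or zero if the
     * kernel freed the page before the write faulted in a fresh one.
     */
    int ok = (p[0] == 0x55) && (p[1] == 0xAA || p[1] == 0x00);

    munmap(buf, (size_t)page);
    return ok ? 0 : 1;
}
```

Calling madv_free_demo() from a small main() and checking for a zero return
is enough to try it. Without memory pressure the old 0xAA data will usually
still be there, which is why the zero-filled case looks random from
userspace.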
cheers,
--
Mauricio Faria de Oliveira
Thread overview: 13+ messages
2021-12-11 2:21 [PATCH] mm: fix race between MADV_FREE reclaim and blkdev direct IO read Mauricio Faria de Oliveira
2021-12-16 18:17 ` Minchan Kim
2021-12-17 2:10 ` Huang, Ying
2022-01-04 11:49 ` Mauricio Faria de Oliveira
2022-01-04 11:46 ` Mauricio Faria de Oliveira
2022-01-04 23:06 ` Minchan Kim
2022-01-04 23:32 ` Mauricio Faria de Oliveira
2021-12-17 18:51 ` Yang Shi
2022-01-04 11:57 ` Mauricio Faria de Oliveira [this message]
2022-01-05 0:32 ` Huang, Ying
2022-01-05 1:20 ` Yang Shi
2022-01-05 1:42 ` Huang, Ying
2022-01-05 2:16 ` Yang Shi