From: John Hubbard <jhubbard@nvidia.com>
To: Minchan Kim <minchan@kernel.org>, Yu Zhao <yuzhao@google.com>
Cc: Mauricio Faria de Oliveira <mfo@canonical.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-block@vger.kernel.org,
	Huang Ying <ying.huang@intel.com>,
	Miaohe Lin <linmiaohe@huawei.com>, Yang Shi <shy828301@gmail.com>
Subject: Re: [PATCH v2] mm: fix race between MADV_FREE reclaim and blkdev direct IO read
Date: Tue, 11 Jan 2022 11:29:36 -0800
Message-ID: <e75c8f37-782f-f4d4-b197-8fda18090b42@nvidia.com>
In-Reply-To: <Yd3SUXVy7MbyBzFw@google.com>

On 1/11/22 10:54, Minchan Kim wrote:
...
> Hi Yu,
> 
> I think you're correct. I think we don't want a memory barrier
> there in page_dup_rmap. Then, how about making gup_fast aware
> of FOLL_TOUCH?
> 
> FOLL_TOUCH means it's going to write something so the page

Actually, my understanding of FOLL_TOUCH is that it does *not* mean that
data will be written to the page. That is what FOLL_WRITE is for.
FOLL_TOUCH means: update the "accessed" metadata, without actually
writing to the memory that the page represents.


> should be dirty. Currently, get_user_pages works like that.
> However, the problem is get_user_pages_fast, since it looks like
> lockless_pages_from_mm doesn't support FOLL_TOUCH.
> 
> So the idea is: if the flags passed to internal_get_user_pages_fast
> include FOLL_TOUCH, gup_{pmd,pte}_range would try to make the
> page dirty under trylock_page (if the lock fails, it goes

Marking a page dirty solely because FOLL_TOUCH is specified would
be an API-level mistake. That's why it isn't "supported". Or at least,
that's how I'm reading things.

Hope that helps!

> to the slow path with __gup_longterm_unlocked and set_dirty_pages
> for them).
> 
> This approach would also solve other cases where userspace
> pages are mapped into kernel space and then written. Since the write
> doesn't go through the process's page table, we lose the
> dirty bit in the process's page table and end up with the
> same problem. That's why I'd like to take this
> approach.
> 
> If that doesn't work, another option to fix this specific
> case: can't we make the pages dirty in advance in the DIO read case?
> 
> Looking at the DIO code, it already does that in the async case.
> Couldn't we do the same thing for the other cases?
> I guess the worst case would be more page writeback,
> since pages may become dirty unnecessarily.

Marking pages dirty after pinning them is a long-standing problem
area. See the long-running LWN articles about get_user_pages() [1].


[1] https://lwn.net/Kernel/Index/#Memory_management-get_user_pages

thanks,
-- 
John Hubbard
NVIDIA

Thread overview: 27+ messages
2022-01-05 23:34 [PATCH v2] mm: fix race between MADV_FREE reclaim and blkdev direct IO read Mauricio Faria de Oliveira
2022-01-06 23:15 ` Minchan Kim
2022-01-07  0:11   ` Yang Shi
2022-01-07  1:08     ` Yang Shi
2022-01-11  1:34   ` Huang, Ying
2022-01-11  6:48 ` Yu Zhao
2022-01-11 18:54   ` Minchan Kim
2022-01-11 19:29     ` John Hubbard [this message]
2022-01-11 20:20       ` Minchan Kim
2022-01-11 20:21         ` Minchan Kim
2022-01-11 21:59           ` Minchan Kim
2022-01-11 23:38             ` John Hubbard
2022-01-12  0:01               ` Minchan Kim
2022-01-12  1:46   ` Huang, Ying
2022-01-12 17:33     ` Minchan Kim
2022-01-12 21:53       ` Mauricio Faria de Oliveira
2022-01-12 22:37         ` Minchan Kim
2022-01-13  8:54           ` Huang, Ying
2022-01-13 12:30             ` Huang, Ying
2022-01-13 14:54               ` Mauricio Faria de Oliveira
2022-01-13 14:30           ` Mauricio Faria de Oliveira
2022-01-13  7:29         ` Yu Zhao
2022-01-14  0:35           ` Minchan Kim
2022-01-31 23:10             ` Mauricio Faria de Oliveira
2022-01-13  5:47       ` Huang, Ying
2022-01-13  6:37         ` Miaohe Lin
2022-01-13  8:04           ` Huang, Ying