linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>
Subject: Re: Potential race in TLB flush batching?
Date: Wed, 19 Jul 2017 15:19:00 -0700	[thread overview]
Message-ID: <3D1386AD-7875-40B9-8C6F-DE02CF8A45A1@gmail.com> (raw)
In-Reply-To: <20170719214708.wuzq3di6rt43txtn@suse.de>

Mel Gorman <mgorman@suse.de> wrote:

> On Wed, Jul 19, 2017 at 01:20:01PM -0700, Nadav Amit wrote:
>>> From a PTE you cannot know the state of mmap_sem because you can rmap
>>> back to multiple mm's for shared mappings. It's also fairly heavy handed.
>>> Technically, you could lock on the basis of the VMA but that has other
>>> consequences for scalability. The staleness is also a factor because
>>> it's a case of "does the staleness matter". Sometimes it does, sometimes
>>> it doesn't.  mmap_sem even if it could be used does not always tell us
>>> the right information either because it can matter whether we are racing
>>> against a userspace reference or a kernel operation.
>>> 
>>> It's possible your idea could be made work, but right now I'm not seeing a
>>> solution that handles every corner case. I asked to hear what your ideas
>>> were because anything I thought of that could batch TLB flushing in the
>>> general case had flaws that did not improve over what is already there.
>> 
>> I don???t disagree with what you say - perhaps my scheme is too simplistic.
>> But the bottom line, if you cannot form simple rules for when TLB needs to
>> be flushed, what are the chances others would get it right?
> 
> Broad rule is "flush before the page is freed/reallocated for clean pages
> or any IO is initiated for dirty pages" with a lot of details that are not
> documented. Often it's the PTL and flush with it held that protects the
> majority of cases but it's not universal as the page lock and mmap_sem
> play important rules depending ont the context and AFAIK, that's also
> not documented.
> 
>>> shrink_page_list is the caller of try_to_unmap in reclaim context. It
>>> has this check
>>> 
>>>               if (!trylock_page(page))
>>>                       goto keep;
>>> 
>>> For pages it cannot lock, they get put back on the LRU and recycled instead
>>> of reclaimed. Hence, if KSM or anything else holds the page lock, reclaim
>>> can't unmap it.
>> 
>> Yes, of course, since KSM does not batch TLB flushes. I regarded the other
>> direction - first try_to_unmap() removes the PTE (but still does not flush),
>> unlocks the page, and then KSM acquires the page lock and calls
>> write_protect_page(). It finds out the PTE is not present and does not flush
>> the TLB.
> 
> When KSM acquires the page lock, it then acquires the PTL where the
> cleared PTE is observed directly and skipped.

I don’t see why. Let’s try again - CPU0 reclaims while CPU1 deduplicates:

CPU0				CPU1
----				----
shrink_page_list()

=> try_to_unmap()
==> try_to_unmap_one()
[ unmaps from some page-tables ]

[ try_to_unmap returns false;
  page not reclaimed ]

=> keep_locked: unlock_page()

[ TLB flush deferred ]
				try_to_merge_one_page()
				=> trylock_page()
				=> write_protect_page()
				==> acquire ptl
				  [ PTE non-present —> no PTE change
				    and no flush ]
				==> release ptl
				==> replace_page()


At this point, while replace_page() is running, CPU0 may still not have
flushed the TLBs. Another CPU (CPU2) may hold a stale PTE, which is not
write-protected. It can therefore write to that page while replace_page() is
running, resulting in memory corruption.

No?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-07-19 22:19 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-11  0:52 Potential race in TLB flush batching? Nadav Amit
2017-07-11  6:41 ` Mel Gorman
2017-07-11  7:30   ` Nadav Amit
2017-07-11  9:29     ` Mel Gorman
2017-07-11 10:40       ` Nadav Amit
2017-07-11 13:20         ` Mel Gorman
2017-07-11 14:58           ` Andy Lutomirski
2017-07-11 15:53             ` Mel Gorman
2017-07-11 17:23               ` Andy Lutomirski
2017-07-11 19:18                 ` Mel Gorman
2017-07-11 20:06                   ` Nadav Amit
2017-07-11 21:09                     ` Mel Gorman
2017-07-11 20:09                   ` Mel Gorman
2017-07-11 21:52                     ` Mel Gorman
2017-07-11 22:27                       ` Nadav Amit
2017-07-11 22:34                         ` Nadav Amit
2017-07-12  8:27                         ` Mel Gorman
2017-07-12 23:27                           ` Nadav Amit
2017-07-12 23:36                             ` Andy Lutomirski
2017-07-12 23:42                               ` Nadav Amit
2017-07-13  5:38                                 ` Andy Lutomirski
2017-07-13 16:05                                   ` Nadav Amit
2017-07-13 16:06                                     ` Andy Lutomirski
2017-07-13  6:07                             ` Mel Gorman
2017-07-13 16:08                               ` Andy Lutomirski
2017-07-13 17:07                                 ` Mel Gorman
2017-07-13 17:15                                   ` Andy Lutomirski
2017-07-13 18:23                                     ` Mel Gorman
2017-07-14 23:16                               ` Nadav Amit
2017-07-15 15:55                                 ` Mel Gorman
2017-07-15 16:41                                   ` Andy Lutomirski
2017-07-17  7:49                                     ` Mel Gorman
2017-07-18 21:28                                   ` Nadav Amit
2017-07-19  7:41                                     ` Mel Gorman
2017-07-19 19:41                                       ` Nadav Amit
2017-07-19 19:58                                         ` Mel Gorman
2017-07-19 20:20                                           ` Nadav Amit
2017-07-19 21:47                                             ` Mel Gorman
2017-07-19 22:19                                               ` Nadav Amit [this message]
2017-07-19 22:59                                                 ` Mel Gorman
2017-07-19 23:39                                                   ` Nadav Amit
2017-07-20  7:43                                                     ` Mel Gorman
2017-07-22  1:19                                                       ` Nadav Amit
2017-07-24  9:58                                                         ` Mel Gorman
2017-07-24 19:46                                                           ` Nadav Amit
2017-07-25  7:37                                                           ` Minchan Kim
2017-07-25  8:51                                                             ` Mel Gorman
2017-07-25  9:11                                                               ` Minchan Kim
2017-07-25 10:10                                                                 ` Mel Gorman
2017-07-26  5:43                                                                   ` Minchan Kim
2017-07-26  9:22                                                                     ` Mel Gorman
2017-07-26 19:18                                                                       ` Nadav Amit
2017-07-26 23:40                                                                         ` Minchan Kim
2017-07-27  0:09                                                                           ` Nadav Amit
2017-07-27  0:34                                                                             ` Minchan Kim
2017-07-27  0:48                                                                               ` Nadav Amit
2017-07-27  1:13                                                                                 ` Nadav Amit
2017-07-27  7:04                                                                                   ` Minchan Kim
2017-07-27  7:21                                                                                     ` Mel Gorman
2017-07-27 16:04                                                                                       ` Nadav Amit
2017-07-27 17:36                                                                                         ` Mel Gorman
2017-07-26 23:44                                                                       ` Minchan Kim
2017-07-11 22:07                   ` Andy Lutomirski
2017-07-11 22:33                     ` Mel Gorman
2017-07-14  7:00                     ` Benjamin Herrenschmidt
2017-07-14  8:31                       ` Mel Gorman
2017-07-14  9:02                         ` Benjamin Herrenschmidt
2017-07-14  9:27                           ` Mel Gorman
2017-07-14 22:21                             ` Andy Lutomirski
2017-07-11 16:22           ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3D1386AD-7875-40B9-8C6F-DE02CF8A45A1@gmail.com \
    --to=nadav.amit@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).