All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>
Subject: Re: Potential race in TLB flush batching?
Date: Wed, 19 Jul 2017 15:19:00 -0700	[thread overview]
Message-ID: <3D1386AD-7875-40B9-8C6F-DE02CF8A45A1@gmail.com> (raw)
In-Reply-To: <20170719214708.wuzq3di6rt43txtn@suse.de>

Mel Gorman <mgorman@suse.de> wrote:

> On Wed, Jul 19, 2017 at 01:20:01PM -0700, Nadav Amit wrote:
>>> From a PTE you cannot know the state of mmap_sem because you can rmap
>>> back to multiple mm's for shared mappings. It's also fairly heavy handed.
>>> Technically, you could lock on the basis of the VMA but that has other
>>> consequences for scalability. The staleness is also a factor because
>>> it's a case of "does the staleness matter". Sometimes it does, sometimes
>>> it doesn't.  mmap_sem even if it could be used does not always tell us
>>> the right information either because it can matter whether we are racing
>>> against a userspace reference or a kernel operation.
>>> 
>>> It's possible your idea could be made work, but right now I'm not seeing a
>>> solution that handles every corner case. I asked to hear what your ideas
>>> were because anything I thought of that could batch TLB flushing in the
>>> general case had flaws that did not improve over what is already there.
>> 
>> I don???t disagree with what you say - perhaps my scheme is too simplistic.
>> But the bottom line, if you cannot form simple rules for when TLB needs to
>> be flushed, what are the chances others would get it right?
> 
> Broad rule is "flush before the page is freed/reallocated for clean pages
> or any IO is initiated for dirty pages" with a lot of details that are not
> documented. Often it's the PTL and flush with it held that protects the
> majority of cases but it's not universal as the page lock and mmap_sem
> play important rules depending ont the context and AFAIK, that's also
> not documented.
> 
>>> shrink_page_list is the caller of try_to_unmap in reclaim context. It
>>> has this check
>>> 
>>>               if (!trylock_page(page))
>>>                       goto keep;
>>> 
>>> For pages it cannot lock, they get put back on the LRU and recycled instead
>>> of reclaimed. Hence, if KSM or anything else holds the page lock, reclaim
>>> can't unmap it.
>> 
>> Yes, of course, since KSM does not batch TLB flushes. I regarded the other
>> direction - first try_to_unmap() removes the PTE (but still does not flush),
>> unlocks the page, and then KSM acquires the page lock and calls
>> write_protect_page(). It finds out the PTE is not present and does not flush
>> the TLB.
> 
> When KSM acquires the page lock, it then acquires the PTL where the
> cleared PTE is observed directly and skipped.

I don’t see why. Let’s try again - CPU0 reclaims while CPU1 deduplicates:

CPU0				CPU1
----				----
shrink_page_list()

=> try_to_unmap()
==> try_to_unmap_one()
[ unmaps from some page-tables ]

[ try_to_unmap returns false;
  page not reclaimed ]

=> keep_locked: unlock_page()

[ TLB flush deferred ]
				try_to_merge_one_page()
				=> trylock_page()
				=> write_protect_page()
				==> acquire ptl
				  [ PTE non-present —> no PTE change
				    and no flush ]
				==> release ptl
				==> replace_page()


At this point, while replace_page() is running, CPU0 may still not have
flushed the TLBs. Another CPU (CPU2) may hold a stale PTE, which is not
write-protected. It can therefore write to that page while replace_page() is
running, resulting in memory corruption.

No?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-07-19 22:19 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-11  0:52 Potential race in TLB flush batching? Nadav Amit
2017-07-11  6:41 ` Mel Gorman
2017-07-11  7:30   ` Nadav Amit
2017-07-11  9:29     ` Mel Gorman
2017-07-11 10:40       ` Nadav Amit
2017-07-11 13:20         ` Mel Gorman
2017-07-11 14:58           ` Andy Lutomirski
2017-07-11 15:53             ` Mel Gorman
2017-07-11 17:23               ` Andy Lutomirski
2017-07-11 19:18                 ` Mel Gorman
2017-07-11 20:06                   ` Nadav Amit
2017-07-11 21:09                     ` Mel Gorman
2017-07-11 20:09                   ` Mel Gorman
2017-07-11 21:52                     ` Mel Gorman
2017-07-11 22:27                       ` Nadav Amit
2017-07-11 22:34                         ` Nadav Amit
2017-07-12  8:27                         ` Mel Gorman
2017-07-12 23:27                           ` Nadav Amit
2017-07-12 23:36                             ` Andy Lutomirski
2017-07-12 23:42                               ` Nadav Amit
2017-07-13  5:38                                 ` Andy Lutomirski
2017-07-13 16:05                                   ` Nadav Amit
2017-07-13 16:06                                     ` Andy Lutomirski
2017-07-13  6:07                             ` Mel Gorman
2017-07-13 16:08                               ` Andy Lutomirski
2017-07-13 17:07                                 ` Mel Gorman
2017-07-13 17:15                                   ` Andy Lutomirski
2017-07-13 18:23                                     ` Mel Gorman
2017-07-14 23:16                               ` Nadav Amit
2017-07-15 15:55                                 ` Mel Gorman
2017-07-15 16:41                                   ` Andy Lutomirski
2017-07-17  7:49                                     ` Mel Gorman
2017-07-18 21:28                                   ` Nadav Amit
2017-07-19  7:41                                     ` Mel Gorman
2017-07-19 19:41                                       ` Nadav Amit
2017-07-19 19:58                                         ` Mel Gorman
2017-07-19 20:20                                           ` Nadav Amit
2017-07-19 21:47                                             ` Mel Gorman
2017-07-19 22:19                                               ` Nadav Amit [this message]
2017-07-19 22:59                                                 ` Mel Gorman
2017-07-19 23:39                                                   ` Nadav Amit
2017-07-20  7:43                                                     ` Mel Gorman
2017-07-22  1:19                                                       ` Nadav Amit
2017-07-24  9:58                                                         ` Mel Gorman
2017-07-24 19:46                                                           ` Nadav Amit
2017-07-25  7:37                                                           ` Minchan Kim
2017-07-25  8:51                                                             ` Mel Gorman
2017-07-25  9:11                                                               ` Minchan Kim
2017-07-25 10:10                                                                 ` Mel Gorman
2017-07-26  5:43                                                                   ` Minchan Kim
2017-07-26  9:22                                                                     ` Mel Gorman
2017-07-26 19:18                                                                       ` Nadav Amit
2017-07-26 23:40                                                                         ` Minchan Kim
2017-07-27  0:09                                                                           ` Nadav Amit
2017-07-27  0:34                                                                             ` Minchan Kim
2017-07-27  0:48                                                                               ` Nadav Amit
2017-07-27  1:13                                                                                 ` Nadav Amit
2017-07-27  7:04                                                                                   ` Minchan Kim
2017-07-27  7:21                                                                                     ` Mel Gorman
2017-07-27 16:04                                                                                       ` Nadav Amit
2017-07-27 17:36                                                                                         ` Mel Gorman
2017-07-26 23:44                                                                       ` Minchan Kim
2017-07-11 22:07                   ` Andy Lutomirski
2017-07-11 22:33                     ` Mel Gorman
2017-07-14  7:00                     ` Benjamin Herrenschmidt
2017-07-14  8:31                       ` Mel Gorman
2017-07-14  9:02                         ` Benjamin Herrenschmidt
2017-07-14  9:27                           ` Mel Gorman
2017-07-14 22:21                             ` Andy Lutomirski
2017-07-11 16:22           ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=3D1386AD-7875-40B9-8C6F-DE02CF8A45A1@gmail.com \
    --to=nadav.amit@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.