linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>
Subject: Re: Potential race in TLB flush batching?
Date: Tue, 11 Jul 2017 00:30:28 -0700	[thread overview]
Message-ID: <D810A11D-1827-48C7-BA74-C1A6DCD80862@gmail.com> (raw)
In-Reply-To: <20170711064149.bg63nvi54ycynxw4@suse.de>

Mel Gorman <mgorman@suse.de> wrote:

> On Mon, Jul 10, 2017 at 05:52:25PM -0700, Nadav Amit wrote:
>> Something bothers me about the TLB flushes batching mechanism that Linux
>> uses on x86 and I would appreciate your opinion regarding it.
>> 
>> As you know, try_to_unmap_one() can batch TLB invalidations. While doing so,
>> however, the page-table lock(s) are not held, and I see no indication of the
>> pending flush saved (and regarded) in the relevant mm-structs.
>> 
>> So, my question: what prevents, at least in theory, the following scenario:
>> 
>> 	CPU0 				CPU1
>> 	----				----
>> 					user accesses memory using RW PTE 
>> 					[PTE now cached in TLB]
>> 	try_to_unmap_one()
>> 	==> ptep_get_and_clear()
>> 	==> set_tlb_ubc_flush_pending()
>> 					mprotect(addr, PROT_READ)
>> 					==> change_pte_range()
>> 					==> [ PTE non-present - no flush ]
>> 
>> 					user writes using cached RW PTE
>> 	...
>> 
>> 	try_to_unmap_flush()
>> 
>> 
>> As you see CPU1 write should have failed, but may succeed. 
>> 
>> Now I don???t have a PoC since in practice it seems hard to create such a
>> scenario: try_to_unmap_one() is likely to find the PTE accessed and the PTE
>> would not be reclaimed.
> 
> That is the same to a race whereby there is no batching mechanism and the
> racing operation happens between a pte clear and a flush as ptep_clear_flush
> is not atomic. All that differs is that the race window is a different size.
> The application on CPU1 is buggy in that it may or may not succeed the write
> but it is buggy regardless of whether a batching mechanism is used or not.

Thanks for your quick and detailed response, but I fail to see how it can
happen without batching. Indeed, the PTE clear and flush are not “atomic”,
but without batching they are both performed under the page table lock
(which is acquired in page_vma_mapped_walk and released in
page_vma_mapped_walk_done). Since the lock is taken, other cores should not
be able to inspect/modify the PTE. Relevant functions, e.g., zap_pte_range
and change_pte_range, acquire the lock before accessing the PTEs.

Can you please explain why you consider the application to be buggy? AFAIU
an application can wish to trap certain memory accesses using userfaultfd or
SIGSEGV. For example, it may do it for garbage collection or sandboxing. To
do so, it can use mprotect with PROT_NONE and expect to be able to trap
future accesses to that memory. This use-case is described in usefaultfd
documentation.

> The user accessed the PTE before the mprotect so, at the time of mprotect,
> the PTE is either clean or dirty. If it is clean then any subsequent write
> would transition the PTE from clean to dirty and an architecture enabling
> the batching mechanism must trap a clean->dirty transition for unmapped
> entries as commented upon in try_to_unmap_one (and was checked that this
> is true for x86 at least). This avoids data corruption due to a lost update.
> 
> If the previous access was a write then the batching flushes the page if
> any IO is required to avoid any writes after the IO has been initiated
> using try_to_unmap_flush_dirty so again there is no data corruption. There
> is a window where the TLB entry exists after the unmapping but this exists
> regardless of whether we batch or not.
> 
> In either case, before a page is freed and potentially allocated to another
> process, the TLB is flushed.

To clarify my concern again - I am not regarding a memory corruption as you
do, but situations in which the application wishes to trap certain memory
accesses but fails to do so. Having said that, I would add, that even if an
application has a bug, it may expect this bug not to affect memory that was
previously unmapped (and may be written to permanent storage).

Thanks (again),
Nadav

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-07-11  7:30 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-11  0:52 Potential race in TLB flush batching? Nadav Amit
2017-07-11  6:41 ` Mel Gorman
2017-07-11  7:30   ` Nadav Amit [this message]
2017-07-11  9:29     ` Mel Gorman
2017-07-11 10:40       ` Nadav Amit
2017-07-11 13:20         ` Mel Gorman
2017-07-11 14:58           ` Andy Lutomirski
2017-07-11 15:53             ` Mel Gorman
2017-07-11 17:23               ` Andy Lutomirski
2017-07-11 19:18                 ` Mel Gorman
2017-07-11 20:06                   ` Nadav Amit
2017-07-11 21:09                     ` Mel Gorman
2017-07-11 20:09                   ` Mel Gorman
2017-07-11 21:52                     ` Mel Gorman
2017-07-11 22:27                       ` Nadav Amit
2017-07-11 22:34                         ` Nadav Amit
2017-07-12  8:27                         ` Mel Gorman
2017-07-12 23:27                           ` Nadav Amit
2017-07-12 23:36                             ` Andy Lutomirski
2017-07-12 23:42                               ` Nadav Amit
2017-07-13  5:38                                 ` Andy Lutomirski
2017-07-13 16:05                                   ` Nadav Amit
2017-07-13 16:06                                     ` Andy Lutomirski
2017-07-13  6:07                             ` Mel Gorman
2017-07-13 16:08                               ` Andy Lutomirski
2017-07-13 17:07                                 ` Mel Gorman
2017-07-13 17:15                                   ` Andy Lutomirski
2017-07-13 18:23                                     ` Mel Gorman
2017-07-14 23:16                               ` Nadav Amit
2017-07-15 15:55                                 ` Mel Gorman
2017-07-15 16:41                                   ` Andy Lutomirski
2017-07-17  7:49                                     ` Mel Gorman
2017-07-18 21:28                                   ` Nadav Amit
2017-07-19  7:41                                     ` Mel Gorman
2017-07-19 19:41                                       ` Nadav Amit
2017-07-19 19:58                                         ` Mel Gorman
2017-07-19 20:20                                           ` Nadav Amit
2017-07-19 21:47                                             ` Mel Gorman
2017-07-19 22:19                                               ` Nadav Amit
2017-07-19 22:59                                                 ` Mel Gorman
2017-07-19 23:39                                                   ` Nadav Amit
2017-07-20  7:43                                                     ` Mel Gorman
2017-07-22  1:19                                                       ` Nadav Amit
2017-07-24  9:58                                                         ` Mel Gorman
2017-07-24 19:46                                                           ` Nadav Amit
2017-07-25  7:37                                                           ` Minchan Kim
2017-07-25  8:51                                                             ` Mel Gorman
2017-07-25  9:11                                                               ` Minchan Kim
2017-07-25 10:10                                                                 ` Mel Gorman
2017-07-26  5:43                                                                   ` Minchan Kim
2017-07-26  9:22                                                                     ` Mel Gorman
2017-07-26 19:18                                                                       ` Nadav Amit
2017-07-26 23:40                                                                         ` Minchan Kim
2017-07-27  0:09                                                                           ` Nadav Amit
2017-07-27  0:34                                                                             ` Minchan Kim
2017-07-27  0:48                                                                               ` Nadav Amit
2017-07-27  1:13                                                                                 ` Nadav Amit
2017-07-27  7:04                                                                                   ` Minchan Kim
2017-07-27  7:21                                                                                     ` Mel Gorman
2017-07-27 16:04                                                                                       ` Nadav Amit
2017-07-27 17:36                                                                                         ` Mel Gorman
2017-07-26 23:44                                                                       ` Minchan Kim
2017-07-11 22:07                   ` Andy Lutomirski
2017-07-11 22:33                     ` Mel Gorman
2017-07-14  7:00                     ` Benjamin Herrenschmidt
2017-07-14  8:31                       ` Mel Gorman
2017-07-14  9:02                         ` Benjamin Herrenschmidt
2017-07-14  9:27                           ` Mel Gorman
2017-07-14 22:21                             ` Andy Lutomirski
2017-07-11 16:22           ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=D810A11D-1827-48C7-BA74-C1A6DCD80862@gmail.com \
    --to=nadav.amit@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).