All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>,
	"open list:MEMORY MANAGEMENT" <linux-mm@kvack.org>
Subject: Re: Potential race in TLB flush batching?
Date: Wed, 19 Jul 2017 12:41:01 -0700	[thread overview]
Message-ID: <E9EE838F-F1E3-43A8-BB87-8B5B8388FF61@gmail.com> (raw)
In-Reply-To: <20170719074131.75wexoal3fiyoxw5@suse.de>

Mel Gorman <mgorman@suse.de> wrote:

> On Tue, Jul 18, 2017 at 02:28:27PM -0700, Nadav Amit wrote:
>>> If there are separate address spaces using a shared mapping then the
>>> same race does not occur.
>> 
>> I missed the fact you reverted the two operations since the previous version
>> of the patch. This specific scenario should be solved with this patch.
>> 
>> But in general, I think there is a need for a simple locking scheme.
> 
> Such as?

Something like:

bool is_potentially_stale_pte(pte_t pte, pgprot_t prot, int lock_state);

which would get the current PTE, the protection bits that the user is
interested in, and whether mmap_sem is taken read/write/none. 

It would return whether this PTE may be potentially stale and needs to be
invalidated. Obviously, any code that removes protection or unmaps need to
be updated for this information to be correct.

[snip]

>> As a result, concurrent operation such as KSM???s write_protect_page() or
> 
> write_protect_page operates under the page lock and cannot race with reclaim.

I still do not understand this claim. IIUC, reclaim can unmap the page in
some page table, decide not to reclaim the page and release the page-lock
before flush.

>> page_mkclean_one() can consider the page write-protected while in fact it is
>> still accessible - since the TLB flush was deferred.
> 
> As long as it's flushed before any IO occurs that would lose a data update,
> it's not a data integrity issue.
> 
>> As a result, they may
>> mishandle the PTE without flushing the page. In the case of
>> page_mkclean_one(), I suspect it may even lead to memory corruption. I admit
>> that in x86 there are some mitigating factors that would make such ???attack???
>> complicated, but it still seems wrong to me, no?
> 
> I worry that you're beginning to see races everywhere. I admit that the
> rules and protections here are varied and complex but it's worth keeping
> in mind that data integrity is the key concern (no false reads to wrong
> data, no lost writes) and the first race you identified found some problems
> here. However, with or without batching, there is always a delay between
> when a PTE is cleared and when the TLB entries are removed.

Sure, but usually the delay occurs while the page-table lock is taken so
there is no race.

Now, it is not fair to call me a paranoid, considering that these races are
real - I confirmed that at least two can happen in practice. There are many
possibilities for concurrent TLB batching and you cannot expect developers
to consider all of them. I don’t think many people are capable of doing the
voodoo tricks of avoiding a TLB flush if the page-lock is taken or the VMA
is anonymous. I doubt that these tricks work and anyhow IMHO they are likely
to fail in the future since they are undocumented and complicated.

As for “data integrity is the key concern” - violating the memory management
API can cause data integrity issues for programs. It may not cause the OS to
crash, but it should not be acceptable either, and may potentially raise
security concerns. If you think that the current behavior is ok, let the
documentation and man pages clarify that mprotect may not protect, madvise
may not advise and so on.

And although you would use it against me, I would say: Nobody knew that TLB
flushing could be so complicated.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2017-07-19 19:41 UTC|newest]

Thread overview: 70+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-07-11  0:52 Potential race in TLB flush batching? Nadav Amit
2017-07-11  6:41 ` Mel Gorman
2017-07-11  7:30   ` Nadav Amit
2017-07-11  9:29     ` Mel Gorman
2017-07-11 10:40       ` Nadav Amit
2017-07-11 13:20         ` Mel Gorman
2017-07-11 14:58           ` Andy Lutomirski
2017-07-11 15:53             ` Mel Gorman
2017-07-11 17:23               ` Andy Lutomirski
2017-07-11 19:18                 ` Mel Gorman
2017-07-11 20:06                   ` Nadav Amit
2017-07-11 21:09                     ` Mel Gorman
2017-07-11 20:09                   ` Mel Gorman
2017-07-11 21:52                     ` Mel Gorman
2017-07-11 22:27                       ` Nadav Amit
2017-07-11 22:34                         ` Nadav Amit
2017-07-12  8:27                         ` Mel Gorman
2017-07-12 23:27                           ` Nadav Amit
2017-07-12 23:36                             ` Andy Lutomirski
2017-07-12 23:42                               ` Nadav Amit
2017-07-13  5:38                                 ` Andy Lutomirski
2017-07-13 16:05                                   ` Nadav Amit
2017-07-13 16:06                                     ` Andy Lutomirski
2017-07-13  6:07                             ` Mel Gorman
2017-07-13 16:08                               ` Andy Lutomirski
2017-07-13 17:07                                 ` Mel Gorman
2017-07-13 17:15                                   ` Andy Lutomirski
2017-07-13 18:23                                     ` Mel Gorman
2017-07-14 23:16                               ` Nadav Amit
2017-07-15 15:55                                 ` Mel Gorman
2017-07-15 16:41                                   ` Andy Lutomirski
2017-07-17  7:49                                     ` Mel Gorman
2017-07-18 21:28                                   ` Nadav Amit
2017-07-19  7:41                                     ` Mel Gorman
2017-07-19 19:41                                       ` Nadav Amit [this message]
2017-07-19 19:58                                         ` Mel Gorman
2017-07-19 20:20                                           ` Nadav Amit
2017-07-19 21:47                                             ` Mel Gorman
2017-07-19 22:19                                               ` Nadav Amit
2017-07-19 22:59                                                 ` Mel Gorman
2017-07-19 23:39                                                   ` Nadav Amit
2017-07-20  7:43                                                     ` Mel Gorman
2017-07-22  1:19                                                       ` Nadav Amit
2017-07-24  9:58                                                         ` Mel Gorman
2017-07-24 19:46                                                           ` Nadav Amit
2017-07-25  7:37                                                           ` Minchan Kim
2017-07-25  8:51                                                             ` Mel Gorman
2017-07-25  9:11                                                               ` Minchan Kim
2017-07-25 10:10                                                                 ` Mel Gorman
2017-07-26  5:43                                                                   ` Minchan Kim
2017-07-26  9:22                                                                     ` Mel Gorman
2017-07-26 19:18                                                                       ` Nadav Amit
2017-07-26 23:40                                                                         ` Minchan Kim
2017-07-27  0:09                                                                           ` Nadav Amit
2017-07-27  0:34                                                                             ` Minchan Kim
2017-07-27  0:48                                                                               ` Nadav Amit
2017-07-27  1:13                                                                                 ` Nadav Amit
2017-07-27  7:04                                                                                   ` Minchan Kim
2017-07-27  7:21                                                                                     ` Mel Gorman
2017-07-27 16:04                                                                                       ` Nadav Amit
2017-07-27 17:36                                                                                         ` Mel Gorman
2017-07-26 23:44                                                                       ` Minchan Kim
2017-07-11 22:07                   ` Andy Lutomirski
2017-07-11 22:33                     ` Mel Gorman
2017-07-14  7:00                     ` Benjamin Herrenschmidt
2017-07-14  8:31                       ` Mel Gorman
2017-07-14  9:02                         ` Benjamin Herrenschmidt
2017-07-14  9:27                           ` Mel Gorman
2017-07-14 22:21                             ` Andy Lutomirski
2017-07-11 16:22           ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=E9EE838F-F1E3-43A8-BB87-8B5B8388FF61@gmail.com \
    --to=nadav.amit@gmail.com \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mgorman@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.