All of lore.kernel.org
 help / color / mirror / Atom feed
From: Nadav Amit <namit@vmware.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: linux-mm <linux-mm@kvack.org>,
	lkml <linux-kernel@vger.kernel.org>, Yu Zhao <yuzhao@google.com>,
	Andy Lutomirski <luto@kernel.org>, Peter Xu <peterx@redhat.com>,
	Pavel Emelyanov <xemul@openvz.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Minchan Kim <minchan@kernel.org>, Will Deacon <will@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup
Date: Tue, 5 Jan 2021 21:22:51 +0000	[thread overview]
Message-ID: <B1B85771-B211-4FCC-AEEF-BDFD37332C25@vmware.com> (raw)
In-Reply-To: <X/TOhyzggcBL64N2@redhat.com>

> On Jan 5, 2021, at 12:39 PM, Andrea Arcangeli <aarcange@redhat.com> wrote:
> 
> On Tue, Jan 05, 2021 at 07:26:43PM +0000, Nadav Amit wrote:
>>> On Jan 5, 2021, at 10:20 AM, Andrea Arcangeli <aarcange@redhat.com> wrote:
>>> 
>>> On Fri, Dec 25, 2020 at 01:25:29AM -0800, Nadav Amit wrote:
>>>> Fixes: 0f8975ec4db2 ("mm: soft-dirty bits for user memory changes tracking")
>>> 
>>> Targeting a backport down to 2013 when nothing could wrong in practice
>>> with page_mapcount sounds backwards and unnecessarily risky.
>>> 
>>> In theory it was already broken and in theory
>>> 09854ba94c6aad7886996bfbee2530b3d8a7f4f4 is absolutely perfect and the
>>> previous code of 2013 is completely wrong, but in practice the code
>>> from 2013 worked perfectly until Aug 21 2020.
>> 
>> Well… If you consider the bug that Will recently fixed [1], then soft-dirty
>> was broken (for a different, yet related reason) since 0758cd830494
>> ("asm-generic/tlb: avoid potential double flush”).
>> 
>> This is not to say that I argue that the patch should be backported to 2013,
>> just to say that memory corruption bugs can be unnoticed.
>> 
>> [1] https://patchwork.kernel.org/project/linux-mm/patch/20201210121110.10094-2-will@kernel.org/
> 
> Is this a fix or a cleanup?
> 
> The above is precisely what I said earlier that tlb_gather had no
> reason to stay in clear_refs and it had to use inc_tlb_flush_pending
> as mprotect, but it's not a fix? Is it? I suggested it as a pure
> cleanup. So again no backport required. The commit says fix this but
> it means "clean this up".

It is actually a fix. I think the commit log is not entirely correct and
should include:

  Fixes: 0758cd830494 ("asm-generic/tlb: avoid potential double flush”).

Since 0758cd830494, calling tlb_finish_mmu() without any previous call to
pte_free_tlb() and friends does not flush the TLB. The soft-dirty bug
producer that I sent fails without this patch of Will.

> So setting a Fixes back to 2013 that would go mess with all stable
> tree by actively backporting a performance regressions to clear_refs
> that can break runtime performance to fix a philosophical issue that
> isn't even a theoretical issue, doesn't sound ideal to me.

Point taken.

> 
>> To summarize my action items based your (and others) feedback on both
>> patches:
>> 
>> 1. I will break the first patch into two different patches, one with the
>> “optimization” for write-unprotect, based on your feedback. It will not
>> be backported.
>> 
>> 2. I will try to add a patch to avoid TLB flushes on
>> userfaultfd-writeunprotect. It will also not be backported.
> 
> I think 1 and 2 above could be in the same patch. Mixing an uffd-wp optimization with the
> actual fix the memory corruption wasn't ideal, but doing the same
> optimization to both wrprotect and un-wrprotect in the same patch
> sounds ideal. The commit explanation would be identical and it can be
> de-duplicated this way.
> 
> I'd suggest to coordinate with Peter on that, since I wasn't planning
> to work on this if somebody else offered to do it.
> 
>> 3. Let me know if you want me to use your version of testing
>> mm_tlb_flush_pending() and conditionally flushing, wait for new version fro
>> you or Peter or to go with taking mmap_lock for write.
> 
> Yes, as you suggested, I'm trying to clean it up and send a new
> version.
> 
> Ultimately my view is there are an huge number of cases where
> mmap_write_lock or some other heavy lock that will require
> occasionally to block on I/O is beyond impossible not to take. Even
> speculative page faults only attack the low hanging anon memory and
> there's still MADV_DONTNEED/FREE and other stuff that may have to run
> in parallel with UFFDIO_WRITEPROTECT and clear_refs, not just page
> faults.
> 
> As a reminder: the only case when modifying the vmas is allowed under
> mmap_read_lock (I already tried once to make it safer by adding
> READ_ONCE/WRITE_ONCE but wasn't merged see
> https://www.spinics.net/lists/linux-mm/msg173420.html), is when
> updating vm_end/vm_start in growsdown/up, where the vma is extended
> down or up in the page fault under only mmap_read_lock.
> 
> I'm doing all I can to document and make it more explicit the
> complexity we deal with in the code (as well as reducing the gcc
> dependency in emitting atomic writes to update vm_end/vm_start, as we
> should do in ptes as well in theory). As you may notice in the
> feedback from the above submission not all even realized that we're
> modifying vmas already under mmap_read_lock. So it'd be great to get
> help to merge that READ_ONCE/WRITE_ONCE cleanup that is still valid
> and pending for merge but it needs forward porting.
> 
> This one, for both soft dirty and uffd_wrprotect, is a walk in the
> park to optimize in comparison to the vma modifications.

I am sure you are right.

> 
> From my point of view in fact, doing the tlb flush or the wait on the
> atomic to be released, does not increase kernel complexity compared to
> what we had until now.

It is also about performance due to unwarranted TLB flushes.

I think avoiding them requires some finer granularity detection of pending
page-faults. But anyhow, I still owe some TLB optimization patches (and v2
for userfaultfd+iouring) before I can even look at that.

In addition, as I stated before, having some clean interfaces that tell
whether a TLB flush is needed or not would be helpful and simpler to follow.
For instance, we can have is_pte_prot_demotion(oldprot, newprot) to figure
out whether a TLB flush is needed in change_pte_range() and avoid
unnecessary flushes when unprotecting pages with either mprotect() or
userfaultfd.


  parent reply	other threads:[~2021-01-05 21:23 UTC|newest]

Thread overview: 119+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-25  9:25 [RFC PATCH v2 0/2] mm: fix races due to deferred TLB flushes Nadav Amit
2020-12-25  9:25 ` [RFC PATCH v2 1/2] mm/userfaultfd: fix memory corruption due to writeprotect Nadav Amit
2021-01-04 12:22   ` Peter Zijlstra
2021-01-04 19:24     ` Andrea Arcangeli
2021-01-04 19:35       ` Nadav Amit
2021-01-04 20:19         ` Andrea Arcangeli
2021-01-04 20:39           ` Nadav Amit
2021-01-04 21:01             ` Andrea Arcangeli
2021-01-04 21:26               ` Nadav Amit
2021-01-05 18:45                 ` Andrea Arcangeli
2021-01-05 19:05                   ` Nadav Amit
2021-01-05 19:45                     ` Andrea Arcangeli
2021-01-05 20:06                       ` Nadav Amit
2021-01-05 21:06                         ` Andrea Arcangeli
2021-01-05 21:43                           ` Peter Xu
2021-01-05  8:13       ` Peter Zijlstra
2021-01-05  8:52         ` Nadav Amit
2021-01-05 14:26           ` Peter Zijlstra
2021-01-05  8:58       ` Peter Zijlstra
2021-01-05  9:22         ` Nadav Amit
2021-01-05 17:58         ` Andrea Arcangeli
2021-01-05 15:08   ` Peter Xu
2021-01-05 18:08     ` Andrea Arcangeli
2021-01-05 18:41       ` Peter Xu
2021-01-05 18:55         ` Andrea Arcangeli
2021-01-05 19:07     ` Nadav Amit
2021-01-05 19:43       ` Peter Xu
2020-12-25  9:25 ` [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup Nadav Amit
2021-01-05 15:08   ` Will Deacon
2021-01-05 18:20   ` Andrea Arcangeli
2021-01-05 19:26     ` Nadav Amit
2021-01-05 20:39       ` Andrea Arcangeli
2021-01-05 21:20         ` Yu Zhao
2021-01-05 21:22         ` Nadav Amit [this message]
2021-01-05 22:16           ` Will Deacon
2021-01-06  0:29             ` Andrea Arcangeli
2021-01-06  0:02           ` Andrea Arcangeli
2021-01-07 20:04           ` [PATCH 0/2] page_count can't be used to decide when wp_page_copy Andrea Arcangeli
2021-01-07 20:04             ` [PATCH 1/2] mm: proc: Invalidate TLB after clearing soft-dirty page state Andrea Arcangeli
2021-01-07 20:04             ` [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending Andrea Arcangeli
2021-01-07 20:17               ` Linus Torvalds
2021-01-07 20:17                 ` Linus Torvalds
2021-01-07 20:25                 ` Linus Torvalds
2021-01-07 20:25                   ` Linus Torvalds
2021-01-07 20:58                 ` Andrea Arcangeli
2021-01-07 21:29                   ` Linus Torvalds
2021-01-07 21:29                     ` Linus Torvalds
2021-01-07 21:53                     ` John Hubbard
2021-01-07 22:00                       ` Linus Torvalds
2021-01-07 22:00                         ` Linus Torvalds
2021-01-07 22:14                         ` John Hubbard
2021-01-07 22:20                           ` Linus Torvalds
2021-01-07 22:20                             ` Linus Torvalds
2021-01-07 22:24                             ` Linus Torvalds
2021-01-07 22:24                               ` Linus Torvalds
2021-01-07 22:37                               ` John Hubbard
2021-01-15 11:27                       ` Jan Kara
2021-01-07 22:31                     ` Andrea Arcangeli
2021-01-07 22:42                       ` Linus Torvalds
2021-01-07 22:42                         ` Linus Torvalds
2021-01-07 22:51                         ` Linus Torvalds
2021-01-07 22:51                           ` Linus Torvalds
2021-01-07 23:48                           ` Andrea Arcangeli
2021-01-08  0:25                             ` Linus Torvalds
2021-01-08  0:25                               ` Linus Torvalds
2021-01-08 12:48                               ` Will Deacon
2021-01-08 16:14                                 ` Andrea Arcangeli
2021-01-08 17:39                                   ` Linus Torvalds
2021-01-08 17:39                                     ` Linus Torvalds
2021-01-08 17:53                                     ` Andrea Arcangeli
2021-01-08 19:25                                       ` Linus Torvalds
2021-01-08 19:25                                         ` Linus Torvalds
2021-01-09  0:12                                         ` Andrea Arcangeli
2021-01-08 17:30                                 ` Linus Torvalds
2021-01-08 17:30                                   ` Linus Torvalds
2021-01-07 23:28                         ` Andrea Arcangeli
2021-01-07 21:36               ` kernel test robot
2021-01-07 21:36                 ` kernel test robot
2021-01-07 20:25             ` [PATCH 0/2] page_count can't be used to decide when wp_page_copy Jason Gunthorpe
2021-01-07 20:32               ` Linus Torvalds
2021-01-07 20:32                 ` Linus Torvalds
2021-01-07 21:05                 ` Linus Torvalds
2021-01-07 21:05                   ` Linus Torvalds
2021-01-07 22:02                   ` Andrea Arcangeli
2021-01-07 22:17                     ` Linus Torvalds
2021-01-07 22:17                       ` Linus Torvalds
2021-01-07 22:56                       ` Andrea Arcangeli
2021-01-09 19:32                   ` Matthew Wilcox
2021-01-09 19:46                     ` Linus Torvalds
2021-01-09 19:46                       ` Linus Torvalds
2021-01-15 14:30                       ` Jan Kara
2021-01-07 21:54                 ` Andrea Arcangeli
2021-01-07 21:45               ` Andrea Arcangeli
2021-01-08 13:36                 ` Jason Gunthorpe
2021-01-08 17:00                   ` Andrea Arcangeli
2021-01-08 18:19                     ` Jason Gunthorpe
2021-01-08 18:31                       ` Andy Lutomirski
2021-01-08 18:31                         ` Andy Lutomirski
2021-01-08 18:38                         ` Linus Torvalds
2021-01-08 18:38                           ` Linus Torvalds
2021-01-08 23:34                         ` Andrea Arcangeli
2021-01-09 19:03                           ` Andy Lutomirski
2021-01-09 19:03                             ` Andy Lutomirski
2021-01-09 19:15                             ` Linus Torvalds
2021-01-09 19:15                               ` Linus Torvalds
2021-01-08 18:59                       ` Linus Torvalds
2021-01-08 18:59                         ` Linus Torvalds
2021-01-08 22:43                       ` Andrea Arcangeli
2021-01-09  0:42                         ` Jason Gunthorpe
2021-01-09  2:50                           ` Andrea Arcangeli
2021-01-11 14:30                             ` Jason Gunthorpe
2021-01-13 21:56                           ` Jerome Glisse
2021-01-13 23:39                             ` Jason Gunthorpe
2021-01-14  2:35                               ` Jerome Glisse
2021-01-09  3:49                       ` Hillf Danton
2021-01-11 14:39                         ` Jason Gunthorpe
2021-01-05 21:55         ` [RFC PATCH v2 2/2] fs/task_mmu: acquire mmap_lock for write on soft-dirty cleanup Peter Xu
2021-03-02 22:13 ` [RFC PATCH v2 0/2] mm: fix races due to deferred TLB flushes Peter Xu
2021-03-02 22:14   ` Nadav Amit

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=B1B85771-B211-4FCC-AEEF-BDFD37332C25@vmware.com \
    --to=namit@vmware.com \
    --cc=aarcange@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=minchan@kernel.org \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rppt@linux.vnet.ibm.com \
    --cc=will@kernel.org \
    --cc=xemul@openvz.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.