From: Yu Zhao <yuzhao@google.com>
To: Will Deacon <will@kernel.org>
Cc: linux-kernel@vger.kernel.org, kernel-team@android.com,
Catalin Marinas <catalin.marinas@arm.com>,
Minchan Kim <minchan@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Anshuman Khandual <anshuman.khandual@arm.com>,
linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 4/6] mm: proc: Invalidate TLB after clearing soft-dirty page state
Date: Fri, 20 Nov 2020 19:49:22 -0700 [thread overview]
Message-ID: <20201121024922.GA1363491@google.com> (raw)
In-Reply-To: <20201120202253.GB1303870@google.com>
On Fri, Nov 20, 2020 at 01:22:53PM -0700, Yu Zhao wrote:
> On Fri, Nov 20, 2020 at 02:35:55PM +0000, Will Deacon wrote:
> > Since commit 0758cd830494 ("asm-generic/tlb: avoid potential double flush"),
> > TLB invalidation is elided in tlb_finish_mmu() if no entries were batched
> > via the tlb_remove_*() functions. Consequently, the page-table modifications
> > performed by clear_refs_write() in response to a write to
> > /proc/<pid>/clear_refs do not perform TLB invalidation. Although this is
> > fine when simply aging the ptes, in the case of clearing the "soft-dirty"
> > state we can end up with entries where pte_write() is false, yet a
> > writable mapping remains in the TLB.
>
> I don't think we need a TLB flush in this context, same reason as we
> don't have one in copy_present_pte() which uses ptep_set_wrprotect()
> to write-protect a src PTE.
>
> ptep_modify_prot_start/commit() and ptep_set_wrprotect() guarantee
> either the dirty bit is set (when a PTE is still writable) or a PF
> happens (when a PTE has become r/o) when h/w page table walker races
> with kernel that modifies a PTE using the two APIs.
After we remove the writable bit, if we end up with a clean PTE, any
subsequent write will trigger a page fault. We can't have a stale
writable tlb entry. The architecture-specific APIs guarantee this.
If we end up with a dirty PTE, then yes, there will be a stale
writable tlb entry. But this won't be a problem because when we
write-protect a page (not PTE), we always check both pte_dirty()
and pte_write(), i.e., write_protect_page() and page_mkclean_one().
When they see this dirty PTE, they will flush. And generally, only
callers of pte_mkclean() should flush tlb; otherwise we end up one
extra if callers of pte_mkclean() and pte_wrprotect() both flush.
Now let's take a step back and see why we got
tlb_gather/finish_mmu() here in the first place. Commit b3a81d0841a95
("mm: fix KSM data corruption") explains the problem clearly. But
to fix a problem created by two threads clearing pte_write() and
pte_dirty() independently, we only need one of them to set
mm_tlb_flush_pending(). Given only removing the writable bit requires
tlb flush, that thread should be the one, as I just explained. Adding
tlb_gather/finish_mmu() is unnecessary in that fix. And there is no
point in having the original flush_tlb_mm() either, given data
integrity is already guaranteed. Of course, with it we have more
accurate access tracking.
Does a similar problem exist for page_mkclean_one()? Possibly. It
checks pte_dirty() and pte_write() but not mm_tlb_flush_pending().
At the moment, madvise_free_pte_range() only supports anonymous
memory, which doesn't do writeback. But the missing
mm_tlb_flush_pending() just seems to be an accident waiting to happen.
E.g., clean_record_pte() calls pte_mkclean() and does batched flush.
I don't know what it's for, but if it's called on file VMAs, a similar
race involving 4 CPUs can happen. This time CPU 1 runs
clean_record_pte() and CPU 3 runs page_mkclean_one().
next prev parent reply other threads:[~2020-11-21 2:50 UTC|newest]
Thread overview: 44+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-20 14:35 [PATCH 0/6] tlb: Fix access and (soft-)dirty bit management Will Deacon
2020-11-20 14:35 ` [PATCH 1/6] arm64: pgtable: Fix pte_accessible() Will Deacon
2020-11-20 16:03 ` Minchan Kim
2020-11-20 19:53 ` Yu Zhao
2020-11-23 13:27 ` Catalin Marinas
2020-11-24 10:02 ` Anshuman Khandual
2020-11-20 14:35 ` [PATCH 2/6] arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect() Will Deacon
2020-11-20 17:09 ` Minchan Kim
2020-11-23 14:31 ` Catalin Marinas
2020-11-23 14:22 ` Catalin Marinas
2020-11-20 14:35 ` [PATCH 3/6] tlb: mmu_gather: Remove unused start/end arguments from tlb_finish_mmu() Will Deacon
2020-11-20 17:20 ` Linus Torvalds
2020-11-23 16:48 ` Will Deacon
2020-11-20 14:35 ` [PATCH 4/6] mm: proc: Invalidate TLB after clearing soft-dirty page state Will Deacon
2020-11-20 15:00 ` Peter Zijlstra
2020-11-20 15:09 ` Peter Zijlstra
2020-11-20 15:15 ` Will Deacon
2020-11-20 15:27 ` Peter Zijlstra
2020-11-23 18:23 ` Will Deacon
2020-11-20 15:55 ` Minchan Kim
2020-11-23 18:41 ` Will Deacon
2020-11-25 22:51 ` Minchan Kim
2020-11-20 20:22 ` Yu Zhao
2020-11-21 2:49 ` Yu Zhao [this message]
2020-11-23 19:21 ` Yu Zhao
2020-11-23 22:04 ` Will Deacon
2020-11-20 14:35 ` [PATCH 5/6] tlb: mmu_gather: Introduce tlb_gather_mmu_fullmm() Will Deacon
2020-11-20 17:22 ` Linus Torvalds
2020-11-20 17:31 ` Linus Torvalds
2020-11-23 16:48 ` Will Deacon
2021-02-01 11:32 ` [tip: core/mm] tlb: mmu_gather: Remove start/end arguments from tlb_gather_mmu() tip-bot2 for Will Deacon
2020-11-22 15:11 ` [tlb] e242a269fa: WARNING:at_mm/mmu_gather.c:#tlb_gather_mmu kernel test robot
2020-11-23 17:51 ` Will Deacon
2020-11-20 14:35 ` [PATCH 6/6] mm: proc: Avoid fullmm flush for young/dirty bit toggling Will Deacon
2020-11-20 17:41 ` Linus Torvalds
2020-11-20 17:45 ` Linus Torvalds
2020-11-20 20:40 ` Yu Zhao
2020-11-23 18:35 ` Will Deacon
2020-11-23 20:04 ` Yu Zhao
2020-11-23 21:17 ` Will Deacon
2020-11-24 1:13 ` Yu Zhao
2020-11-24 14:31 ` Will Deacon
2020-11-25 22:01 ` Minchan Kim
2020-11-24 14:46 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201121024922.GA1363491@google.com \
--to=yuzhao@google.com \
--cc=anshuman.khandual@arm.com \
--cc=catalin.marinas@arm.com \
--cc=kernel-team@android.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=minchan@kernel.org \
--cc=peterz@infradead.org \
--cc=torvalds@linux-foundation.org \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).