linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yu Zhao <yuzhao@google.com>
To: Will Deacon <will@kernel.org>
Cc: linux-kernel@vger.kernel.org, kernel-team@android.com,
	Catalin Marinas <catalin.marinas@arm.com>,
	Minchan Kim <minchan@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH 4/6] mm: proc: Invalidate TLB after clearing soft-dirty page state
Date: Fri, 20 Nov 2020 19:49:22 -0700	[thread overview]
Message-ID: <20201121024922.GA1363491@google.com> (raw)
In-Reply-To: <20201120202253.GB1303870@google.com>

On Fri, Nov 20, 2020 at 01:22:53PM -0700, Yu Zhao wrote:
> On Fri, Nov 20, 2020 at 02:35:55PM +0000, Will Deacon wrote:
> > Since commit 0758cd830494 ("asm-generic/tlb: avoid potential double flush"),
> > TLB invalidation is elided in tlb_finish_mmu() if no entries were batched
> > via the tlb_remove_*() functions. Consequently, the page-table modifications
> > performed by clear_refs_write() in response to a write to
> > /proc/<pid>/clear_refs do not perform TLB invalidation. Although this is
> > fine when simply aging the ptes, in the case of clearing the "soft-dirty"
> > state we can end up with entries where pte_write() is false, yet a
> > writable mapping remains in the TLB.
> 
> I don't think we need a TLB flush in this context, same reason as we
> don't have one in copy_present_pte() which uses ptep_set_wrprotect()
> to write-protect a src PTE.
> 
> ptep_modify_prot_start/commit() and ptep_set_wrprotect() guarantee
> either the dirty bit is set (when a PTE is still writable) or a PF
> happens (when a PTE has become r/o) when h/w page table walker races
> with kernel that modifies a PTE using the two APIs.

After we remove the writable bit, if we end up with a clean PTE, any
subsequent write will trigger a page fault. We can't have a stale
writable tlb entry. The architecture-specific APIs guarantee this.

If we end up with a dirty PTE, then yes, there will be a stale
writable tlb entry. But this won't be a problem because when we
write-protect a page (not PTE), we always check both pte_dirty()
and pte_write(), i.e., write_protect_page() and page_mkclean_one().
When they see this dirty PTE, they will flush. And generally, only
callers of pte_mkclean() should flush tlb; otherwise we end up one
extra if callers of pte_mkclean() and pte_wrprotect() both flush.

Now let's take a step back and see why we got
tlb_gather/finish_mmu() here in the first place. Commit b3a81d0841a95
("mm: fix KSM data corruption") explains the problem clearly. But
to fix a problem created by two threads clearing pte_write() and
pte_dirty() independently, we only need one of them to set
mm_tlb_flush_pending(). Given only removing the writable bit requires
tlb flush, that thread should be the one, as I just explained. Adding
tlb_gather/finish_mmu() is unnecessary in that fix. And there is no
point in having the original flush_tlb_mm() either, given data
integrity is already guaranteed. Of course, with it we have more
accurate access tracking.

Does a similar problem exist for page_mkclean_one()? Possibly. It
checks pte_dirty() and pte_write() but not mm_tlb_flush_pending().
At the moment, madvise_free_pte_range() only supports anonymous
memory, which doesn't do writeback. But the missing
mm_tlb_flush_pending() just seems to be an accident waiting to happen.
E.g., clean_record_pte() calls pte_mkclean() and does batched flush.
I don't know what it's for, but if it's called on file VMAs, a similar
race involving 4 CPUs can happen. This time CPU 1 runs
clean_record_pte() and CPU 3 runs page_mkclean_one().


  reply	other threads:[~2020-11-21  2:49 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-20 14:35 [PATCH 0/6] tlb: Fix access and (soft-)dirty bit management Will Deacon
2020-11-20 14:35 ` [PATCH 1/6] arm64: pgtable: Fix pte_accessible() Will Deacon
2020-11-20 16:03   ` Minchan Kim
2020-11-20 19:53   ` Yu Zhao
2020-11-23 13:27   ` Catalin Marinas
2020-11-24 10:02   ` Anshuman Khandual
2020-11-20 14:35 ` [PATCH 2/6] arm64: pgtable: Ensure dirty bit is preserved across pte_wrprotect() Will Deacon
2020-11-20 17:09   ` Minchan Kim
2020-11-23 14:31     ` Catalin Marinas
2020-11-23 14:22   ` Catalin Marinas
2020-11-20 14:35 ` [PATCH 3/6] tlb: mmu_gather: Remove unused start/end arguments from tlb_finish_mmu() Will Deacon
2020-11-20 17:20   ` Linus Torvalds
2020-11-23 16:48     ` Will Deacon
2020-11-20 14:35 ` [PATCH 4/6] mm: proc: Invalidate TLB after clearing soft-dirty page state Will Deacon
2020-11-20 15:00   ` Peter Zijlstra
2020-11-20 15:09     ` Peter Zijlstra
2020-11-20 15:15     ` Will Deacon
2020-11-20 15:27       ` Peter Zijlstra
2020-11-23 18:23         ` Will Deacon
2020-11-20 15:55     ` Minchan Kim
2020-11-23 18:41       ` Will Deacon
2020-11-25 22:51         ` Minchan Kim
2020-11-20 20:22   ` Yu Zhao
2020-11-21  2:49     ` Yu Zhao [this message]
2020-11-23 19:21       ` Yu Zhao
2020-11-23 22:04       ` Will Deacon
2020-11-20 14:35 ` [PATCH 5/6] tlb: mmu_gather: Introduce tlb_gather_mmu_fullmm() Will Deacon
2020-11-20 17:22   ` Linus Torvalds
2020-11-20 17:31     ` Linus Torvalds
2020-11-23 16:48       ` Will Deacon
2020-11-22 15:11   ` [tlb] e242a269fa: WARNING:at_mm/mmu_gather.c:#tlb_gather_mmu kernel test robot
2020-11-23 17:51     ` Will Deacon
2020-11-20 14:35 ` [PATCH 6/6] mm: proc: Avoid fullmm flush for young/dirty bit toggling Will Deacon
2020-11-20 17:41   ` Linus Torvalds
2020-11-20 17:45     ` Linus Torvalds
2020-11-20 20:40   ` Yu Zhao
2020-11-23 18:35     ` Will Deacon
2020-11-23 20:04       ` Yu Zhao
2020-11-23 21:17         ` Will Deacon
2020-11-24  1:13           ` Yu Zhao
2020-11-24 14:31             ` Will Deacon
2020-11-25 22:01             ` Minchan Kim
2020-11-24 14:46     ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201121024922.GA1363491@google.com \
    --to=yuzhao@google.com \
    --cc=anshuman.khandual@arm.com \
    --cc=catalin.marinas@arm.com \
    --cc=kernel-team@android.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan@kernel.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).