linux-kernel.vger.kernel.org archive mirror
From: Andy Lutomirski <luto@amacapital.net>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andy Lutomirski <luto@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	X86 ML <x86@kernel.org>
Subject: Re: [RFC 15/20] mm: detect deferred TLB flushes in vma granularity
Date: Mon, 1 Feb 2021 16:14:58 -0800
Message-ID: <8F37526F-8189-483A-A16E-E0EB8662AD98@amacapital.net>
In-Reply-To: <A6E4897D-8D5A-4084-8288-8E43F3039921@gmail.com>


> On Feb 1, 2021, at 2:04 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
> 
> 
>> 
>> On Jan 30, 2021, at 4:11 PM, Nadav Amit <nadav.amit@gmail.com> wrote:
>> 
>> From: Nadav Amit <namit@vmware.com>
>> 
>> Currently, deferred TLB flushes are detected in the mm granularity: if
>> there is any deferred TLB flush in the entire address space due to NUMA
>> migration, pte_accessible() in x86 would return true, and
>> ptep_clear_flush() would require a TLB flush. This would happen even if
>> the PTE resides in a completely different vma.
> 
> [ snip ]
> 
>> +static inline void read_defer_tlb_flush_gen(struct mmu_gather *tlb)
>> +{
>> +    struct mm_struct *mm = tlb->mm;
>> +    u64 mm_gen;
>> +
>> +    /*
>> +     * Any change of PTE before calling __track_deferred_tlb_flush() must be
>> +     * performed using RMW atomic operation that provides a memory barrier,
>> +     * such as ptep_modify_prot_start().  The barrier ensures the PTEs are
>> +     * written before the current generation is read, synchronizing
>> +     * (implicitly) with flush_tlb_mm_range().
>> +     */
>> +    smp_mb__after_atomic();
>> +
>> +    mm_gen = atomic64_read(&mm->tlb_gen);
>> +
>> +    /*
>> +     * This condition checks both for the first deferred TLB flush and for
>> +     * other pending or executed TLB flushes after the last table that we
>> +     * updated. In the latter case, we are going to skip a generation, which
>> +     * would lead to a full TLB flush. This should therefore not cause
>> +     * correctness issues, and should not induce overheads, since anyhow in
>> +     * TLB storms it is better to perform full TLB flush.
>> +     */
>> +    if (mm_gen != tlb->defer_gen) {
>> +        VM_BUG_ON(mm_gen < tlb->defer_gen);
>> +
>> +        tlb->defer_gen = inc_mm_tlb_gen(mm);
>> +    }
>> +}
> 
> Andy’s comments managed to make me realize this code is wrong. We must
> call inc_mm_tlb_gen(mm) every time.
> 
> Otherwise, a CPU may have seen the old tlb_gen and recorded it in its local
> cpu_tlbstate on a context switch. If the process was not running when the
> TLB flush was issued, no IPI will be sent to that CPU. Therefore, a later
> switch_mm_irqs_off() back to the process will not flush the local TLB.
> 
> I need to think if there is a better solution. Multiple calls to
> inc_mm_tlb_gen() during deferred flushes would trigger a full TLB flush
> instead of one that is specific to the ranges, once the flush actually takes
> place. On x86 it’s practically a non-issue, since anyhow any update of more
> than 33 entries or so would cause a full TLB flush, but this is still ugly.
> 

What if we had a per-mm ring buffer of flushes?  When starting a flush,
we would stick the range in the ring buffer and, when flushing, we would
read the ring buffer to catch up.  This would mostly replace the
flush_tlb_info struct, and it would let us process multiple partial
flushes together.
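
To make the idea concrete, here is a rough userspace sketch of what such a ring might look like. Everything in it is hypothetical: the names (mm_flush_ring, flush_ring_push(), flush_ring_catch_up()), the ring size, and the use of a bounding range as the result are assumptions for illustration and are not taken from the series or from existing kernel code. It is single-threaded and leaves out the atomics and barriers a real implementation would need.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ring size; a real implementation would tune this. */
#define FLUSH_RING_SIZE 8

struct flush_range {
	unsigned long start;
	unsigned long end;
};

/* Illustrative per-mm state: a generation counter plus recent ranges. */
struct mm_flush_ring {
	uint64_t gen;	/* number of flushes queued so far */
	struct flush_range ring[FLUSH_RING_SIZE];
};

struct catch_up_result {
	bool full_flush;	/* ring wrapped: ranges were lost */
	unsigned int nr_ranges;	/* ranges recovered from the ring */
	unsigned long start;	/* bounding range of recovered entries; */
	unsigned long end;	/* left as ~0UL/0 when nothing was recovered */
};

/* Record a range when starting a flush; returns the new generation. */
static uint64_t flush_ring_push(struct mm_flush_ring *r,
				unsigned long start, unsigned long end)
{
	r->ring[r->gen % FLUSH_RING_SIZE] =
		(struct flush_range){ .start = start, .end = end };
	return ++r->gen;
}

/*
 * Catch a CPU up from *local_gen to the current generation. If more
 * entries were queued than the ring holds, the old ones have been
 * overwritten and only a full flush is safe.
 */
static struct catch_up_result
flush_ring_catch_up(const struct mm_flush_ring *r, uint64_t *local_gen)
{
	struct catch_up_result res = {
		.full_flush = false, .nr_ranges = 0,
		.start = ~0UL, .end = 0,
	};
	uint64_t g;

	if (r->gen - *local_gen > FLUSH_RING_SIZE) {
		res.full_flush = true;
	} else {
		for (g = *local_gen; g < r->gen; g++) {
			const struct flush_range *fr =
				&r->ring[g % FLUSH_RING_SIZE];

			if (fr->start < res.start)
				res.start = fr->start;
			if (fr->end > res.end)
				res.end = fr->end;
			res.nr_ranges++;
		}
	}

	*local_gen = r->gen;
	return res;
}
```

In this sketch a CPU that missed several generations can still recover the individual ranges as long as the ring has not wrapped, so bumping the generation on every deferred flush would no longer force a full flush by itself. Merging into a single bounding range is only one possible choice; flushing each recovered range separately would work with the same structure.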
