From: Andy Lutomirski
Date: Sun, 31 Jan 2021 12:32:15 -0800
Subject: Re: [RFC 08/20] mm: store completed TLB generation
To: Nadav Amit
Cc: Linux-MM, LKML, Nadav Amit, Andrea Arcangeli, Andrew Morton,
    Andy Lutomirski, Dave Hansen, Peter Zijlstra, Thomas Gleixner,
    Will Deacon, Yu Zhao, Nick Piggin, X86 ML
References: <20210131001132.3368247-1-namit@vmware.com>
    <20210131001132.3368247-9-namit@vmware.com>
In-Reply-To: <20210131001132.3368247-9-namit@vmware.com>

On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit wrote:
>
> From: Nadav Amit
>
> To detect deferred TLB flushes in fine granularity, we need to keep
> track on the completed TLB flush generation for each mm.
>
> Add logic to track for each mm the tlb_gen_completed, which tracks the
> completed TLB generation. It is the arch responsibility to call
> mark_mm_tlb_gen_done() whenever a TLB flush is completed.
>
> Start the generation numbers from 1 instead of 0. This would allow later
> to detect whether flushes of a certain generation were completed.

Can you elaborate on how this helps?

I think you should document that tlb_gen_completed only means that no
outdated TLB entries will be observably used. In the x86
implementation it's possible for older TLB entries to still exist,
unused, in TLBs of cpus running other mms.

How does this work with arch_tlbbatch_flush()?

>
> Signed-off-by: Nadav Amit
> Cc: Andrea Arcangeli
> Cc: Andrew Morton
> Cc: Andy Lutomirski
> Cc: Dave Hansen
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: Will Deacon
> Cc: Yu Zhao
> Cc: Nick Piggin
> Cc: x86@kernel.org
> ---
>  arch/x86/mm/tlb.c         | 10 ++++++++++
>  include/asm-generic/tlb.h | 33 +++++++++++++++++++++++++++++++++
>  include/linux/mm_types.h  | 15 ++++++++++++++-
>  3 files changed, 57 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
> index 7ab21430be41..d17b5575531e 100644
> --- a/arch/x86/mm/tlb.c
> +++ b/arch/x86/mm/tlb.c
> @@ -14,6 +14,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include "mm_internal.h"
>
> @@ -915,6 +916,9 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
>  	if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids)
>  		flush_tlb_others(mm_cpumask(mm), info);
>
> +	/* Update the completed generation */
> +	mark_mm_tlb_gen_done(mm, new_tlb_gen);
> +
>  	put_flush_tlb_info();
>  	put_cpu();
>  }
> @@ -1147,6 +1151,12 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
>
>  	cpumask_clear(&batch->cpumask);
>
> +	/*
> +	 * We cannot call mark_mm_tlb_gen_done() since we do not know which
> +	 * mm's should be flushed. This may lead to some unwarranted TLB
> +	 * flushes, but not to correction problems.
> +	 */
> +
>  	put_cpu();
>  }
>
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 517c89398c83..427bfcc6cdec 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -513,6 +513,39 @@ static inline void tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vm
>  }
>  #endif
>
> +#ifdef CONFIG_ARCH_HAS_TLB_GENERATIONS
> +
> +/*
> + * Helper function to update a generation to have a new value, as long as new
> + * value is greater or equal to gen.
> + */

I read this a couple of times, and I don't understand it. How about:

Helper function to atomically set *gen = max(*gen, new_gen)

> +static inline void tlb_update_generation(atomic64_t *gen, u64 new_gen)
> +{
> +	u64 cur_gen = atomic64_read(gen);
> +
> +	while (cur_gen < new_gen) {
> +		u64 old_gen = atomic64_cmpxchg(gen, cur_gen, new_gen);
> +
> +		/* Check if we succeeded in the cmpxchg */
> +		if (likely(cur_gen == old_gen))
> +			break;
> +
> +		cur_gen = old_gen;
> +	};
> +}
> +
> +
> +static inline void mark_mm_tlb_gen_done(struct mm_struct *mm, u64 gen)
> +{
> +	/*
> +	 * Update the completed generation to the new generation if the new
> +	 * generation is greater than the previous one.
> +	 */
> +	tlb_update_generation(&mm->tlb_gen_completed, gen);
> +}
> +
> +#endif /* CONFIG_ARCH_HAS_TLB_GENERATIONS */
> +
>  /*
>   * tlb_flush_{pte|pmd|pud|p4d}_range() adjust the tlb->start and tlb->end,
>   * and set corresponding cleared_*.
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 2035ac319c2b..8a5eb4bfac59 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -571,6 +571,13 @@ struct mm_struct {
>  		 * This is not used on Xen PV.
>  		 */
>  		atomic64_t tlb_gen;
> +
> +		/*
> +		 * TLB generation which is guarnateed to be flushed, including
> +		 * all the PTE changes that were performed before tlb_gen was
> +		 * incremented.
> +		 */

I will defer judgment to future patches before I believe that this isn't racy :)

> +		atomic64_t tlb_gen_completed;
>  #endif
>  	} __randomize_layout;
>
> @@ -690,7 +697,13 @@ static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
>  #ifdef CONFIG_ARCH_HAS_TLB_GENERATIONS
>  static inline void init_mm_tlb_gen(struct mm_struct *mm)
>  {
> -	atomic64_set(&mm->tlb_gen, 0);
> +	/*
> +	 * Start from generation of 1, so default generation 0 will be
> +	 * considered as flushed and would not be regarded as an outstanding
> +	 * deferred invalidation.
> +	 */

Aha, this makes sense.

> +	atomic64_set(&mm->tlb_gen, 1);
> +	atomic64_set(&mm->tlb_gen_completed, 1);
>  }
>
>  static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
> --
> 2.25.1
>
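
For illustration, the "atomically set *gen = max(*gen, new_gen)" wording suggested above can be sketched in userspace with C11 atomics instead of the kernel's atomic64_cmpxchg(). This is only a sketch under stated assumptions: struct mm_gens, update_generation(), init_gens(), gen_is_completed() and demo() are hypothetical names that do not appear in the patch. In particular, the patch does not define a completion check at this point in the series, so gen_is_completed() is a guess at how the two counters might be compared, based on the comment that generation 0 should be treated as already flushed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for the two per-mm counters. */
struct mm_gens {
	_Atomic uint64_t tlb_gen;
	_Atomic uint64_t tlb_gen_completed;
};

/* Atomically set *gen = max(*gen, new_gen). */
static void update_generation(_Atomic uint64_t *gen, uint64_t new_gen)
{
	uint64_t cur = atomic_load(gen);

	/*
	 * When the compare-exchange fails, cur is reloaded with the value
	 * currently stored, so the loop re-tests against fresh data and
	 * stops as soon as another thread has published a value that is
	 * already >= new_gen.
	 */
	while (cur < new_gen &&
	       !atomic_compare_exchange_weak(gen, &cur, new_gen))
		;
}

/* Mirrors the patched init: both counters start at 1, not 0. */
static void init_gens(struct mm_gens *mm)
{
	atomic_init(&mm->tlb_gen, 1);
	atomic_init(&mm->tlb_gen_completed, 1);
}

/*
 * Assumed check (not part of the patch): a generation counts as flushed
 * once the completed counter has reached it. Because the counters start
 * at 1, a default generation of 0 is always reported as flushed.
 */
static bool gen_is_completed(struct mm_gens *mm, uint64_t gen)
{
	return gen <= atomic_load(&mm->tlb_gen_completed);
}

/* Small usage walk-through; call it from main() to exercise the helpers. */
static void demo(void)
{
	struct mm_gens mm;

	init_gens(&mm);
	assert(gen_is_completed(&mm, 0));	/* default generation */
	assert(!gen_is_completed(&mm, 2));	/* not flushed yet */

	update_generation(&mm.tlb_gen_completed, 5);
	update_generation(&mm.tlb_gen_completed, 3);	/* stale, ignored */
	assert(atomic_load(&mm.tlb_gen_completed) == 5);
}
```

The stale update with generation 3 leaves the counter at 5, which is the property the max() phrasing makes explicit: completions may be reported out of order without moving the counter backwards.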