From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.3 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY, SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,URIBL_DBL_ABUSE_MALW autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 644B7C35247 for ; Tue, 4 Feb 2020 01:37:14 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1A5822086A for ; Tue, 4 Feb 2020 01:37:14 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=kernel.org header.i=@kernel.org header.b="CLnWXix+" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1A5822086A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 9CE5E6B02C3; Mon, 3 Feb 2020 20:37:13 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 97F4C6B02C5; Mon, 3 Feb 2020 20:37:13 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 86E826B02C6; Mon, 3 Feb 2020 20:37:13 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0156.hostedemail.com [216.40.44.156]) by kanga.kvack.org (Postfix) with ESMTP id 6DE296B02C3 for ; Mon, 3 Feb 2020 20:37:13 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 1C0FB180AD806 for ; Tue, 4 Feb 2020 01:37:13 +0000 (UTC) X-FDA: 76450731546.30.gun93_21d8d24b83f35 X-HE-Tag: gun93_21d8d24b83f35 X-Filterd-Recvd-Size: 13715 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by imf08.hostedemail.com (Postfix) with ESMTP for ; Tue, 4 Feb 2020 01:37:12 +0000 (UTC) Received: from localhost.localdomain (c-73-231-172-41.hsd1.ca.comcast.net [73.231.172.41]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id B9FFC217BA; Tue, 4 Feb 2020 01:37:11 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=default; t=1580780232; bh=aC/JiybeAP7qZmfwJnP6dk1pKeljs/01ixxR8vHIwI8=; h=Date:From:To:Subject:In-Reply-To:From; b=CLnWXix+S9Bq0VEtEjBTPvGYeUcrgmBp0VE1TP4JBoOiDRm87W3iuo4x6HL1h6Apj Vqy62M5Lmg4wb+SDr/CSnS1Du50pVzufjnSM7JZOFSB811/kVnEesre6ih9RjBgv3O Q48leuG7BhCCYPS36ZolfForTCm+MpJTk+kcVSxA= Date: Mon, 03 Feb 2020 17:37:11 -0800 From: Andrew Morton To: akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com, linux-mm@kvack.org, mm-commits@vger.kernel.org, mpe@ellerman.id.au, peterz@infradead.org, torvalds@linux-foundation.org Subject: [patch 56/67] asm-generic/tlb: provide MMU_GATHER_TABLE_FREE Message-ID: <20200204013711.yprRqDo4k%akpm@linux-foundation.org> In-Reply-To: <20200203173311.6269a8be06a05e5a4aa08a93@linux-foundation.org> User-Agent: s-nail v14.8.16 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Peter Zijlstra Subject: asm-generic/tlb: provide MMU_GATHER_TABLE_FREE As described in the comment, the correct order for freeing pages is: 1) unhook page 2) TLB invalidate page 3) free page This order equally applies to page directories. Currently there are two correct options: - use tlb_remove_page(), when all page directores are full pages and there are no futher contraints placed by things like software walkers (HAVE_FAST_GUP). - use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the architecture does not do IPI based TLB invalidate and has HAVE_FAST_GUP (or software TLB fill). This however leaves architectures that don't have page based directories but don't need RCU in a bind. For those, provide MMU_GATHER_TABLE_FREE, which provides the independent batching for directories without the additional RCU freeing. Link: http://lkml.kernel.org/r/20200116064531.483522-10-aneesh.kumar@linux.ibm.com Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Aneesh Kumar K.V Cc: Michael Ellerman Signed-off-by: Andrew Morton --- arch/Kconfig | 5 + arch/arm/include/asm/tlb.h | 4 - include/asm-generic/tlb.h | 72 ++++++++++----------- mm/mmu_gather.c | 120 +++++++++++++++++++++++++---------- 4 files changed, 130 insertions(+), 71 deletions(-) --- a/arch/arm/include/asm/tlb.h~asm-generic-tlb-provide-mmu_gather_table_free +++ a/arch/arm/include/asm/tlb.h @@ -37,10 +37,6 @@ static inline void __tlb_remove_table(vo #include -#ifndef CONFIG_MMU_GATHER_RCU_TABLE_FREE -#define tlb_remove_table(tlb, entry) tlb_remove_page(tlb, entry) -#endif - static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr) { --- a/arch/Kconfig~asm-generic-tlb-provide-mmu_gather_table_free +++ a/arch/Kconfig @@ -393,8 +393,12 @@ config HAVE_ARCH_JUMP_LABEL config HAVE_ARCH_JUMP_LABEL_RELATIVE bool +config MMU_GATHER_TABLE_FREE + bool + config MMU_GATHER_RCU_TABLE_FREE bool + select MMU_GATHER_TABLE_FREE config MMU_GATHER_PAGE_SIZE bool @@ -404,6 +408,7 @@ config MMU_GATHER_NO_RANGE config MMU_GATHER_NO_GATHER bool + depends on MMU_GATHER_TABLE_FREE config ARCH_HAVE_NMI_SAFE_CMPXCHG bool --- a/include/asm-generic/tlb.h~asm-generic-tlb-provide-mmu_gather_table_free +++ a/include/asm-generic/tlb.h @@ -56,6 +56,15 @@ * Defaults to flushing at tlb_end_vma() to reset the range; helps when * there's large holes between the VMAs. * + * - tlb_remove_table() + * + * tlb_remove_table() is the basic primitive to free page-table directories + * (__p*_free_tlb()). In it's most primitive form it is an alias for + * tlb_remove_page() below, for when page directories are pages and have no + * additional constraints. + * + * See also MMU_GATHER_TABLE_FREE and MMU_GATHER_RCU_TABLE_FREE. + * * - tlb_remove_page() / __tlb_remove_page() * - tlb_remove_page_size() / __tlb_remove_page_size() * @@ -129,17 +138,24 @@ * This might be useful if your architecture has size specific TLB * invalidation instructions. * - * MMU_GATHER_RCU_TABLE_FREE + * MMU_GATHER_TABLE_FREE * * This provides tlb_remove_table(), to be used instead of tlb_remove_page() - * for page directores (__p*_free_tlb()). This provides separate freeing of - * the page-table pages themselves in a semi-RCU fashion (see comment below). - * Useful if your architecture doesn't use IPIs for remote TLB invalidates - * and therefore doesn't naturally serialize with software page-table walkers. + * for page directores (__p*_free_tlb()). + * + * Useful if your architecture has non-page page directories. * * When used, an architecture is expected to provide __tlb_remove_table() * which does the actual freeing of these pages. * + * MMU_GATHER_RCU_TABLE_FREE + * + * Like MMU_GATHER_TABLE_FREE, and adds semi-RCU semantics to the free (see + * comment below). + * + * Useful if your architecture doesn't use IPIs for remote TLB invalidates + * and therefore doesn't naturally serialize with software page-table walkers. + * * MMU_GATHER_NO_RANGE * * Use this if your architecture lacks an efficient flush_tlb_range(). @@ -155,37 +171,12 @@ * various ptep_get_and_clear() functions. */ -#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE -/* - * Semi RCU freeing of the page directories. - * - * This is needed by some architectures to implement software pagetable walkers. - * - * gup_fast() and other software pagetable walkers do a lockless page-table - * walk and therefore needs some synchronization with the freeing of the page - * directories. The chosen means to accomplish that is by disabling IRQs over - * the walk. - * - * Architectures that use IPIs to flush TLBs will then automagically DTRT, - * since we unlink the page, flush TLBs, free the page. Since the disabling of - * IRQs delays the completion of the TLB flush we can never observe an already - * freed page. - * - * Architectures that do not have this (PPC) need to delay the freeing by some - * other means, this is that means. - * - * What we do is batch the freed directory pages (tables) and RCU free them. - * We use the sched RCU variant, as that guarantees that IRQ/preempt disabling - * holds off grace periods. - * - * However, in order to batch these pages we need to allocate storage, this - * allocation is deep inside the MM code and can thus easily fail on memory - * pressure. To guarantee progress we fall back to single table freeing, see - * the implementation of tlb_remove_table_one(). - * - */ +#ifdef CONFIG_MMU_GATHER_TABLE_FREE + struct mmu_table_batch { +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE struct rcu_head rcu; +#endif unsigned int nr; void *tables[0]; }; @@ -195,6 +186,17 @@ struct mmu_table_batch { extern void tlb_remove_table(struct mmu_gather *tlb, void *table); +#else /* !CONFIG_MMU_GATHER_HAVE_TABLE_FREE */ + +/* + * Without MMU_GATHER_TABLE_FREE the architecture is assumed to have page based + * page directories and we can use the normal page batching to free them. + */ +#define tlb_remove_table(tlb, page) tlb_remove_page((tlb), (page)) + +#endif /* CONFIG_MMU_GATHER_TABLE_FREE */ + +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE /* * This allows an architecture that does not use the linux page-tables for * hardware to skip the TLBI when freeing page tables. @@ -248,7 +250,7 @@ extern bool __tlb_remove_page_size(struc struct mmu_gather { struct mm_struct *mm; -#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE +#ifdef CONFIG_MMU_GATHER_TABLE_FREE struct mmu_table_batch *batch; #endif --- a/mm/mmu_gather.c~asm-generic-tlb-provide-mmu_gather_table_free +++ a/mm/mmu_gather.c @@ -91,56 +91,106 @@ bool __tlb_remove_page_size(struct mmu_g #endif /* MMU_GATHER_NO_GATHER */ -#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE +#ifdef CONFIG_MMU_GATHER_TABLE_FREE -/* - * See the comment near struct mmu_table_batch. - */ +static void __tlb_remove_table_free(struct mmu_table_batch *batch) +{ + int i; + + for (i = 0; i < batch->nr; i++) + __tlb_remove_table(batch->tables[i]); + + free_page((unsigned long)batch); +} + +#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE /* - * If we want tlb_remove_table() to imply TLB invalidates. + * Semi RCU freeing of the page directories. + * + * This is needed by some architectures to implement software pagetable walkers. + * + * gup_fast() and other software pagetable walkers do a lockless page-table + * walk and therefore needs some synchronization with the freeing of the page + * directories. The chosen means to accomplish that is by disabling IRQs over + * the walk. + * + * Architectures that use IPIs to flush TLBs will then automagically DTRT, + * since we unlink the page, flush TLBs, free the page. Since the disabling of + * IRQs delays the completion of the TLB flush we can never observe an already + * freed page. + * + * Architectures that do not have this (PPC) need to delay the freeing by some + * other means, this is that means. + * + * What we do is batch the freed directory pages (tables) and RCU free them. + * We use the sched RCU variant, as that guarantees that IRQ/preempt disabling + * holds off grace periods. + * + * However, in order to batch these pages we need to allocate storage, this + * allocation is deep inside the MM code and can thus easily fail on memory + * pressure. To guarantee progress we fall back to single table freeing, see + * the implementation of tlb_remove_table_one(). + * */ -static inline void tlb_table_invalidate(struct mmu_gather *tlb) -{ - if (tlb_needs_table_invalidate()) { - /* - * Invalidate page-table caches used by hardware walkers. Then - * we still need to RCU-sched wait while freeing the pages - * because software walkers can still be in-flight. - */ - tlb_flush_mmu_tlbonly(tlb); - } -} static void tlb_remove_table_smp_sync(void *arg) { /* Simply deliver the interrupt */ } -static void tlb_remove_table_one(void *table) +static void tlb_remove_table_sync_one(void) { /* * This isn't an RCU grace period and hence the page-tables cannot be * assumed to be actually RCU-freed. * * It is however sufficient for software page-table walkers that rely on - * IRQ disabling. See the comment near struct mmu_table_batch. + * IRQ disabling. */ smp_call_function(tlb_remove_table_smp_sync, NULL, 1); - __tlb_remove_table(table); } static void tlb_remove_table_rcu(struct rcu_head *head) { - struct mmu_table_batch *batch; - int i; + __tlb_remove_table_free(container_of(head, struct mmu_table_batch, rcu)); +} - batch = container_of(head, struct mmu_table_batch, rcu); +static void tlb_remove_table_free(struct mmu_table_batch *batch) +{ + call_rcu(&batch->rcu, tlb_remove_table_rcu); +} - for (i = 0; i < batch->nr; i++) - __tlb_remove_table(batch->tables[i]); +#else /* !CONFIG_MMU_GATHER_RCU_TABLE_FREE */ - free_page((unsigned long)batch); +static void tlb_remove_table_sync_one(void) { } + +static void tlb_remove_table_free(struct mmu_table_batch *batch) +{ + __tlb_remove_table_free(batch); +} + +#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */ + +/* + * If we want tlb_remove_table() to imply TLB invalidates. + */ +static inline void tlb_table_invalidate(struct mmu_gather *tlb) +{ + if (tlb_needs_table_invalidate()) { + /* + * Invalidate page-table caches used by hardware walkers. Then + * we still need to RCU-sched wait while freeing the pages + * because software walkers can still be in-flight. + */ + tlb_flush_mmu_tlbonly(tlb); + } +} + +static void tlb_remove_table_one(void *table) +{ + tlb_remove_table_sync_one(); + __tlb_remove_table(table); } static void tlb_table_flush(struct mmu_gather *tlb) @@ -149,7 +199,7 @@ static void tlb_table_flush(struct mmu_g if (*batch) { tlb_table_invalidate(tlb); - call_rcu(&(*batch)->rcu, tlb_remove_table_rcu); + tlb_remove_table_free(*batch); *batch = NULL; } } @@ -173,13 +223,21 @@ void tlb_remove_table(struct mmu_gather tlb_table_flush(tlb); } -#endif /* CONFIG_MMU_GATHER_RCU_TABLE_FREE */ +static inline void tlb_table_init(struct mmu_gather *tlb) +{ + tlb->batch = NULL; +} + +#else /* !CONFIG_MMU_GATHER_TABLE_FREE */ + +static inline void tlb_table_flush(struct mmu_gather *tlb) { } +static inline void tlb_table_init(struct mmu_gather *tlb) { } + +#endif /* CONFIG_MMU_GATHER_TABLE_FREE */ static void tlb_flush_mmu_free(struct mmu_gather *tlb) { -#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE tlb_table_flush(tlb); -#endif #ifndef CONFIG_MMU_GATHER_NO_GATHER tlb_batch_pages_flush(tlb); #endif @@ -220,9 +278,7 @@ void tlb_gather_mmu(struct mmu_gather *t tlb->batch_count = 0; #endif -#ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE - tlb->batch = NULL; -#endif + tlb_table_init(tlb); #ifdef CONFIG_MMU_GATHER_PAGE_SIZE tlb->page_size = 0; #endif _