From mboxrd@z Thu Jan  1 00:00:00 1970
Reply-To: lwoodman@redhat.com
Subject: Re: [PATCH 0/3] arm64: tlb: skip tlbi broadcast v2
References: <20200223192520.20808-1-aarcange@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>, Will Deacon <will@kernel.org>,
 Catalin Marinas <catalin.marinas@arm.com>, Rafael Aquini
 <aquini@redhat.com>, Mark Salter <msalter@redhat.com>
Cc: Jon Masters <jcm@jonmasters.org>, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org,
 Michal Hocko <mhocko@kernel.org>, QI Fuli <qi.fuli@fujitsu.com>
From: Larry Woodman <lwoodman@redhat.com>
Organization: Red Hat
Message-ID: <343996b3-3dab-bca8-23d0-8902218cd233@redhat.com>
Date: Tue, 3 Mar 2020 08:01:42 -0500
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.4.0
MIME-Version: 1.0
In-Reply-To: <20200223192520.20808-1-aarcange@redhat.com>
Content-Type: multipart/alternative;
 boundary="------------15C73262A815626580B2ED1B"
List-ID: <linux-mm.kvack.org>

This is a multi-part message in MIME format.
--------------15C73262A815626580B2ED1B
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit

On 02/23/2020 02:25 PM, Andrea Arcangeli wrote:
> Hello,
>
> This introduces nr_active_mm, which allows optimizing away the
> tlbi broadcast for multithreaded processes too: it no longer
> relies on mm_users <= 1.
>
> This also optimizes away all TLB flushes (including local ones) when
> the process is not running on any CPU (including during exit_mmap in
> lazy TLB state).
>
> This optimization is generally only observable when there are parallel
> TLB flushes from different processes on multiple CPUs. One possible
> use case is userland malloc libraries freeing small objects with
> MADV_DONTNEED and causing frequent tiny TLB flushes, as demonstrated
> by the tcmalloc testsuite.
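For readers unfamiliar with the workload pattern mentioned above, here is a minimal Linux userspace sketch of what such an allocator does when it returns a page: it dirties an anonymous page and then discards it with madvise(MADV_DONTNEED), which zaps the mapping and requires a TLB flush for that range. release_page_like_malloc() is a hypothetical helper, not code taken from tcmalloc or jemalloc.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS and MADV_DONTNEED in glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: dirty one anonymous page, then hand it back to the
 * kernel the way a malloc library might when freeing small objects.
 * Returns the first byte read after the release, or -1 on error.
 */
static int release_page_like_malloc(void)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0);	/* sysconf() returns -1 on error */

	unsigned char *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAB, (size_t)page);	/* fault the page in and dirty it */

	/*
	 * Zap the page but keep the mapping. The kernel tears down the PTE
	 * and must invalidate the TLB entry for this range on every CPU that
	 * may cache it: this is the flush the series tries to keep local or
	 * skip entirely.
	 */
	if (madvise(p, (size_t)page, MADV_DONTNEED) != 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	int first = p[0];	/* private anonymous memory refaults zero-filled */
	munmap(p, (size_t)page);
	return first;
}
```

Calling this in a loop from several threads is a rough way to generate the kind of frequent, tiny flushes the cover letter describes.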
>
> Memory-intensive apps dealing with a multitude of frequently freed
> small objects tend to opt out of glibc malloc in favor of jemalloc or
> tcmalloc, so this should improve the SMP/NUMA scalability of
> long-lived apps with small objects running in different containers
> if they issue frequent MADV_DONTNEED TLB flushes while the other
> threads of the process are not running.
>
> It was suggested that I implement the mm_cpumask the standard way in
> order to optimize multithreaded apps too and to avoid restricting the
> optimization to mm_users <= 1. So initially I had two bitmasks
> allocated, as shown at the bottom of this cover letter, by setting
> ARCH_NR_MM_CPUMASK to 2 with the patch below applied. However, I
> figured a single per-mm atomic achieves exactly the same runtime
> behavior as the extra bitmap, so I dropped the extra bitmap and
> replaced it with nr_active_mm as an optimization.
>
> If the atomic ops in the switch_mm fast path are a concern (they're
> still faster than cpumask_set_cpu()/cpumask_clear_cpu() with fewer
> than 256-512 CPUs), it's worth mentioning that all atomic ops could be
> removed from the switch_mm fast path by restricting this optimization
> to single-threaded processes, checking mm_users <= 1 and < 1 instead
> of nr_active_mm <= 1 and < 1, similar to what the earlier version of
> this patchset did.
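The "<= 1 and < 1" checks described above can be read as a three-way decision. Below is a minimal sketch of that reading, assuming C11 <stdatomic.h> in userspace. struct mm_sketch, mm_activate(), mm_deactivate() and flush_type() are hypothetical names, and running_here stands in for however the real series learns that the flushing CPU is the one running the mm. This is not the code from the series: it leaves out the memory barriers and whatever mechanism the series uses to invalidate entries that other CPUs may still cache when a flush is skipped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

enum tlb_flush_types { TLB_FLUSH_NO, TLB_FLUSH_LOCAL, TLB_FLUSH_BROADCAST };

/* Hypothetical stand-in for the per-mm counter the series adds. */
struct mm_sketch {
	atomic_int nr_active_mm;	/* CPUs currently running this mm */
};

static void mm_sketch_init(struct mm_sketch *mm)
{
	atomic_init(&mm->nr_active_mm, 0);
}

/* switch_mm() side: a CPU starts running the mm. */
static void mm_activate(struct mm_sketch *mm)
{
	atomic_fetch_add(&mm->nr_active_mm, 1);
}

/* switch_mm() side: a CPU stops running the mm. */
static void mm_deactivate(struct mm_sketch *mm)
{
	int old = atomic_fetch_sub(&mm->nr_active_mm, 1);

	assert(old > 0);	/* catches unbalanced deactivation */
	(void)old;
}

/*
 * Flush side. @running_here is an assumption: it stands in for however the
 * real code determines that the flushing CPU is the one running @mm.
 */
static enum tlb_flush_types flush_type(struct mm_sketch *mm, bool running_here)
{
	int nr = atomic_load(&mm->nr_active_mm);

	if (nr < 1)			/* not running on any CPU */
		return TLB_FLUSH_NO;
	if (nr <= 1 && running_here)	/* only this CPU runs it */
		return TLB_FLUSH_LOCAL;
	return TLB_FLUSH_BROADCAST;	/* another CPU may be running it */
}
```

Under this model an mm with no active CPUs (for example during exit_mmap) gets TLB_FLUSH_NO, and a single active CPU that is also the flushing CPU gets TLB_FLUSH_LOCAL; every other case falls back to the broadcast.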
>
> Thanks,
> Andrea
>
> Andrea Arcangeli (3):
>   mm: use_mm: fix for arches checking mm_users to optimize TLB flushes
>   arm64: select CPUMASK_OFFSTACK if NUMA
>   arm64: tlb: skip tlbi broadcast
>
>  arch/arm64/Kconfig                   |  1 +
>  arch/arm64/include/asm/efi.h         |  2 +-
>  arch/arm64/include/asm/mmu.h         |  4 +-
>  arch/arm64/include/asm/mmu_context.h | 33 ++++++++--
>  arch/arm64/include/asm/tlbflush.h    | 95 +++++++++++++++++++++++++++-
>  arch/arm64/mm/context.c              | 54 ++++++++++++++++
>  mm/mmu_context.c                     |  2 +
>  7 files changed, 180 insertions(+), 11 deletions(-)
>
> Early attempt with the standard mm_cpumask follows:
>
> From: Andrea Arcangeli <aarcange@redhat.com>
> Subject: mm: allow per-arch mm_cpumasks based on ARCH_NR_MM_CPUMASK
>
> Allow archs to allocate multiple mm_cpumasks in the mm_struct by
> defining ARCH_NR_MM_CPUMASK > 1 (the definition must be included
> before "linux/mm_types.h").
>
> Those extra per-mm cpumasks can be referenced with
> __mm_cpumask(N, mm), where N == 0 points to the mm_cpumask()
> known by the common code and N > 0 points to the per-arch private
> ones.
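The layout this patch describes, N cpumasks packed back to back at the end of the allocation, can be illustrated with a userspace sketch built on a C flexible array member. struct mm_model, cpumask_bytes(), mm_cpumask_n(), the mask helpers and the fixed nr_cpu_ids value are hypothetical stand-ins for mm_struct, cpumask_size(), __mm_cpumask() and the boot-time CPU count; malloc() replaces the kmem_cache sized in proc_caches_init().

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define ARCH_NR_MM_CPUMASK 2	/* the two-bitmask experiment */
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct mm_model {
	int other_fields;		/* placeholder for the rest of mm_struct */
	unsigned long cpu_bitmap[];	/* must stay last: sized at runtime */
};

static unsigned int nr_cpu_ids = 130;	/* hypothetical boot-time CPU count */

/* Stand-in for cpumask_size(): bytes in one runtime-sized cpumask. */
static size_t cpumask_bytes(void)
{
	return BITS_TO_LONGS(nr_cpu_ids) * sizeof(unsigned long);
}

/* Stand-in for __mm_cpumask(index, mm): index 0 is the common mask. */
static unsigned long *mm_cpumask_n(struct mm_model *mm, int index)
{
	assert(index >= 0 && index < ARCH_NR_MM_CPUMASK);
	return (unsigned long *)((char *)mm->cpu_bitmap +
				 cpumask_bytes() * (size_t)index);
}

/* Same arithmetic as mm_size in proc_caches_init(), then zero all masks. */
static struct mm_model *mm_alloc_model(void)
{
	size_t masks = cpumask_bytes() * ARCH_NR_MM_CPUMASK;
	struct mm_model *mm = malloc(sizeof(*mm) + masks);

	if (mm)
		memset(mm->cpu_bitmap, 0, masks);
	return mm;
}

static void mask_set_cpu(unsigned long *mask, unsigned int cpu)
{
	mask[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
}

static int mask_test_cpu(const unsigned long *mask, unsigned int cpu)
{
	return (mask[cpu / BITS_PER_LONG] >> (cpu % BITS_PER_LONG)) & 1UL;
}
```

Because each mask occupies a whole number of longs, the Nth mask always starts on a long boundary, so the byte-offset arithmetic in mm_cpumask_n() yields a properly aligned pointer.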
> ---
>  drivers/firmware/efi/efi.c |  3 ++-
>  include/linux/mm_types.h   | 17 +++++++++++++++--
>  kernel/fork.c              |  3 ++-
>  mm/init-mm.c               |  2 +-
>  4 files changed, 20 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
> index 5da0232ae33f..608c9bf181e5 100644
> --- a/drivers/firmware/efi/efi.c
> +++ b/drivers/firmware/efi/efi.c
> @@ -86,7 +86,8 @@ struct mm_struct efi_mm = {
>  	.mmap_sem		= __RWSEM_INITIALIZER(efi_mm.mmap_sem),
>  	.page_table_lock	= __SPIN_LOCK_UNLOCKED(efi_mm.page_table_lock),
>  	.mmlist			= LIST_HEAD_INIT(efi_mm.mmlist),
> -	.cpu_bitmap		= { [BITS_TO_LONGS(NR_CPUS)] = 0},
> +	.cpu_bitmap		= { [BITS_TO_LONGS(NR_CPUS) *
> +				     ARCH_NR_MM_CPUMASK] = 0},
>  };
>  
>  struct workqueue_struct *efi_rts_wq;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index f29bba20bba1..b53d5622b3b2 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -531,6 +531,9 @@ struct mm_struct {
>  	RH_KABI_RESERVE(7)
>  	RH_KABI_RESERVE(8)
>  
> +#ifndef ARCH_NR_MM_CPUMASK
> +#define ARCH_NR_MM_CPUMASK 1
> +#endif
>  	/*
>  	 * The mm_cpumask needs to be at the end of mm_struct, because it
>  	 * is dynamically sized based on nr_cpu_ids.
> @@ -544,15 +547,25 @@ extern struct mm_struct init_mm;
>  static inline void mm_init_cpumask(struct mm_struct *mm)
>  {
>  	unsigned long cpu_bitmap = (unsigned long)mm;
> +	int i;
>  
>  	cpu_bitmap += offsetof(struct mm_struct, cpu_bitmap);
> -	cpumask_clear((struct cpumask *)cpu_bitmap);
> +	for (i = 0; i < ARCH_NR_MM_CPUMASK; i++) {
> +		cpumask_clear((struct cpumask *)cpu_bitmap);
> +		cpu_bitmap += cpumask_size();
> +	}
>  }
>  
>  /* Future-safe accessor for struct mm_struct's cpu_vm_mask. */
> +static inline cpumask_t *__mm_cpumask(int index, struct mm_struct *mm)
> +{
> +	return (struct cpumask *)((unsigned long)&mm->cpu_bitmap +
> +				  cpumask_size() * index);
> +}
> +
>  static inline cpumask_t *mm_cpumask(struct mm_struct *mm)
>  {
> -	return (struct cpumask *)&mm->cpu_bitmap;
> +	return __mm_cpumask(0, mm);
>  }
>  
>  struct mmu_gather;
> diff --git a/kernel/fork.c b/kernel/fork.c
> index 1dad2f91fac3..a6cbbc1b6008 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -2418,7 +2418,8 @@ void __init proc_caches_init(void)
>  	 * dynamically sized based on the maximum CPU number this system
>  	 * can have, taking hotplug into account (nr_cpu_ids).
>  	 */
> -	mm_size = sizeof(struct mm_struct) + cpumask_size();
> +	mm_size = sizeof(struct mm_struct) + cpumask_size() * \
> +		ARCH_NR_MM_CPUMASK;
>  
>  	mm_cachep = kmem_cache_create_usercopy("mm_struct",
>  			mm_size, ARCH_MIN_MMSTRUCT_ALIGN,
> diff --git a/mm/init-mm.c b/mm/init-mm.c
> index a787a319211e..d975f8ce270e 100644
> --- a/mm/init-mm.c
> +++ b/mm/init-mm.c
> @@ -35,6 +35,6 @@ struct mm_struct init_mm = {
>  	.arg_lock	=  __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
>  	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
>  	.user_ns	= &init_user_ns,
> -	.cpu_bitmap	= { [BITS_TO_LONGS(NR_CPUS)] = 0},
> +	.cpu_bitmap	= { [BITS_TO_LONGS(NR_CPUS) * ARCH_NR_MM_CPUMASK] = 0},
>  	INIT_MM_CONTEXT(init_mm)
>  };
>
>
> [bitmap version depending on the above follows]
>
> @@ -248,6 +260,42 @@ void check_and_switch_context(struct mm_struct *mm, unsigned int cpu)
>  		cpu_switch_mm(mm->pgd, mm);
>  }
>  
> +enum tlb_flush_types tlb_flush_check(struct mm_struct *mm, unsigned int cpu)
> +{
> +	if (cpumask_any_but(mm_cpumask(mm), cpu) >= nr_cpu_ids) {
> +		bool is_local = cpumask_test_cpu(cpu, mm_cpumask(mm));
> +		cpumask_t *stale_cpumask = __mm_cpumask(1, mm);
> +		int next_zero = cpumask_next_zero(-1, stale_cpumask);
> +		bool local_is_clear = false;
> +		if (next_zero < nr_cpu_ids &&
> +		    (is_local && next_zero == cpu)) {
> +			next_zero = cpumask_next_zero(next_zero, stale_cpumask);
> +			local_is_clear = true;
> +		}
> +		if (next_zero < nr_cpu_ids) {
> +			cpumask_setall(stale_cpumask);
> +			local_is_clear = false;
> +		}
> +
> +		/*
> +		 * Enforce CPU ordering between the
> +		 * cpumask_setall() and cpumask_any_but().
> +		 */
> +		smp_mb();
> +
> +		if (likely(cpumask_any_but(mm_cpumask(mm),
> +					   cpu) >= nr_cpu_ids)) {
> +			if (is_local) {
> +				if (!local_is_clear)
> +					cpumask_clear_cpu(cpu, stale_cpumask);
> +				return TLB_FLUSH_LOCAL;
> +			}
> +			return TLB_FLUSH_NO;
> +		}
> +	}
> +	return TLB_FLUSH_BROADCAST;
> +}
> +
>  /* Errata workaround post TTBRx_EL1 update. */
>  asmlinkage void post_ttbr_update_workaround(void)
>  {
>
>
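The decision logic of the bitmap-version tlb_flush_check() quoted above can be exercised outside the kernel. Below is a minimal single-threaded model, assuming at most 64 CPUs so each cpumask fits in a uint64_t. struct mm_model, any_but(), next_zero_after() and tlb_flush_check_model() are hypothetical stand-ins for mm_struct, cpumask_any_but(), cpumask_next_zero() and the real function, and the smp_mb() is omitted because nothing runs concurrently. One plausible reading, not stated in the thread, is that a set bit in the stale mask marks a CPU that still has to flush before it next runs the mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum tlb_flush_types { TLB_FLUSH_NO, TLB_FLUSH_LOCAL, TLB_FLUSH_BROADCAST };

/* Hypothetical model: at most 64 CPUs, one bit per CPU. */
struct mm_model {
	unsigned int nr_cpu_ids;
	uint64_t active;	/* stands in for mm_cpumask(mm) */
	uint64_t stale;		/* stands in for __mm_cpumask(1, mm) */
};

static uint64_t all_cpus(const struct mm_model *m)
{
	return m->nr_cpu_ids == 64 ? ~0ULL : (1ULL << m->nr_cpu_ids) - 1;
}

/* Like cpumask_any_but() < nr_cpu_ids: is any CPU other than @cpu set? */
static bool any_but(const struct mm_model *m, uint64_t mask, unsigned int cpu)
{
	return (mask & all_cpus(m) & ~(1ULL << cpu)) != 0;
}

/* Like cpumask_next_zero(): first clear bit after @n, else nr_cpu_ids. */
static unsigned int next_zero_after(const struct mm_model *m, uint64_t mask, int n)
{
	for (unsigned int i = (unsigned int)(n + 1); i < m->nr_cpu_ids; i++)
		if (!(mask & (1ULL << i)))
			return i;
	return m->nr_cpu_ids;
}

static enum tlb_flush_types tlb_flush_check_model(struct mm_model *m, unsigned int cpu)
{
	assert(m->nr_cpu_ids <= 64 && cpu < m->nr_cpu_ids);

	if (!any_but(m, m->active, cpu)) {
		bool is_local = m->active & (1ULL << cpu);
		unsigned int nz = next_zero_after(m, m->stale, -1);
		bool local_is_clear = false;

		/* Skip our own clear bit: only other CPUs force a setall. */
		if (nz < m->nr_cpu_ids && is_local && nz == cpu) {
			nz = next_zero_after(m, m->stale, (int)nz);
			local_is_clear = true;
		}
		if (nz < m->nr_cpu_ids) {
			m->stale = all_cpus(m);	/* cpumask_setall() */
			local_is_clear = false;
		}
		/* The kernel re-checks after smp_mb(); no barrier needed here. */
		if (!any_but(m, m->active, cpu)) {
			if (is_local) {
				if (!local_is_clear)
					m->stale &= ~(1ULL << cpu);
				return TLB_FLUSH_LOCAL;
			}
			return TLB_FLUSH_NO;
		}
	}
	return TLB_FLUSH_BROADCAST;
}
```

With four CPUs and the mm active only on CPU 0, the first call returns TLB_FLUSH_LOCAL and leaves stale at 0xE (CPUs 1-3 set, CPU 0 clear); with the mm active nowhere, a call returns TLB_FLUSH_NO and leaves all four stale bits set; any other active CPU yields TLB_FLUSH_BROADCAST.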
For the 3-part series:

Acked-by: Larry Woodman <lwoodman@redhat.com>

--------------15C73262A815626580B2ED1B--