From mboxrd@z Thu Jan 1 00:00:00 1970
From: Will Deacon
Subject: Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)
Date: Mon, 29 Oct 2018 10:28:40 +0000
Message-ID: <20181029102840.GC13965@arm.com>
References: <20181013013200.206928-1-joel@joelfernandes.org>
 <20181013013200.206928-3-joel@joelfernandes.org>
 <20181024101255.it4lptrjogalxbey@kshutemo-mobl1>
 <20181024115733.GN8537@350D>
 <20181024125724.yf6frdimjulf35do@kshutemo-mobl1>
 <20181025020907.GA13560@joelaf.mtv.corp.google.com>
 <20181025101900.phqnqpoju5t2gar5@kshutemo-mobl1>
 <20181026211148.GA140716@joelaf.mtv.corp.google.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: linux-mips@linux-mips.org, Rich Felker, linux-ia64@vger.kernel.org,
 linux-sh@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Balbir Singh,
 Dave Hansen, mhocko@kernel.org, linux-mm@kvack.org, lokeshgidra@google.com,
 sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org,
 elfring@users.sourceforge.net, Jonas Bonn, kvmarm@lists.cs.columbia.edu,
 dancol@google.com, Yoshinori Sato, linux-xtensa@linux-xtensa.org,
 linux-hexagon@vger.kernel.org, Helge Deller,
 "maintainer:X86 ARCHITECTURE \(32-BIT AND 64-BIT\)", hughd@google.com,
 "James E.J. Bottomley", kasan-dev@googlegroups.com,
 anton.ivanov@kot-begemot.co.uk, Ingo Molnar, Geer
To: Joel Fernandes
In-Reply-To: <20181026211148.GA140716@joelaf.mtv.corp.google.com>
List-Id: Linux on Synopsys ARC Processors
Errors-To: linux-snps-arc-bounces+gla-linux-snps-arc=m.gmane.org@lists.infradead.org

On Fri, Oct 26, 2018 at 02:11:48PM -0700, Joel Fernandes wrote:
> My thinking is to take it slow and get the patch in in its current state,
> since it improves x86. Then as a next step, look into why the arm64 tlb
> flushes are that expensive and look into optimizing that.
> On arm64 I am testing on a 4.9 kernel so I'm wondering there are any
> optimizations since 4.9 that can help speed it up there. After that, if all
> else fails about speeding up arm64, then I look into developing the cleanest
> possible solution where we can keep the lock held for longer and flush lesser.

We rewrote a good chunk of the arm64 TLB invalidation and core mmu_gather
code this merge window, so please do have another look at -rc1!

Will