From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joel Fernandes
Subject: Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2)
Date: Wed, 24 Oct 2018 19:09:07 -0700
Message-ID: <20181025020907.GA13560@joelaf.mtv.corp.google.com>
References: <20181013013200.206928-1-joel@joelfernandes.org>
 <20181013013200.206928-3-joel@joelfernandes.org>
 <20181024101255.it4lptrjogalxbey@kshutemo-mobl1>
 <20181024115733.GN8537@350D>
 <20181024125724.yf6frdimjulf35do@kshutemo-mobl1>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
To: "Kirill A. Shutemov"
In-Reply-To: <20181024125724.yf6frdimjulf35do@kshutemo-mobl1>
List-Id: Linux on Synopsys ARC Processors

On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote:
> > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote:
> > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote:
> > > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > > index 9e68a02a52b1..2fd163cff406 100644
> > > > --- a/mm/mremap.c
> > > > +++ b/mm/mremap.c
> > > > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
> > > >  		drop_rmap_locks(vma);
> > > >  }
> > > >
> > > > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
> > > > +		  unsigned long new_addr, unsigned long old_end,
> > > > +		  pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
> > > > +{
> > > > +	spinlock_t *old_ptl, *new_ptl;
> > > > +	struct mm_struct *mm = vma->vm_mm;
> > > > +
> > > > +	if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
> > > > +	    || old_end - old_addr < PMD_SIZE)
> > > > +		return false;
> > > > +
> > > > +	/*
> > > > +	 * The destination pmd shouldn't be established, free_pgtables()
> > > > +	 * should have released it.
> > > > +	 */
> > > > +	if (WARN_ON(!pmd_none(*new_pmd)))
> > > > +		return false;
> > > > +
> > > > +	/*
> > > > +	 * We don't have to worry about the ordering of src and dst
> > > > +	 * ptlocks because exclusive mmap_sem prevents deadlock.
> > > > +	 */
> > > > +	old_ptl = pmd_lock(vma->vm_mm, old_pmd);
> > > > +	if (old_ptl) {
> > >
> > > How can it ever be false?

Kirill,

It cannot, you are right. I'll remove the test.

By the way, there are new changes upstream by Linus which flush the TLB
before releasing the ptlock instead of after. I'm guessing that patch came
about because of reviews of this patch, and someone spotted an issue in the
existing code :) Anyway, the patch in question is:

  eb66ae030829 ("mremap: properly flush TLB before releasing the page")

I need to rebase on top of that with appropriate modifications, but I worry
that this will slow down performance, since we would have to flush at every
PMD/PTE move before releasing the ptlock. Whereas with my patch, the
intention is to flush only once, at the end of move_page_tables(). When I
tried flushing the TLB on every PMD move, it was quite slow on my arm64
device [2].

A further observation [1]: the move_huge_pmd() and move_ptes() code seems a
bit suboptimal, in the sense that we acquire and release the same ptlock
for a bunch of PMDs when those PMDs sit on the same page-table page, right?
We could do better by acquiring and releasing the ptlock less often.

I think both this observation [1] and the frequent-TLB-flush issue [2] can
be solved by acquiring the ptlock once for a bunch of PMDs, moving them
all, then flushing the TLB and releasing the ptlock, and then proceeding
to do the same for the PMDs on the next page-table page. What do you think?

- Joel
b=h8ADg9YexNOr9DgYnAtTclSkQEZ696o+3jzie9W8pghiSAITWwEgbJ+JuGjJPOf6yw nIOonN/fSfGY1pf8NZYd6SwOch/8k+ww+BZCSpS5YJATe+jkxwUjaAY49/Rh6334Jd1b JczCxgoo5Urp2rn5lkPv2xl1utLmuTaDMNtxW2fBsAAxNH3BqfppkxtGOWoP+43bxqjb C6GUKcRa70mI0EIg2cYf4pZ6n/E/E/mQ9KFaNcg/X/CSPtsO0nFEdvz+tbGavOQlHxU6 wIf8/w5lBPCZ6kzJfgdiWZ+ckeU8ngYuwjs6aBs1m1FQ5ViRxHvdKR33AQZ5xsZeUqmc PIwg== X-Gm-Message-State: AGRZ1gIL2jcaK0zHQfLk5SThI+RrMt/j/mxD2DDWrM3yDeajwWKM8U2e 2ThPpgn12cl7rIjo+QvPZrxlYQ== X-Google-Smtp-Source: AJdET5dUEfywPWvzS+489PJ55mfHZNUk9qKPmDa0EsEQ3lYTg7j+ER6qGO2BektCxwt9F6U/VdSRSA== X-Received: by 2002:aa7:814f:: with SMTP id d15-v6mr5054081pfn.78.1540433350578; Wed, 24 Oct 2018 19:09:10 -0700 (PDT) Received: from localhost ([2620:0:1000:1601:3aef:314f:b9ea:889f]) by smtp.gmail.com with ESMTPSA id 68-v6sm6748192pfg.136.2018.10.24.19.09.08 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Oct 2018 19:09:08 -0700 (PDT) Date: Wed, 24 Oct 2018 19:09:07 -0700 From: Joel Fernandes To: "Kirill A. Shutemov" Cc: Balbir Singh , linux-kernel@vger.kernel.org, kernel-team@android.com, minchan@kernel.org, pantin@google.com, hughd@google.com, lokeshgidra@google.com, dancol@google.com, mhocko@kernel.org, akpm@linux-foundation.org, Andrey Ryabinin , Andy Lutomirski , anton.ivanov@kot-begemot.co.uk, Borislav Petkov , Catalin Marinas , Chris Zankel , Dave Hansen , "David S. Miller" , elfring@users.sourceforge.net, Fenghua Yu , Geert Uytterhoeven , Guan Xuetao , Helge Deller , Ingo Molnar , "James E.J. 
Bottomley" , Jeff Dike , Jonas Bonn , Julia Lawall , kasan-dev@googlegroups.com, kvmarm@lists.cs.columbia.edu, Ley Foon Tan , linux-alpha@vger.kernel.org, linux-hexagon@vger.kernel.org, linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-mips@linux-mips.org, linux-mm@kvack.org, linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, linux-snps-arc@lists.infradead.org, linux-um@lists.infradead.org, linux-xtensa@linux-xtensa.org, Max Filippov , nios2-dev@lists.rocketboards.org, Peter Zijlstra , Richard Weinberger , Rich Felker , Sam Creasey , sparclinux@vger.kernel.org, Stafford Horne , Stefan Kristiansson , Thomas Gleixner , Tony Luck , Will Deacon , "maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)" , Yoshinori Sato Subject: Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2) Message-ID: <20181025020907.GA13560@joelaf.mtv.corp.google.com> References: <20181013013200.206928-1-joel@joelfernandes.org> <20181013013200.206928-3-joel@joelfernandes.org> <20181024101255.it4lptrjogalxbey@kshutemo-mobl1> <20181024115733.GN8537@350D> <20181024125724.yf6frdimjulf35do@kshutemo-mobl1> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181024125724.yf6frdimjulf35do@kshutemo-mobl1> User-Agent: Mutt/1.10.1 (2018-07-13) Return-Path: X-Envelope-To: <"|/home/ecartis/ecartis -s linux-mips"> (uid 0) X-Orcpt: rfc822;linux-mips@linux-mips.org Original-Recipient: rfc822;linux-mips@linux-mips.org X-archive-position: 66931 X-ecartis-version: Ecartis v1.0.0 Sender: linux-mips-bounce@linux-mips.org Errors-to: linux-mips-bounce@linux-mips.org X-original-sender: joel@joelfernandes.org Precedence: bulk List-help: List-unsubscribe: List-software: Ecartis version 1.0.0 List-Id: linux-mips X-List-ID: linux-mips List-subscribe: List-owner: List-post: List-archive: X-list: linux-mips On Wed, Oct 24, 2018 at 03:57:24PM +0300, 
Kirill A. Shutemov wrote: > On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote: > > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote: > > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote: > > > > diff --git a/mm/mremap.c b/mm/mremap.c > > > > index 9e68a02a52b1..2fd163cff406 100644 > > > > --- a/mm/mremap.c > > > > +++ b/mm/mremap.c > > > > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, > > > > drop_rmap_locks(vma); > > > > } > > > > > > > > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, > > > > + unsigned long new_addr, unsigned long old_end, > > > > + pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush) > > > > +{ > > > > + spinlock_t *old_ptl, *new_ptl; > > > > + struct mm_struct *mm = vma->vm_mm; > > > > + > > > > + if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK) > > > > + || old_end - old_addr < PMD_SIZE) > > > > + return false; > > > > + > > > > + /* > > > > + * The destination pmd shouldn't be established, free_pgtables() > > > > + * should have release it. > > > > + */ > > > > + if (WARN_ON(!pmd_none(*new_pmd))) > > > > + return false; > > > > + > > > > + /* > > > > + * We don't have to worry about the ordering of src and dst > > > > + * ptlocks because exclusive mmap_sem prevents deadlock. > > > > + */ > > > > + old_ptl = pmd_lock(vma->vm_mm, old_pmd); > > > > + if (old_ptl) { > > > > > > How can it ever be false? Kirill, It cannot, you are right. I'll remove the test. By the way, there are new changes upstream by Linus which flush the TLB before releasing the ptlock instead of after. 
I'm guessing that patch came about because of reviews of this patch and someone spotted an issue in the existing code :) Anyway the patch in concern is: eb66ae030829 ("mremap: properly flush TLB before releasing the page") I need to rebase on top of that with appropriate modifications, but I worry that this patch will slow down performance since we have to flush at every PMD/PTE move before releasing the ptlock. Where as with my patch, the intention is to flush only at once in the end of move_page_tables. When I tried to flush TLB on every PMD move, it was quite slow on my arm64 device [2]. Further observation [1] is, it seems like the move_huge_pmds and move_ptes code is a bit sub optimal in the sense, we are acquiring and releasing the same ptlock for a bunch of PMDs if the said PMDs are on the same page-table page right? Instead we can do better by acquiring and release the ptlock less often. I think this observation [1] and the frequent TLB flush issue [2] can be solved by acquiring the ptlock once for a bunch of PMDs, move them all, then flush the tlb and then release the ptlock, and then proceed to doing the same thing for the PMDs in the next page-table page. What do you think? - Joel From mboxrd@z Thu Jan 1 00:00:00 1970 From: joel@joelfernandes.org (Joel Fernandes) Date: Wed, 24 Oct 2018 19:09:07 -0700 Subject: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2) In-Reply-To: <20181024125724.yf6frdimjulf35do@kshutemo-mobl1> References: <20181013013200.206928-1-joel@joelfernandes.org> <20181013013200.206928-3-joel@joelfernandes.org> <20181024101255.it4lptrjogalxbey@kshutemo-mobl1> <20181024115733.GN8537@350D> <20181024125724.yf6frdimjulf35do@kshutemo-mobl1> Message-ID: <20181025020907.GA13560@joelaf.mtv.corp.google.com> To: linux-riscv@lists.infradead.org List-Id: linux-riscv.lists.infradead.org On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. 
Shutemov wrote: > On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote: > > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. Shutemov wrote: > > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote: > > > > diff --git a/mm/mremap.c b/mm/mremap.c > > > > index 9e68a02a52b1..2fd163cff406 100644 > > > > --- a/mm/mremap.c > > > > +++ b/mm/mremap.c > > > > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, > > > > drop_rmap_locks(vma); > > > > } > > > > > > > > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, > > > > + unsigned long new_addr, unsigned long old_end, > > > > + pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush) > > > > +{ > > > > + spinlock_t *old_ptl, *new_ptl; > > > > + struct mm_struct *mm = vma->vm_mm; > > > > + > > > > + if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK) > > > > + || old_end - old_addr < PMD_SIZE) > > > > + return false; > > > > + > > > > + /* > > > > + * The destination pmd shouldn't be established, free_pgtables() > > > > + * should have release it. > > > > + */ > > > > + if (WARN_ON(!pmd_none(*new_pmd))) > > > > + return false; > > > > + > > > > + /* > > > > + * We don't have to worry about the ordering of src and dst > > > > + * ptlocks because exclusive mmap_sem prevents deadlock. > > > > + */ > > > > + old_ptl = pmd_lock(vma->vm_mm, old_pmd); > > > > + if (old_ptl) { > > > > > > How can it ever be false? Kirill, It cannot, you are right. I'll remove the test. By the way, there are new changes upstream by Linus which flush the TLB before releasing the ptlock instead of after. 
I'm guessing that patch came about because of reviews of this patch and someone spotted an issue in the existing code :) Anyway the patch in concern is: eb66ae030829 ("mremap: properly flush TLB before releasing the page") I need to rebase on top of that with appropriate modifications, but I worry that this patch will slow down performance since we have to flush at every PMD/PTE move before releasing the ptlock. Where as with my patch, the intention is to flush only at once in the end of move_page_tables. When I tried to flush TLB on every PMD move, it was quite slow on my arm64 device [2]. Further observation [1] is, it seems like the move_huge_pmds and move_ptes code is a bit sub optimal in the sense, we are acquiring and releasing the same ptlock for a bunch of PMDs if the said PMDs are on the same page-table page right? Instead we can do better by acquiring and release the ptlock less often. I think this observation [1] and the frequent TLB flush issue [2] can be solved by acquiring the ptlock once for a bunch of PMDs, move them all, then flush the tlb and then release the ptlock, and then proceed to doing the same thing for the PMDs in the next page-table page. What do you think? 
- Joel From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.0 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_PASS,URIBL_BLOCKED,USER_AGENT_MUTT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 92DB1ECDE44 for ; Thu, 25 Oct 2018 02:09:30 +0000 (UTC) Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 4905020831 for ; Thu, 25 Oct 2018 02:09:30 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="TezSPkx9"; dkim=fail reason="signature verification failed" (1024-bit key) header.d=joelfernandes.org header.i=@joelfernandes.org header.b="LAnJhDOq" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4905020831 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=joelfernandes.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-riscv-bounces+infradead-linux-riscv=archiver.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20170209; h=Sender: Content-Transfer-Encoding:Content-Type:Cc:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:In-Reply-To:MIME-Version:References: Message-ID:Subject:To:From:Date:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=gUsJNxNRgIdSXHW+eHuVOzADgU7+fs3+IWbhxUSd3ac=; b=TezSPkx93hujde 
734VxqiOLlg6B6rpCLNc7lvq13s9qqyUY9GeRNZ/6M81OIHsWWAHsDsSDR6ezEP7FHPJtZ04lIhCX mJvIWeo3HJYD3S7Jt3TZN7vxG/4wowwb55lSQiSMSDS3eHl60UefJ8O4k4xgNpuirJd3Co15W7Ufo iNKhkfWPWUVbW03LwLwrb0X2MUQQMxrKCqkZgGhu7nCbwoUtc86wd7A81fIs+VwhUUZCL7amD2SGb Z8XBALHzs4MUU9RCWdOvvZrxmC/uVryS1bfTf/CLfzxlfU7QLgPQiLviTaSNNlVeQ+xsVqF3ntNtA +TKCo3tZ4ZzEqQWBsB/w==; Received: from localhost ([127.0.0.1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.90_1 #2 (Red Hat Linux)) id 1gFV5T-00005o-57; Thu, 25 Oct 2018 02:09:27 +0000 Received: from mail-pf1-x441.google.com ([2607:f8b0:4864:20::441]) by bombadil.infradead.org with esmtps (Exim 4.90_1 #2 (Red Hat Linux)) id 1gFV5O-0008W0-Ae for linux-riscv@lists.infradead.org; Thu, 25 Oct 2018 02:09:24 +0000 Received: by mail-pf1-x441.google.com with SMTP id f26-v6so3369172pfn.9 for ; Wed, 24 Oct 2018 19:09:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=G78I9uM4gXdiLalpqQZcUQ0k8S9gAvq6zBmyUxhGe0M=; b=LAnJhDOqLDe5rJ8W4SiC2EjUQZvQu/r5ZibVfgxKfT2epX8iHZwYQmEjB6a8btC1dW r6dDSvdCvkPgsk5m3K0hxUEc1ohpdLPr/504o7kJ7XSyqWkNXqLU77+seSO2HzOgNvQs hVyNi6MDrJlz77EflnU3vSDe0xbeuTe8cghrE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=G78I9uM4gXdiLalpqQZcUQ0k8S9gAvq6zBmyUxhGe0M=; b=ID7Zd2Up1+b6ZMUJ/tZauYZ3FLt4Yha3Y1FKhK7eH7hxAPydp3Gy4abLMW3yLG+G8p Ud/IYH98zKVuGxPU6qc8KWbgbzhFKUer6OSJW52tayHJApqpDEODZiCDvoubbMd5pM90 79a6SvMkMPnOmSr7MLrlXk+aOvVgf4G8yj9qKR6YPt+RgUbHzsxhI/0DWtUlkTdsxyRa lK5ptBrSNU7Tqjg4LagF/XYSYjXHsuBXD9WDlrszxFaFblXZAbYAx9ZDmqXPNkC9QQQt mmXhJP316iC/SwsgW/j2P2bVpEDTZKWSyuyxg7lLTM33ch6vnuF89bdIp1xIgehl4dYz vfTw== X-Gm-Message-State: AGRZ1gLmrXQ0m8ZXOZzdhqWYqUW3sqjtYHLIqxHxpb/d9DZwNhgvbpBK 
jow5UjnvewV434e9NIUO9TI5/Q== X-Google-Smtp-Source: AJdET5dUEfywPWvzS+489PJ55mfHZNUk9qKPmDa0EsEQ3lYTg7j+ER6qGO2BektCxwt9F6U/VdSRSA== X-Received: by 2002:aa7:814f:: with SMTP id d15-v6mr5054081pfn.78.1540433350578; Wed, 24 Oct 2018 19:09:10 -0700 (PDT) Received: from localhost ([2620:0:1000:1601:3aef:314f:b9ea:889f]) by smtp.gmail.com with ESMTPSA id 68-v6sm6748192pfg.136.2018.10.24.19.09.08 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Oct 2018 19:09:08 -0700 (PDT) Date: Wed, 24 Oct 2018 19:09:07 -0700 From: Joel Fernandes To: "Kirill A. Shutemov" Subject: Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2) Message-ID: <20181025020907.GA13560@joelaf.mtv.corp.google.com> References: <20181013013200.206928-1-joel@joelfernandes.org> <20181013013200.206928-3-joel@joelfernandes.org> <20181024101255.it4lptrjogalxbey@kshutemo-mobl1> <20181024115733.GN8537@350D> <20181024125724.yf6frdimjulf35do@kshutemo-mobl1> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <20181024125724.yf6frdimjulf35do@kshutemo-mobl1> User-Agent: Mutt/1.10.1 (2018-07-13) X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20181024_190922_406994_49105C45 X-CRM114-Status: GOOD ( 22.07 ) X-BeenThere: linux-riscv@lists.infradead.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-mips@linux-mips.org, Rich Felker , linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Balbir Singh , Dave Hansen , Will Deacon , mhocko@kernel.org, linux-mm@kvack.org, lokeshgidra@google.com, sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org, elfring@users.sourceforge.net, Jonas Bonn , kvmarm@lists.cs.columbia.edu, dancol@google.com, Yoshinori Sato , linux-xtensa@linux-xtensa.org, linux-hexagon@vger.kernel.org, Helge Deller , "maintainer:X86 ARCHITECTURE \(32-BIT AND 64-BIT\)" , hughd@google.com, 
"James E.J. Bottomley" , kasan-dev@googlegroups.com, anton.ivanov@kot-begemot.co.uk, Ingo Molnar , Geert Uytterhoeven , Andrey Ryabinin , linux-snps-arc@lists.infradead.org, kernel-team@android.com, Sam Creasey , Fenghua Yu , linux-s390@vger.kernel.org, Jeff Dike , linux-um@lists.infradead.org, Stefan Kristiansson , Julia Lawall , linux-m68k@lists.linux-m68k.org, Borislav Petkov , Andy Lutomirski , nios2-dev@lists.rocketboards.org, Stafford Horne , Guan Xuetao , Chris Zankel , Tony Luck , Richard Weinberger , linux-parisc@vger.kernel.org, pantin@google.com, Max Filippov , linux-kernel@vger.kernel.org, minchan@kernel.org, Thomas Gleixner , linux-alpha@vger.kernel.org, Ley Foon Tan , akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org, "David S. Miller" Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: "linux-riscv" Errors-To: linux-riscv-bounces+infradead-linux-riscv=archiver.kernel.org@lists.infradead.org Message-ID: <20181025020907.OS777LGVojirFXRJFNyj6eAjdE_C3R5tPDnR3_821vs@z> On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. Shutemov wrote: > On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote: > > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. 
Shutemov wrote: > > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote: > > > > diff --git a/mm/mremap.c b/mm/mremap.c > > > > index 9e68a02a52b1..2fd163cff406 100644 > > > > --- a/mm/mremap.c > > > > +++ b/mm/mremap.c > > > > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd, > > > > drop_rmap_locks(vma); > > > > } > > > > > > > > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr, > > > > + unsigned long new_addr, unsigned long old_end, > > > > + pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush) > > > > +{ > > > > + spinlock_t *old_ptl, *new_ptl; > > > > + struct mm_struct *mm = vma->vm_mm; > > > > + > > > > + if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK) > > > > + || old_end - old_addr < PMD_SIZE) > > > > + return false; > > > > + > > > > + /* > > > > + * The destination pmd shouldn't be established, free_pgtables() > > > > + * should have release it. > > > > + */ > > > > + if (WARN_ON(!pmd_none(*new_pmd))) > > > > + return false; > > > > + > > > > + /* > > > > + * We don't have to worry about the ordering of src and dst > > > > + * ptlocks because exclusive mmap_sem prevents deadlock. > > > > + */ > > > > + old_ptl = pmd_lock(vma->vm_mm, old_pmd); > > > > + if (old_ptl) { > > > > > > How can it ever be false? Kirill, It cannot, you are right. I'll remove the test. By the way, there are new changes upstream by Linus which flush the TLB before releasing the ptlock instead of after. I'm guessing that patch came about because of reviews of this patch and someone spotted an issue in the existing code :) Anyway the patch in concern is: eb66ae030829 ("mremap: properly flush TLB before releasing the page") I need to rebase on top of that with appropriate modifications, but I worry that this patch will slow down performance since we have to flush at every PMD/PTE move before releasing the ptlock. 
Where as with my patch, the intention is to flush only at once in the end of move_page_tables. When I tried to flush TLB on every PMD move, it was quite slow on my arm64 device [2]. Further observation [1] is, it seems like the move_huge_pmds and move_ptes code is a bit sub optimal in the sense, we are acquiring and releasing the same ptlock for a bunch of PMDs if the said PMDs are on the same page-table page right? Instead we can do better by acquiring and release the ptlock less often. I think this observation [1] and the frequent TLB flush issue [2] can be solved by acquiring the ptlock once for a bunch of PMDs, move them all, then flush the tlb and then release the ptlock, and then proceed to doing the same thing for the PMDs in the next page-table page. What do you think? - Joel _______________________________________________ linux-riscv mailing list linux-riscv@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-riscv From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.1 required=3.0 tests=DKIM_INVALID,DKIM_SIGNED, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_PASS, USER_AGENT_MUTT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28B6EC46475 for ; Thu, 25 Oct 2018 13:52:49 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7ED7A2083E for ; Thu, 25 Oct 2018 13:52:48 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=fail reason="signature verification failed" (1024-bit key) header.d=joelfernandes.org header.i=@joelfernandes.org header.b="LAnJhDOq" DMARC-Filter: OpenDMARC Filter v1.3.2 
mail.kernel.org 7ED7A2083E Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=joelfernandes.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 42gpV62fkyzF37D for ; Fri, 26 Oct 2018 00:52:46 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=joelfernandes.org Authentication-Results: lists.ozlabs.org; dkim=fail reason="signature verification failed" (1024-bit key; unprotected) header.d=joelfernandes.org header.i=@joelfernandes.org header.b="LAnJhDOq"; dkim-atps=neutral Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=joelfernandes.org (client-ip=2607:f8b0:4864:20::443; helo=mail-pf1-x443.google.com; envelope-from=joel@joelfernandes.org; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=none (p=none dis=none) header.from=joelfernandes.org Authentication-Results: lists.ozlabs.org; dkim=pass (1024-bit key; unprotected) header.d=joelfernandes.org header.i=@joelfernandes.org header.b="LAnJhDOq"; dkim-atps=neutral Received: from mail-pf1-x443.google.com (mail-pf1-x443.google.com [IPv6:2607:f8b0:4864:20::443]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 42gVtK2LSbzDrHF for ; Thu, 25 Oct 2018 13:09:12 +1100 (AEDT) Received: by mail-pf1-x443.google.com with SMTP id r64-v6so3365101pfb.13 for ; Wed, 24 Oct 2018 19:09:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=joelfernandes.org; s=google; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=G78I9uM4gXdiLalpqQZcUQ0k8S9gAvq6zBmyUxhGe0M=; b=LAnJhDOqLDe5rJ8W4SiC2EjUQZvQu/r5ZibVfgxKfT2epX8iHZwYQmEjB6a8btC1dW 
r6dDSvdCvkPgsk5m3K0hxUEc1ohpdLPr/504o7kJ7XSyqWkNXqLU77+seSO2HzOgNvQs hVyNi6MDrJlz77EflnU3vSDe0xbeuTe8cghrE= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=G78I9uM4gXdiLalpqQZcUQ0k8S9gAvq6zBmyUxhGe0M=; b=Lvpfk75n06K45R4jxAKp/LkUVY4ppQVCsLmIO0csqgOuSRzHATHof1dneHhSLkibYP /YamXGOjNh9N13q+VrkcowVOB2V3NYZ35YX/0cKgMcuKJGwSWVd9QIHDq5nKkNbArZE3 hwOyUEjT8o3hEXoJWoDfKvSE6sdgRE+GrmLf/YHJWfkjrcX9Brbny5px8rqiDxpOtwmq TJN3Z1sMZ60z1xruan6zhyPGjTj54sJrsHLQhtZLz/7uf6Ft3WqiNiWO4j1VK5u5yxJK 77EZ3nxzh2B+jzGeuT/F0cx1SZK2tQyqnpw62d2cRquwjhHrfljElHuAs5D/tQeEIhkv BMRg== X-Gm-Message-State: AGRZ1gI9upU5Kv8qKEAVQ7T4toCq0Q9+stLdeKHvQjfmaF2a8Dax8nQw bi7dAt+Hccfxmay3wjiQy1OZSQ== X-Google-Smtp-Source: AJdET5dUEfywPWvzS+489PJ55mfHZNUk9qKPmDa0EsEQ3lYTg7j+ER6qGO2BektCxwt9F6U/VdSRSA== X-Received: by 2002:aa7:814f:: with SMTP id d15-v6mr5054081pfn.78.1540433350578; Wed, 24 Oct 2018 19:09:10 -0700 (PDT) Received: from localhost ([2620:0:1000:1601:3aef:314f:b9ea:889f]) by smtp.gmail.com with ESMTPSA id 68-v6sm6748192pfg.136.2018.10.24.19.09.08 (version=TLS1_2 cipher=ECDHE-RSA-CHACHA20-POLY1305 bits=256/256); Wed, 24 Oct 2018 19:09:08 -0700 (PDT) Date: Wed, 24 Oct 2018 19:09:07 -0700 From: Joel Fernandes To: "Kirill A. 
Shutemov" Subject: Re: [PATCH 2/4] mm: speed up mremap by 500x on large regions (v2) Message-ID: <20181025020907.GA13560@joelaf.mtv.corp.google.com> References: <20181013013200.206928-1-joel@joelfernandes.org> <20181013013200.206928-3-joel@joelfernandes.org> <20181024101255.it4lptrjogalxbey@kshutemo-mobl1> <20181024115733.GN8537@350D> <20181024125724.yf6frdimjulf35do@kshutemo-mobl1> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20181024125724.yf6frdimjulf35do@kshutemo-mobl1> User-Agent: Mutt/1.10.1 (2018-07-13) X-Mailman-Approved-At: Fri, 26 Oct 2018 00:50:26 +1100 X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: linux-mips@linux-mips.org, Rich Felker , linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org, Peter Zijlstra , Catalin Marinas , Dave Hansen , Will Deacon , mhocko@kernel.org, linux-mm@kvack.org, lokeshgidra@google.com, sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org, elfring@users.sourceforge.net, Jonas Bonn , kvmarm@lists.cs.columbia.edu, dancol@google.com, Yoshinori Sato , linux-xtensa@linux-xtensa.org, linux-hexagon@vger.kernel.org, Helge Deller , "maintainer:X86 ARCHITECTURE \(32-BIT AND 64-BIT\)" , hughd@google.com, "James E.J. 
Bottomley" , kasan-dev@googlegroups.com, anton.ivanov@kot-begemot.co.uk, Ingo Molnar , Geert Uytterhoeven , Andrey Ryabinin , linux-snps-arc@lists.infradead.org, kernel-team@android.com, Sam Creasey , Fenghua Yu , linux-s390@vger.kernel.org, Jeff Dike , linux-um@lists.infradead.org, Stefan Kristiansson , Julia Lawall , linux-m68k@lists.linux-m68k.org, Borislav Petkov , Andy Lutomirski , nios2-dev@lists.rocketboards.org, Stafford Horne , Guan Xuetao , Chris Zankel , Tony Luck , Richard Weinberger , linux-parisc@vger.kernel.org, pantin@google.com, Max Filippov , linux-kernel@vger.kernel.org, minchan@kernel.org, Thomas Gleixner , linux-alpha@vger.kernel.org, Ley Foon Tan , akpm@linux-foundation.org, linuxppc-dev@lists.ozlabs.org, "David S. Miller" Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On Wed, Oct 24, 2018 at 03:57:24PM +0300, Kirill A. Shutemov wrote: > On Wed, Oct 24, 2018 at 10:57:33PM +1100, Balbir Singh wrote: > > On Wed, Oct 24, 2018 at 01:12:56PM +0300, Kirill A. 
Shutemov wrote:
> > > On Fri, Oct 12, 2018 at 06:31:58PM -0700, Joel Fernandes (Google) wrote:
> > > > diff --git a/mm/mremap.c b/mm/mremap.c
> > > > index 9e68a02a52b1..2fd163cff406 100644
> > > > --- a/mm/mremap.c
> > > > +++ b/mm/mremap.c
> > > > @@ -191,6 +191,54 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
> > > >  		drop_rmap_locks(vma);
> > > >  }
> > > >
> > > > +static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
> > > > +		  unsigned long new_addr, unsigned long old_end,
> > > > +		  pmd_t *old_pmd, pmd_t *new_pmd, bool *need_flush)
> > > > +{
> > > > +	spinlock_t *old_ptl, *new_ptl;
> > > > +	struct mm_struct *mm = vma->vm_mm;
> > > > +
> > > > +	if ((old_addr & ~PMD_MASK) || (new_addr & ~PMD_MASK)
> > > > +	    || old_end - old_addr < PMD_SIZE)
> > > > +		return false;
> > > > +
> > > > +	/*
> > > > +	 * The destination pmd shouldn't be established, free_pgtables()
> > > > +	 * should have release it.
> > > > +	 */
> > > > +	if (WARN_ON(!pmd_none(*new_pmd)))
> > > > +		return false;
> > > > +
> > > > +	/*
> > > > +	 * We don't have to worry about the ordering of src and dst
> > > > +	 * ptlocks because exclusive mmap_sem prevents deadlock.
> > > > +	 */
> > > > +	old_ptl = pmd_lock(vma->vm_mm, old_pmd);
> > > > +	if (old_ptl) {
> > >
> > > How can it ever be false?

Kirill,

It cannot, you are right. I'll remove the test.

By the way, there are new changes upstream by Linus which flush the TLB
before releasing the ptlock instead of after. I'm guessing that patch came
about because of reviews of this patch, and someone spotted an issue in the
existing code :) Anyway, the patch in question is:

eb66ae030829 ("mremap: properly flush TLB before releasing the page")

I need to rebase on top of that with appropriate modifications, but I worry
that this patch will slow down performance, since we would have to flush at
every PMD/PTE move before releasing the ptlock.
Whereas with my patch, the intention is to flush only once, at the end of
move_page_tables(). When I tried to flush the TLB on every PMD move, it was
quite slow on my arm64 device [2].

A further observation [1] is that the move_huge_pmd() and move_ptes() code
seems a bit suboptimal: we are acquiring and releasing the same ptlock for a
bunch of PMDs if those PMDs are on the same page-table page, right? Instead,
we can do better by acquiring and releasing the ptlock less often.

I think both this observation [1] and the frequent-TLB-flush issue [2] can
be solved by acquiring the ptlock once for a bunch of PMDs, moving them all,
flushing the TLB, releasing the ptlock, and then proceeding to do the same
for the PMDs on the next page-table page. What do you think?

- Joel