From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH V8 2/2] arm64/mm: Enable memory hot remove
From: Anshuman Khandual
To: Catalin Marinas
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
    will@kernel.org, mark.rutland@arm.com, mhocko@suse.com, david@redhat.com,
    cai@lca.pw, logang@deltatee.com, cpandya@codeaurora.org,
    arunks@codeaurora.org, dan.j.williams@intel.com,
    mgorman@techsingularity.net, osalvador@suse.de, ard.biesheuvel@arm.com,
    steve.capper@arm.com, broonie@kernel.org, valentin.schneider@arm.com,
    Robin.Murphy@arm.com, steven.price@arm.com, suzuki.poulose@arm.com,
    ira.weiny@intel.com
Date: Tue, 8 Oct 2019 17:18:53 +0530
In-Reply-To: <20191008105520.GA5694@arrakis.emea.arm.com>

On 10/08/2019 04:25 PM, Catalin Marinas wrote:
> On Tue, Oct 08, 2019 at 10:06:26AM +0530, Anshuman Khandual wrote:
>> On 10/07/2019 07:47 PM, Catalin Marinas wrote:
>>> On Mon, Sep 23, 2019 at 11:13:45AM +0530, Anshuman Khandual wrote:
>>>> The arch code for hot-remove must tear down portions of the linear map and
>>>> vmemmap corresponding to memory being removed. In both cases the page
>>>> tables mapping these regions must be freed, and when sparse vmemmap is in
>>>> use the memory backing the vmemmap must also be freed.
>>>>
>>>> This patch adds unmap_hotplug_range() and free_empty_tables() helpers which
>>>> can be used to tear down either region and calls it from vmemmap_free() and
>>>> ___remove_pgd_mapping(). The sparse_vmap argument determines whether the
>>>> backing memory will be freed.
>>>
>>> Can you change the 'sparse_vmap' name to something more meaningful which
>>> would suggest freeing of the backing memory?
>>
>> free_mapped_mem or free_backed_mem ? Even shorter forms like free_mapped or
>> free_backed might do as well. Do you have a particular preference here ? But
>> yes, sparse_vmap has been very much specific to vmemmap for these functions
>> which are now very generic in nature.
>
> free_mapped would do.

Sure.

>
>>>> +static void unmap_hotplug_pte_range(pmd_t *pmdp, unsigned long addr,
>>>> +				    unsigned long end, bool sparse_vmap)
>>>> +{
>>>> +	struct page *page;
>>>> +	pte_t *ptep, pte;
>>>> +
>>>> +	do {
>>>> +		ptep = pte_offset_kernel(pmdp, addr);
>>>> +		pte = READ_ONCE(*ptep);
>>>> +		if (pte_none(pte))
>>>> +			continue;
>>>> +
>>>> +		WARN_ON(!pte_present(pte));
>>>> +		page = sparse_vmap ? pte_page(pte) : NULL;
>>>> +		pte_clear(&init_mm, addr, ptep);
>>>> +		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
>>>> +		if (sparse_vmap)
>>>> +			free_hotplug_page_range(page, PAGE_SIZE);
>>>
>>> You could only set 'page' if sparse_vmap (or even drop 'page' entirely).
>>
>> I am afraid 'page' is being used to hold pte_page(pte) extraction which
>> needs to be freed (sparse_vmap) as we are going to clear the ptep entry
>> in the next statement and lose access to it for good.
>
> You clear *ptep, not pte.

Ahh, missed that. We have already captured the contents with READ_ONCE().

>
>> We will need some
>> where to hold onto pte_page(pte) across pte_clear() as we cannot free it
>> before clearing it's entry and flushing the TLB. Hence wondering how the
>> 'page' can be completely dropped.
>>
>>> The compiler is probably smart enough to optimise it but using a
>>> pointless ternary operator just makes the code harder to follow.
>>
>> Not sure I got this but are you suggesting for an 'if' statement here
>>
>> if (sparse_vmap)
>> 	page = pte_page(pte);
>>
>> instead of the current assignment ?
>>
>> page = sparse_vmap ? pte_page(pte) : NULL;
>
> I suggest:
>
> 	if (sparse_vmap)
> 		free_hotplug_pgtable_page(pte_page(pte), PAGE_SIZE);

Sure, will do.

>
>>>> +	} while (addr += PAGE_SIZE, addr < end);
>>>> +}
>>> [...]
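
For anyone following the 'page' discussion above, here is a small userspace
model of the point being made. It is only a sketch and not the arm64 code:
every model_* name is hypothetical, the "table" is a plain array, and the
flush and free helpers merely count calls. It shows that the value read from
the slot stays usable after the slot itself is cleared, so no separate 'page'
variable is needed to carry it across the clear and the flush.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PTRS 8            /* hypothetical: entries in the toy table */

typedef uint64_t model_pte_t;   /* 0 means "none"; anything else is a page id */

struct model_stats {
	size_t flushed;         /* number of per-entry flushes issued */
	size_t freed;           /* number of backing pages released */
	uint64_t last_freed;    /* id of the most recently freed page */
};

/* Stand-in for flush_tlb_kernel_range(); it only counts calls. */
static void model_flush(struct model_stats *st)
{
	st->flushed++;
}

/* Stand-in for the page-freeing helper; it records which page was freed. */
static void model_free_page(struct model_stats *st, uint64_t page)
{
	st->freed++;
	st->last_freed = page;
}

static void model_unmap_pte_range(model_pte_t *table, size_t start, size_t end,
				  bool free_mapped, struct model_stats *st)
{
	assert(start < end && end <= MODEL_PTRS);

	for (size_t i = start; i < end; i++) {
		/* Local snapshot of the entry, in the role of READ_ONCE(*ptep). */
		model_pte_t pte = table[i];

		if (pte == 0)
			continue;

		table[i] = 0;           /* clears the slot, not the snapshot */
		model_flush(st);        /* flush before the page is released */
		if (free_mapped)
			model_free_page(st, pte); /* snapshot still names the page */
	}
}
```

Calling model_unmap_pte_range() with free_mapped set to false clears and
flushes the populated slots but leaves the backing pages alone, which is the
linear map case in the patch description.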
>>>> +static void free_empty_pte_table(pmd_t *pmdp, unsigned long addr,
>>>> +				 unsigned long end)
>>>> +{
>>>> +	pte_t *ptep, pte;
>>>> +
>>>> +	do {
>>>> +		ptep = pte_offset_kernel(pmdp, addr);
>>>> +		pte = READ_ONCE(*ptep);
>>>> +		WARN_ON(!pte_none(pte));
>>>> +	} while (addr += PAGE_SIZE, addr < end);
>>>> +}
>>>> +
>>>> +static void free_empty_pmd_table(pud_t *pudp, unsigned long addr,
>>>> +				 unsigned long end, unsigned long floor,
>>>> +				 unsigned long ceiling)
>>>> +{
>>>> +	unsigned long next;
>>>> +	pmd_t *pmdp, pmd;
>>>> +
>>>> +	do {
>>>> +		next = pmd_addr_end(addr, end);
>>>> +		pmdp = pmd_offset(pudp, addr);
>>>> +		pmd = READ_ONCE(*pmdp);
>>>> +		if (pmd_none(pmd))
>>>> +			continue;
>>>> +
>>>> +		WARN_ON(!pmd_present(pmd) || !pmd_table(pmd) || pmd_sect(pmd));
>>>> +		free_empty_pte_table(pmdp, addr, next);
>>>> +		free_pte_table(pmdp, addr, next, floor, ceiling);
>>>
>>> Do we need two closely named functions here? Can you not collapse
>>> free_empty_pud_table() and free_pte_table() into a single one? The same
>>> comment for the pmd/pud variants. I just find this confusing.
>>
>> The two functions could be collapsed into a single one. But just wanted to
>> keep free_pxx_table() part which checks floor/ceiling alignment, non-zero
>> entries clear off the actual page table walking.
>
> With the pmd variant, they both take the floor/ceiling argument while
> the free_empty_pte_table() doesn't even free anything. So not entirely
> consistent.
>
> Can you not just copy the free_pgd_range() functions but instead of
> p*d_free_tlb() just do the TLB invalidation followed by page freeing?
> That seems to be an easier pattern to follow.
>

Sure, will follow that pattern.
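
To make the suggested pattern concrete, below is a rough userspace sketch of
how a single helper could combine the floor/ceiling check with "clear the
parent entry, invalidate, then free". The boundary test is loosely modeled on
the one used by the generic free_pgd_range() family; everything else is an
assumption for illustration only: the model_* names, the 2MB MODEL_SPAN, and
the log that stands in for the real TLB invalidation and page freeing. It is
not the code that will go into the next version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical span mapped by one table; the kernel uses PMD_SIZE etc. */
#define MODEL_SPAN UINT64_C(0x200000)
#define MODEL_MASK (~(MODEL_SPAN - 1))

enum model_step { STEP_CLEAR = 1, STEP_FLUSH, STEP_FREE };

struct model_log {
	enum model_step steps[3];   /* order in which the steps happened */
	int n;
};

/*
 * True when nothing outside [floor, ceiling) can still be using the table
 * that maps addr. A ceiling of 0 means "no upper limit".
 */
static bool model_range_owns_table(uint64_t addr, uint64_t end,
				   uint64_t floor, uint64_t ceiling)
{
	uint64_t start = addr & MODEL_MASK;

	if (start < floor)
		return false;       /* a neighbour below shares the table */
	if (ceiling) {
		ceiling &= MODEL_MASK;
		if (!ceiling)
			return false;
	}
	if (end - 1 > ceiling - 1)  /* ceiling == 0 wraps to the maximum */
		return false;       /* a neighbour above shares the table */
	return true;
}

/* Returns true when the table hanging off *parent_entry was released. */
static bool model_free_table(uint64_t *parent_entry, uint64_t addr,
			     uint64_t end, uint64_t floor, uint64_t ceiling,
			     struct model_log *log)
{
	if (*parent_entry == 0)
		return false;       /* nothing mapped here */
	if (!model_range_owns_table(addr, end, floor, ceiling))
		return false;

	assert(log->n == 0);        /* the toy log holds one teardown only */
	*parent_entry = 0;
	log->steps[log->n++] = STEP_CLEAR;  /* 1. unhook from the parent */
	log->steps[log->n++] = STEP_FLUSH;  /* 2. TLB invalidation goes here */
	log->steps[log->n++] = STEP_FREE;   /* 3. only now release the page */
	return true;
}
```

With this shape there is one function per level, and each level decides on its
own whether the table it just walked can go, so the separate "empty" and
"free" variants are not needed.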