Subject: Re: [PATCH 2/6] arm64/mm: Enable memory hot remove
From: Robin Murphy
To: Anshuman Khandual, linux-kernel@vger.kernel.org,
    linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
    akpm@linux-foundation.org, will.deacon@arm.com, catalin.marinas@arm.com
Cc: mhocko@suse.com, mgorman@techsingularity.net, james.morse@arm.com,
    mark.rutland@arm.com, cpandya@codeaurora.org, arunks@codeaurora.org,
    dan.j.williams@intel.com, osalvador@suse.de, logang@deltatee.com,
    pasha.tatashin@oracle.com, david@redhat.com, cai@lca.pw, Steven Price
Date: Wed, 3 Apr 2019 13:37:54 +0100
In-Reply-To: <1554265806-11501-3-git-send-email-anshuman.khandual@arm.com>
References: <1554265806-11501-1-git-send-email-anshuman.khandual@arm.com>
 <1554265806-11501-3-git-send-email-anshuman.khandual@arm.com>

[ +Steve ]

Hi Anshuman,

On 03/04/2019 05:30, Anshuman Khandual wrote:
> Memory removal from an arch perspective involves tearing down two different
> kernel based mappings, i.e. vmemmap and linear, while releasing the related
> page table pages allocated for the physical memory range to be removed.
>
> Define a common kernel page table tear down helper remove_pagetable() which
> can be used to unmap a given kernel virtual address range. In effect it can
> tear down either vmemmap or kernel linear mappings. This new helper is
> called from both vmemmap_free() and ___remove_pgd_mapping() during memory
> removal. The argument 'direct' here identifies kernel linear mappings.
>
> Page table pages for vmemmap mappings are allocated through sparse mem
> helper functions like vmemmap_alloc_block(), which do not cycle the pages
> through pgtable_page_ctor(). Hence the removal path skips the corresponding
> destructor, pgtable_page_dtor().
>
> While here, update arch_add_memory() to handle __add_pages() failures by
> just unmapping the recently added kernel linear mapping. Now enable memory
> hot remove on arm64 platforms by default with ARCH_ENABLE_MEMORY_HOTREMOVE.
>
> This implementation is overall inspired by the kernel page table tear down
> procedure on the x86 architecture.

A bit of a nit, but since this depends on at least patch #4 to work
properly, it would be good to reorder the series appropriately.

> Signed-off-by: Anshuman Khandual
> ---
>  arch/arm64/Kconfig               |   3 +
>  arch/arm64/include/asm/pgtable.h |  14 +++
>  arch/arm64/mm/mmu.c              | 227 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 241 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a2418fb..db3e625 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -266,6 +266,9 @@ config HAVE_GENERIC_GUP
>  config ARCH_ENABLE_MEMORY_HOTPLUG
>  	def_bool y
>
> +config ARCH_ENABLE_MEMORY_HOTREMOVE
> +	def_bool y
> +
>  config ARCH_MEMORY_PROBE
>  	bool "Enable /sys/devices/system/memory/probe interface"
>  	depends on MEMORY_HOTPLUG
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index de70c1e..858098e 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -355,6 +355,18 @@ static inline int pmd_protnone(pmd_t pmd)
>  }
>  #endif
>
> +#if (CONFIG_PGTABLE_LEVELS > 2)
> +#define pmd_large(pmd)	(pmd_val(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
> +#else
> +#define pmd_large(pmd)	0
> +#endif
> +
> +#if (CONFIG_PGTABLE_LEVELS > 3)
> +#define pud_large(pud)	(pud_val(pud) && !(pud_val(pud) & PUD_TABLE_BIT))
> +#else
> +#define pud_large(pmd)	0
> +#endif

These seem rather different from the versions that Steve is proposing
in the generic pagewalk series - can you reach an agreement on which
implementation is preferred?
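For illustration only, the sketch below is the direction I'd naively
expect, reusing the existing section-type helpers rather than
open-coding the table bit - pmd_sect() and pud_sect() do already exist
in pgtable.h, but don't read this as Steve's actual version:

#if CONFIG_PGTABLE_LEVELS > 2
/* A section (block) entry maps memory directly, not a next-level table */
#define pmd_large(pmd)	pmd_sect(pmd)
#else
#define pmd_large(pmd)	0
#endif

#if CONFIG_PGTABLE_LEVELS > 3
#define pud_large(pud)	pud_sect(pud)
#else
#define pud_large(pud)	0
#endif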
> +
>  /*
>   * THP definitions.
>   */
> @@ -555,6 +567,7 @@ static inline phys_addr_t pud_page_paddr(pud_t pud)
>
>  #else
>
> +#define pmd_index(addr) 0
>  #define pud_page_paddr(pud) ({ BUILD_BUG(); 0; })
>
>  /* Match pmd_offset folding in <asm-generic/pgtable-nopmd.h> */
> @@ -612,6 +625,7 @@ static inline phys_addr_t pgd_page_paddr(pgd_t pgd)
>
>  #else
>
> +#define pud_index(adrr) 0
>  #define pgd_page_paddr(pgd) ({ BUILD_BUG(); 0;})
>
>  /* Match pud_offset folding in <asm-generic/pgtable-nopud.h> */
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index e97f018..ae0777b 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -714,6 +714,198 @@ int kern_addr_valid(unsigned long addr)
>
>  	return pfn_valid(pte_pfn(pte));
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +static void __meminit free_pagetable(struct page *page, int order)

Do these need to be __meminit? AFAICS it's effectively redundant with
the containing #ifdef, and removal feels like it's inherently a
later-than-init thing anyway.
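To illustrate the point: with CONFIG_MEMORY_HOTPLUG=y, __meminit
expands to nothing anyway, so inside this #ifdef block plain statics
should be exactly equivalent (sketch of the declarations only,
untested):

#ifdef CONFIG_MEMORY_HOTPLUG
/* No __meminit: hotplug implies this code stays resident after init */
static void free_pagetable(struct page *page, int order);
static void remove_pagetable(unsigned long start, unsigned long end,
			     bool direct);
#endif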
> +{
> +	unsigned long magic;
> +	unsigned int nr_pages = 1 << order;
> +
> +	if (PageReserved(page)) {
> +		__ClearPageReserved(page);
> +
> +		magic = (unsigned long)page->freelist;
> +		if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
> +			while (nr_pages--)
> +				put_page_bootmem(page++);
> +		} else
> +			while (nr_pages--)
> +				free_reserved_page(page++);
> +	} else
> +		free_pages((unsigned long)page_address(page), order);
> +}
> +
> +#if (CONFIG_PGTABLE_LEVELS > 2)
> +static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd, bool direct)
> +{
> +	pte_t *pte;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PTE; i++) {
> +		pte = pte_start + i;
> +		if (!pte_none(*pte))
> +			return;
> +	}
> +
> +	if (direct)
> +		pgtable_page_dtor(pmd_page(*pmd));
> +	free_pagetable(pmd_page(*pmd), 0);
> +	spin_lock(&init_mm.page_table_lock);
> +	pmd_clear(pmd);
> +	spin_unlock(&init_mm.page_table_lock);
> +}
> +#else
> +static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd, bool direct)
> +{
> +}
> +#endif
> +
> +#if (CONFIG_PGTABLE_LEVELS > 3)
> +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud, bool direct)
> +{
> +	pmd_t *pmd;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PMD; i++) {
> +		pmd = pmd_start + i;
> +		if (!pmd_none(*pmd))
> +			return;
> +	}
> +
> +	if (direct)
> +		pgtable_page_dtor(pud_page(*pud));
> +	free_pagetable(pud_page(*pud), 0);
> +	spin_lock(&init_mm.page_table_lock);
> +	pud_clear(pud);
> +	spin_unlock(&init_mm.page_table_lock);
> +}
> +
> +static void __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd, bool direct)
> +{
> +	pud_t *pud;
> +	int i;
> +
> +	for (i = 0; i < PTRS_PER_PUD; i++) {
> +		pud = pud_start + i;
> +		if (!pud_none(*pud))
> +			return;
> +	}
> +
> +	if (direct)
> +		pgtable_page_dtor(pgd_page(*pgd));
> +	free_pagetable(pgd_page(*pgd), 0);
> +	spin_lock(&init_mm.page_table_lock);
> +	pgd_clear(pgd);
> +	spin_unlock(&init_mm.page_table_lock);
> +}
> +#else
> +static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud, bool direct)
> +{
> +}
> +
> +static void __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd, bool direct)
> +{
> +}
> +#endif
> +
> +static void __meminit
> +remove_pte_table(pte_t *pte_start, unsigned long addr,
> +		 unsigned long end, bool direct)
> +{
> +	pte_t *pte;
> +
> +	pte = pte_start + pte_index(addr);
> +	for (; addr < end; addr += PAGE_SIZE, pte++) {
> +		if (!pte_present(*pte))
> +			continue;
> +
> +		if (!direct)
> +			free_pagetable(pte_page(*pte), 0);
> +		spin_lock(&init_mm.page_table_lock);
> +		pte_clear(&init_mm, addr, pte);
> +		spin_unlock(&init_mm.page_table_lock);
> +	}
> +}
> +
> +static void __meminit
> +remove_pmd_table(pmd_t *pmd_start, unsigned long addr,
> +		 unsigned long end, bool direct)
> +{
> +	unsigned long next;
> +	pte_t *pte_base;
> +	pmd_t *pmd;
> +
> +	pmd = pmd_start + pmd_index(addr);
> +	for (; addr < end; addr = next, pmd++) {
> +		next = pmd_addr_end(addr, end);
> +		if (!pmd_present(*pmd))
> +			continue;
> +
> +		if (pmd_large(*pmd)) {
> +			if (!direct)
> +				free_pagetable(pmd_page(*pmd),
> +					       get_order(PMD_SIZE));
> +			spin_lock(&init_mm.page_table_lock);
> +			pmd_clear(pmd);
> +			spin_unlock(&init_mm.page_table_lock);
> +			continue;
> +		}
> +		pte_base = pte_offset_kernel(pmd, 0UL);
> +		remove_pte_table(pte_base, addr, next, direct);
> +		free_pte_table(pte_base, pmd, direct);
> +	}
> +}
> +
> +static void __meminit
> +remove_pud_table(pud_t *pud_start, unsigned long addr,
> +		 unsigned long end, bool direct)
> +{
> +	unsigned long next;
> +	pmd_t *pmd_base;
> +	pud_t *pud;
> +
> +	pud = pud_start + pud_index(addr);
> +	for (; addr < end; addr = next, pud++) {
> +		next = pud_addr_end(addr, end);
> +		if (!pud_present(*pud))
> +			continue;
> +
> +		if (pud_large(*pud)) {
> +			if (!direct)
> +				free_pagetable(pud_page(*pud),
> +					       get_order(PUD_SIZE));
> +			spin_lock(&init_mm.page_table_lock);
> +			pud_clear(pud);
> +			spin_unlock(&init_mm.page_table_lock);
> +			continue;
> +		}
> +		pmd_base = pmd_offset(pud, 0UL);
> +		remove_pmd_table(pmd_base, addr, next, direct);
> +		free_pmd_table(pmd_base, pud, direct);
> +	}
> +}
> +
> +static void __meminit
> +remove_pagetable(unsigned long start, unsigned long end, bool direct)
> +{
> +	unsigned long addr, next;
> +	pud_t *pud_base;
> +	pgd_t *pgd;
> +
> +	for (addr = start; addr < end; addr = next) {
> +		next = pgd_addr_end(addr, end);
> +		pgd = pgd_offset_k(addr);
> +		if (!pgd_present(*pgd))
> +			continue;
> +
> +		pud_base = pud_offset(pgd, 0UL);
> +		remove_pud_table(pud_base, addr, next, direct);
> +		free_pud_table(pud_base, pgd, direct);
> +	}
> +	flush_tlb_kernel_range(start, end);
> +}
> +#endif
> +
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
>  #if !ARM64_SWAPPER_USES_SECTION_MAPS
>  int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
> @@ -758,9 +950,12 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  	return 0;
>  }
>  #endif /* CONFIG_ARM64_64K_PAGES */
> -void vmemmap_free(unsigned long start, unsigned long end,
> +void __ref vmemmap_free(unsigned long start, unsigned long end,

Why is the __ref needed? Presumably it's avoidable by addressing the
__meminit thing above.
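As far as I can tell, the __ref is only there to silence the modpost
section-mismatch warning for the call into __meminit code; if the
__meminit annotations are dropped as suggested above, a sketch of the
unannotated version would simply be:

void vmemmap_free(unsigned long start, unsigned long end,
		  struct vmem_altmap *altmap)
{
#ifdef CONFIG_MEMORY_HOTPLUG
	remove_pagetable(start, end, false);
#endif
}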
>  		  struct vmem_altmap *altmap)
>  {
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +	remove_pagetable(start, end, false);
> +#endif
>  }
>  #endif /* CONFIG_SPARSEMEM_VMEMMAP */
>
> @@ -1046,10 +1241,16 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
>  }
>
>  #ifdef CONFIG_MEMORY_HOTPLUG
> +static void __remove_pgd_mapping(pgd_t *pgdir, unsigned long start, u64 size)
> +{
> +	WARN_ON(pgdir != init_mm.pgd);
> +	remove_pagetable(start, start + size, true);
> +}
> +
>  int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>  		    bool want_memblock)
>  {
> -	int flags = 0;
> +	int flags = 0, ret = 0;

Initialising ret here is unnecessary.
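i.e. since __add_pages() assigns ret before any use, the declaration
can simply be:

	int flags = 0;
	int ret;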
Robin.

>
>  	if (rodata_full || debug_pagealloc_enabled())
>  		flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
> @@ -1057,7 +1258,27 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
>  	__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
>  			     size, PAGE_KERNEL, pgd_pgtable_alloc, flags);
>
> -	return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
> +	ret = __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
>  			   altmap, want_memblock);
> +	if (ret)
> +		__remove_pgd_mapping(swapper_pg_dir,
> +				     __phys_to_virt(start), size);
> +	return ret;
>  }
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +int arch_remove_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap)
> +{
> +	unsigned long start_pfn = start >> PAGE_SHIFT;
> +	unsigned long nr_pages = size >> PAGE_SHIFT;
> +	struct zone *zone = page_zone(pfn_to_page(start_pfn));
> +	int ret;
> +
> +	ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
> +	if (!ret)
> +		__remove_pgd_mapping(swapper_pg_dir,
> +				     __phys_to_virt(start), size);
> +	return ret;
> +}
> +#endif
>  #endif
>