From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.8 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB55AC31E5E for ; Wed, 19 Jun 2019 04:17:37 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 81FCC2082C for ; Wed, 19 Jun 2019 04:17:37 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726148AbfFSERc (ORCPT ); Wed, 19 Jun 2019 00:17:32 -0400 Received: from foss.arm.com ([217.140.110.172]:46656 "EHLO foss.arm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726009AbfFSERc (ORCPT ); Wed, 19 Jun 2019 00:17:32 -0400 Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D88EA360; Tue, 18 Jun 2019 21:17:31 -0700 (PDT) Received: from p8cg001049571a15.blr.arm.com (p8cg001049571a15.blr.arm.com [10.162.43.130]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPA id 350E33F718; Tue, 18 Jun 2019 21:17:24 -0700 (PDT) From: Anshuman Khandual To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org, catalin.marinas@arm.com, will.deacon@arm.com Cc: mark.rutland@arm.com, mhocko@suse.com, ira.weiny@intel.com, david@redhat.com, cai@lca.pw, logang@deltatee.com, james.morse@arm.com, cpandya@codeaurora.org, arunks@codeaurora.org, dan.j.williams@intel.com, mgorman@techsingularity.net, osalvador@suse.de, ard.biesheuvel@arm.com, steve.capper@arm.com Subject: [PATCH V6 0/3] arm64/mm: Enable memory hot remove Date: Wed, 19 Jun 2019 09:47:37 +0530 Message-Id: <1560917860-26169-1-git-send-email-anshuman.khandual@arm.com> X-Mailer: git-send-email 2.7.4 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This series enables memory hot remove on arm64 after fixing a memblock removal ordering problem in generic try_remove_memory() and a possible arm64 platform specific kernel page table race condition. This series is based on linux-next (next-20190613). Concurrent vmalloc() and hot-remove conflict: As pointed out earlier on the v5 thread [2] there can be potential conflict between concurrent vmalloc() and memory hot-remove operation. This can be solved or at least avoided with some possible methods. The problem here is caused by inadequate locking in vmalloc() which protects installation of a page table page but not the walk or the leaf entry modification. Option 1: Making locking in vmalloc() adequate Current locking scheme protects installation of page table pages but not the page table walk or leaf entry creation which can conflict with hot-remove. This scheme is sufficient for now as vmalloc() works on mutually exclusive ranges which can proceed concurrently only if their shared page table pages can be created while inside the lock. It achieves performance improvement which will be compromised if entire vmalloc() operation (even if with some optimization) has to be completed under a lock. Option 2: Making sure hot-remove does not happen during vmalloc() Take mem_hotplug_lock in read mode through [get|put]_online_mems() constructs for the entire duration of vmalloc(). It protects from concurrent memory hot remove operation and does not add any significant overhead to other concurrent vmalloc() threads. It solves the problem in right way unless we do not want to extend the usage of mem_hotplug_lock in generic MM. Option 3: Memory hot-remove does not free (conflicting) page table pages Don't not free page table pages (if any) for vmemmap mappings after unmapping it's virtual range. The only downside here is that some page table pages might remain empty and unused until next memory hot-add operation of the same memory range. Option 4: Dont let vmalloc and vmemmap share intermediate page table pages The conflict does not arise if vmalloc and vmemap range do not share kernel page table pages to start with. If such placement can be ensured in platform kernel virtual address layout, this problem can be successfully avoided. There are two generic solutions (Option 1 and 2) and two platform specific solutions (Options 2 and 3). This series has decided to go with (Option 3) which requires minimum changes while self-contained inside the functionality. Testing: Memory hot remove has been tested on arm64 for 4K, 16K, 64K page config options with all possible CONFIG_ARM64_VA_BITS and CONFIG_PGTABLE_LEVELS combinations. Its only build tested on non-arm64 platforms. Changes in V6: - Implemented most of the suggestions from Mark Rutland - Added in ptdump - remove_pagetable() now has two distinct passes over the kernel page table - First pass unmap_hotplug_range() removes leaf level entries at all level - Second pass free_empty_tables() removes empty page table pages - Kernel page table lock has been dropped completely - vmemmap_free() does not call freee_empty_tables() to avoid conflict with vmalloc() - All address range scanning are converted to do {} while() loop - Added 'unsigned long end' in __remove_pgd_mapping() - Callers need not provide starting pointer argument to free_[pte|pmd|pud]_table() - Drop the starting pointer argument from free_[pte|pmd|pud]_table() functions - Fetching pxxp[i] in free_[pte|pmd|pud]_table() is wrapped around in READ_ONCE() - free_[pte|pmd|pud]_table() now computes starting pointer inside the function - Fixed TLB handling while freeing huge page section mappings at PMD or PUD level - Added WARN_ON(!page) in free_hotplug_page_range() - Added WARN_ON(![pm|pud]_table(pud|pmd)) when there is no section mapping - [PATCH 1/3] mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory() - Request earlier for separate merger (https://patchwork.kernel.org/patch/10986599/) - s/__remove_memory/try_remove_memory in the subject line - s/arch_remove_memory/memblock_[free|remove] in the subject line - A small change in the commit message as re-order happens now for memblock remove functions not for arch_remove_memory() Changes in V5: (https://lkml.org/lkml/2019/5/29/218) - Have some agreement [1] over using memory_hotplug_lock for arm64 ptdump - Change 7ba36eccb3f8 ("arm64/mm: Inhibit huge-vmap with ptdump") already merged - Dropped the above patch from this series - Fixed indentation problem in arch_[add|remove]_memory() as per David - Collected all new Acked-by tags Changes in V4: (https://lkml.org/lkml/2019/5/20/19) - Implemented most of the suggestions from Mark Rutland - Interchanged patch [PATCH 2/4] <---> [PATCH 3/4] and updated commit message - Moved CONFIG_PGTABLE_LEVELS inside free_[pud|pmd]_table() - Used READ_ONCE() in missing instances while accessing page table entries - s/p???_present()/p???_none() for checking valid kernel page table entries - WARN_ON() when an entry is !p???_none() and !p???_present() at the same time - Updated memory hot-remove commit message with additional details as suggested - Rebased the series on 5.2-rc1 with hotplug changes from David and Michal Hocko - Collected all new Acked-by tags Changes in V3: (https://lkml.org/lkml/2019/5/14/197) - Implemented most of the suggestions from Mark Rutland for remove_pagetable() - Fixed applicable PGTABLE_LEVEL wrappers around pgtable page freeing functions - Replaced 'direct' with 'sparse_vmap' in remove_pagetable() with inverted polarity - Changed pointer names ('p' at end) and removed tmp from iterations - Perform intermediate TLB invalidation while clearing pgtable entries - Dropped flush_tlb_kernel_range() in remove_pagetable() - Added flush_tlb_kernel_range() in remove_pte_table() instead - Renamed page freeing functions for pgtable page and mapped pages - Used page range size instead of order while freeing mapped or pgtable pages - Removed all PageReserved() handling while freeing mapped or pgtable pages - Replaced XXX_index() with XXX_offset() while walking the kernel page table - Used READ_ONCE() while fetching individual pgtable entries - Taken overall init_mm.page_table_lock instead of just while changing an entry - Dropped previously added [pmd|pud]_index() which are not required anymore - Added a new patch to protect kernel page table race condition for ptdump - Added a new patch from Mark Rutland to prevent huge-vmap with ptdump Changes in V2: (https://lkml.org/lkml/2019/4/14/5) - Added all received review and ack tags - Split the series from ZONE_DEVICE enablement for better review - Moved memblock re-order patch to the front as per Robin Murphy - Updated commit message on memblock re-order patch per Michal Hocko - Dropped [pmd|pud]_large() definitions - Used existing [pmd|pud]_sect() instead of earlier [pmd|pud]_large() - Removed __meminit and __ref tags as per Oscar Salvador - Dropped unnecessary 'ret' init in arch_add_memory() per Robin Murphy - Skipped calling into pgtable_page_dtor() for linear mapping page table pages and updated all relevant functions Changes in V1: (https://lkml.org/lkml/2019/4/3/28) References: [1] https://lkml.org/lkml/2019/5/28/584 [2] https://lkml.org/lkml/2019/6/11/709 Anshuman Khandual (3): mm/hotplug: Reorder memblock_[free|remove]() calls in try_remove_memory() arm64/mm: Hold memory hotplug lock while walking for kernel page table dump arm64/mm: Enable memory hot remove arch/arm64/Kconfig | 3 + arch/arm64/mm/mmu.c | 290 +++++++++++++++++++++++++++++++++++++++-- arch/arm64/mm/ptdump_debugfs.c | 4 + mm/memory_hotplug.c | 4 +- 4 files changed, 290 insertions(+), 11 deletions(-) -- 2.7.4