Date: Tue, 10 Sep 2019 17:17:59 +0100
From: Catalin Marinas
To: Anshuman Khandual
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org, akpm@linux-foundation.org,
	will@kernel.org, mark.rutland@arm.com, mhocko@suse.com,
	ira.weiny@intel.com, david@redhat.com, cai@lca.pw,
	logang@deltatee.com, cpandya@codeaurora.org, arunks@codeaurora.org,
	dan.j.williams@intel.com, mgorman@techsingularity.net,
	osalvador@suse.de, ard.biesheuvel@arm.com, steve.capper@arm.com,
	broonie@kernel.org, valentin.schneider@arm.com, Robin.Murphy@arm.com,
	steven.price@arm.com, suzuki.poulose@arm.com
Subject: Re: [PATCH V7 3/3] arm64/mm: Enable memory hot remove
Message-ID: <20190910161759.GI14442@C02TF0J2HF1T.local>
References: <1567503958-25831-1-git-send-email-anshuman.khandual@arm.com>
	<1567503958-25831-4-git-send-email-anshuman.khandual@arm.com>
In-Reply-To: <1567503958-25831-4-git-send-email-anshuman.khandual@arm.com>

On Tue, Sep 03, 2019 at 03:15:58PM +0530, Anshuman Khandual wrote:
> @@ -770,6 +1022,28 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
>  void vmemmap_free(unsigned long start, unsigned long end,
>  		struct vmem_altmap *altmap)
>  {
> +#ifdef CONFIG_MEMORY_HOTPLUG
> +	/*
> +	 * FIXME: We should have called remove_pagetable(start, end, true).
> +	 * vmemmap and vmalloc virtual range might share intermediate kernel
> +	 * page table entries. Removing vmemmap range page table pages here
> +	 * can potentially conflict with a concurrent vmalloc() allocation.
> +	 *
> +	 * This is primarily because vmalloc() does not take init_mm ptl for
> +	 * the entire page table walk and it's modification. Instead it just
> +	 * takes the lock while allocating and installing page table pages
> +	 * via [p4d|pud|pmd|pte]_alloc(). A concurrently vanishing page table
> +	 * entry via memory hot remove can cause vmalloc() kernel page table
> +	 * walk pointers to be invalid on the fly which can cause corruption
> +	 * or worst, a crash.
> +	 *
> +	 * So free_empty_tables() gets called where vmalloc and vmemmap range
> +	 * do not overlap at any intermediate level kernel page table entry.
> +	 */
> +	unmap_hotplug_range(start, end, true);
> +	if (!vmalloc_vmemmap_overlap)
> +		free_empty_tables(start, end);
> +#endif
>  }
>  #endif	/* CONFIG_SPARSEMEM_VMEMMAP */

I wonder whether we could simply ignore the vmemmap freeing altogether,
just leave it around and not unmap it. This way, we could call
unmap_kernel_range() for removing the linear map and we save some code.

For the linear map, I think we use just above 2MB of tables for 1GB of
memory mapped (worst case with 4KB pages we need 512 pte pages). For
vmemmap we'd use slightly above 2MB for a 64GB hotplugged memory. Do we
expect such memory to be re-plugged again in the same range? If we do,
then I shouldn't even bother with removing the vmemmap.

I don't fully understand the use-case for memory hotremove, so any
additional info would be useful to make a decision here.

-- 
Catalin