Date: Fri, 29 Mar 2019 10:20:25 +0100
From: Oscar Salvador
To: David Hildenbrand
Cc: akpm@linux-foundation.org, mhocko@suse.com, dan.j.williams@intel.com,
 Jonathan.Cameron@huawei.com, anshuman.khandual@arm.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 0/4] mm,memory_hotplug: allocate memmap from hotadded memory
Message-ID: <20190329092025.2cw3igplwzrij2sr@d104.suse.de>
References: <20190328134320.13232-1-osalvador@suse.de>
 <20190329084547.5k37xjwvkgffwajo@d104.suse.de>
 <23dcfb4a-339b-dcaf-c037-331f82fdef5a@redhat.com>
In-Reply-To: <23dcfb4a-339b-dcaf-c037-331f82fdef5a@redhat.com>
User-Agent: NeoMutt/20170421 (1.8.2)
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Mar 29, 2019 at 09:56:37AM +0100, David Hildenbrand wrote:
> Oh okay, so actually the way I guessed it would be now.
> 
> While this makes total sense, I'll have to look at how it is currently
> handled, meaning whether there is a change. I somewhat remember that
> deferred struct page initialization would initialize the vmemmap per
> section, not per memory resource.

Uhm, the memmap array for each section is built early during boot.
We actually do not care about deferred struct page initialization there.

What we do is:

- we go through all memblock regions marked as memory
- we mark the sections within those regions as present
- we initialize those sections and build the corresponding memmap arrays

The thing is that sparse_init_nid() allocates/reserves a buffer big enough
to hold the memmap arrays for all those sections, and each memmap array we
need to allocate is carved out of that buffer, so the memmaps end up in
contiguous memory.

Have a look at:

- sparse_memory_present_with_active_regions()
- sparse_init()
- sparse_init_nid()
- sparse_buffer_init()
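Very roughly, the buffer logic works like the sketch below. This is a
simplified userspace paraphrase, not the actual mm/sparse.c code: the
buffer_init()/buffer_alloc() names and the sizes are made up for
illustration, and the real code additionally aligns the returned pointer
and lets the caller fall back to a regular allocation when the buffer is
exhausted.

#include <stdio.h>
#include <stdlib.h>

/*
 * One big chunk is reserved up front for a node, and each section's
 * memmap is carved out of it, so consecutive sections end up with
 * contiguous memmap memory.
 */
static char *sparsemap_buf;
static char *sparsemap_buf_end;

static void buffer_init(size_t size)
{
	/* In the kernel this memory comes from memblock, early at boot. */
	sparsemap_buf = malloc(size);
	sparsemap_buf_end = sparsemap_buf + size;
}

static void *buffer_alloc(size_t size)
{
	void *ptr = NULL;

	if (sparsemap_buf && sparsemap_buf + size <= sparsemap_buf_end) {
		ptr = sparsemap_buf;
		sparsemap_buf += size;		/* consume from the buffer */
	}
	return ptr;
}

int main(void)
{
	/* Illustrative numbers: 2MB of memmap per 128MB section. */
	const size_t memmap_per_section = 2UL << 20;
	const int nr_sections = 4;

	buffer_init((size_t)nr_sections * memmap_per_section);

	for (int i = 0; i < nr_sections; i++) {
		void *memmap = buffer_alloc(memmap_per_section);
		printf("section %d: memmap at %p\n", i, memmap);
	}
	return 0;
}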
> But as I work on 10 things at once, my mind sometimes seems to forget
> stuff in order to replace it with random nonsense. Will look into the
> details so I do not have to ask too many dumb questions.
> 
> > 
> > So, the taken approach is to allocate the vmemmap data corresponding
> > to the whole DIMM/memory-device/memory-resource from the beginning of
> > its memory.
> > 
> > In the example from above, the vmemmap data for both sections is
> > allocated from the beginning of the first section:
> > 
> > The memmap array takes 2MB per section, so 512 pfns.
> > If we add 2 sections:
> > 
> > [ pfn#0    ]  \
> > [ ...      ]   | vmemmap used for the memmap array
> > [ pfn#1023 ]  /
> > 
> > [ pfn#1024 ]  \
> > [ ...      ]   | used as normal memory
> > [ pfn#65535]  /
> > 
> > So, out of 256M, we get 252M to use as real memory, as 4M will be
> > used for building the memmap array.
> > 
> > Actually, it can happen that, depending on how big a
> > DIMM/memory-device is, the first memblock(s) are fully used for the
> > memmap array (of course, this can only be seen when adding a huge
> > DIMM/memory-device).
> > 
> 
> Just stating here that with your code, add_memory() and remove_memory()
> always have to be called with the same granularity. Will have to see if
> that implies a change.

Well, I only tested it in such a scenario, yes, but I think that the ACPI
code enforces that somehow. I will take a closer look, though.

-- 
Oscar Salvador
SUSE L3
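For reference, the arithmetic in the quoted layout above can be checked
with a small userspace sketch. The 4KB page size, 128MB section size and
64-byte struct page are assumptions implied by the 2MB-per-section figure,
not values stated in the mail.

#include <stdio.h>

int main(void)
{
	const unsigned long page_size = 4096;            /* 4KB pages */
	const unsigned long section_size = 128UL << 20;  /* 128MB section */
	const unsigned long page_struct = 64;            /* assumed sizeof(struct page) */
	const unsigned long dimm_size = 256UL << 20;     /* 256MB hot-added device */

	unsigned long pfns_per_section = section_size / page_size;            /* 32768 */
	unsigned long memmap_per_section = pfns_per_section * page_struct;    /* 2MB */
	unsigned long sections = dimm_size / section_size;                    /* 2 */
	unsigned long memmap_total = sections * memmap_per_section;           /* 4MB */

	printf("memmap per section: %lu MB (%lu pfns)\n",
	       memmap_per_section >> 20, memmap_per_section / page_size);
	printf("%lu MB device: %lu MB of memmap, %lu MB usable\n",
	       dimm_size >> 20, memmap_total >> 20,
	       (dimm_size - memmap_total) >> 20);
	return 0;
}

This prints 2MB (512 pfns) of memmap per section and 252MB usable out of
256MB, matching the numbers above.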