To: Oscar Salvador, Michal Hocko
Cc: Andrew Morton, Anshuman Khandual, Vlastimil Babka, Pavel Tatashin, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20210319092635.6214-2-osalvador@suse.de> <20210324101259.GB16560@linux> <3bc4168c-fd31-0c9a-44ac-88e25d524eef@redhat.com> <9591a0b8-c000-2f61-67a6-4402678fe50b@redhat.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Subject: Re: [PATCH v5 1/5] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Thu, 25 Mar 2021 12:08:43 +0100

On 25.03.21 11:55, Oscar Salvador wrote:
> On Thu, Mar 25, 2021 at 10:17:33AM +0100, Michal Hocko wrote:
>> Why do you think it is wrong to initialize/account pages when they are
>> used? Keep in mind that offline pages are not used until they are
>> onlined. But vmemmap pages are used since the vmemmap is established
>> which happens in the hotadd stage.
>
> Yes, that is true.
> vmemmap pages are used right when we populate the vmemmap space.
>

Note: I once heard of a corner-case use case where people offline memory
blocks to then use the "free" memory via /dev/mem for other purposes
("large physical memory"). Not that I encourage such use cases, but they
would be fundamentally broken if the vmemmap ends up on offline memory
and is supposed to keep its state ...

>>> plus the fact that I dislike to place those pages in
>>> ZONE_NORMAL, although they are not movable.
>>> But I think the vmemmap pages should lay within the same zone the pages
>>> they describe, doing so simplifies things, and I do not see any outright
>>> downside.
>>
>> Well, both ways likely have its pros and cons. Nevertheless, if the
>> vmemmap storage is independent (which is the case for normal hotplug)
>> then the state is consistent over hotadd, {online, offline} N times,
>> hotremove cycles. Which is conceptually reasonable as vmemmap doesn't
>> go away on each offline.
>>
>> If you are going to bind accounting to the online/offline stages then
>> the accounting changes each time you go through the cycle and depending
>> on the onlining type it would travel among zones. I find it quite
>> confusing as the storage for vmemmap hasn't changed any of its
>> properties.
>
> That is a good point I guess.
> vmemmap pages do not really go away until the memory is unplugged.
>
> But I see some questions to raise:
>
> - As I said, I really dislike tying vmemmap memory to ZONE_NORMAL
>   unconditionally and this might result in the problems David mentioned.
>   I remember David and I discussed such problems but the problems with
>   zones not being contiguous have also been discussed in the past and
>   IIRC, we reached the conclusion that a maximal effort should be made
>   to keep them that way, otherwise other things suffer, e.g. compaction
>   code.
>   So if we really want to move the initialization/account to the
>   hot-add/hot-remove stage, I would really like to be able to set the
>   proper zone in there (that is, the same zone where the memory will lay).

Determining the zone when hot-adding does not make too much sense: you
don't know what user space might end up deciding (online_kernel,
online_movable...).

>
> - When moving the initialization/accounting to hot-add/hot-remove,
>   the section containing the vmemmap pages will remain offline.
>   It might get onlined once the pages get online in online_pages(),
>   or not if vmemmap pages span a whole section.
>   I remember (but maybe David remembers better) that that was a problem
>   wrt. pfn_to_online_page() and hibernation/kdump.
>   So, if that is really a problem, we would have to take care of setting
>   the section to the right state.

Good memory. Indeed, hibernation/kdump won't save the state of the
vmemmap, because the memory is marked as offline and, thus, logically
without any valuable content.
>
> - AFAICS, doing all the above brings us to former times where some
>   initialization/accounting was done in a previous stage, and I remember
>   it was pushed hard to move those in online/offline_pages().
>   Are we ok with that?
>   As I said, we might have to set the right zone in hot-add stage, as
>   otherwise problems might come up.
>   Being that case, would not that also be conflating different concepts
>   at the wrong phase?
>

I expressed my opinion already, no need to repeat. Sub-section online
maps would make it cleaner, but I am still not convinced we want/need that.

> Do not take me wrong, I quite like Michal's idea, and from a
> conceptual point of view I guess it is the right thing to do.
> But when evaluating risks/difficulty, I am not really sure.
>
> If we can pull that off while setting the right zone (and must be seen
> what about the section state), and the outcome is not ugly, I am all for
> it.
> Also a middle-ground might be something like I previously mentioned (having
> a helper in memory_block_action() to do the right thing, so
> offline/online_pages() do not get polluted).

As I said, having something like
memory_block_online()/memory_block_offline() could be one way to tackle
it. We only support onlining/offlining of memory blocks and I ripped out
all code that was abusing online_pages/offline_pages ...

So have memory_block_online() call online_pages() and do the accounting
of the vmemmap, with a big fat comment that sections are actually set
online/offline in online_pages/offline_pages(). Could be a simple
cleanup on top of this series ...

--
Thanks,

David / dhildenb
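
The wrapper idea in the last two paragraphs can be sketched as a small
userspace model. This is only an illustration under stated assumptions,
not kernel code and not the patch that came out of this thread: every
type and function name ending in _model is hypothetical, the "zone" is
reduced to a single present-pages counter, and accounting the vmemmap
pages to the zone chosen at online time is one possible reading of the
proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a zone: only tracks accounted pages. */
struct zone_model {
	unsigned long present_pages;
};

/* Hypothetical stand-in for a memory block with self-hosted memmap. */
struct memory_block_model {
	unsigned long nr_pages;         /* pages spanned by the block */
	unsigned long nr_vmemmap_pages; /* leading pages holding the memmap */
	struct zone_model *zone;        /* zone picked at online time */
	bool online;
};

/* Stand-in for online_pages(): handles only the usable pages. */
static int online_pages_model(struct zone_model *zone, unsigned long nr)
{
	zone->present_pages += nr;
	return 0;
}

/* Stand-in for offline_pages(): handles only the usable pages. */
static int offline_pages_model(struct zone_model *zone, unsigned long nr)
{
	if (zone->present_pages < nr)
		return -1;
	zone->present_pages -= nr;
	return 0;
}

/*
 * Wrapper in the spirit of memory_block_online(): online the usable
 * range first, then account the vmemmap pages, so the page-level
 * helper stays free of vmemmap special cases.
 */
int memory_block_online_model(struct memory_block_model *mem,
			      struct zone_model *zone)
{
	int ret;

	assert(mem && zone);
	if (mem->online || mem->nr_vmemmap_pages > mem->nr_pages)
		return -1;

	ret = online_pages_model(zone, mem->nr_pages - mem->nr_vmemmap_pages);
	if (ret)
		return ret;

	/* Assumption: vmemmap pages are accounted to the same zone. */
	zone->present_pages += mem->nr_vmemmap_pages;
	mem->zone = zone;
	mem->online = true;
	return 0;
}

/* Mirror image: offline the usable range, then drop the accounting. */
int memory_block_offline_model(struct memory_block_model *mem)
{
	int ret;

	assert(mem);
	if (!mem->online)
		return -1;

	ret = offline_pages_model(mem->zone,
				  mem->nr_pages - mem->nr_vmemmap_pages);
	if (ret)
		return ret;

	mem->zone->present_pages -= mem->nr_vmemmap_pages;
	mem->zone = NULL;
	mem->online = false;
	return 0;
}
```

In this model the zone is only known at online time, which matches the
point above that user space decides between online_kernel and
online_movable after hot-add. The block size and vmemmap size a caller
passes in are illustrative inputs, not values taken from the series.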