From mboxrd@z Thu Jan 1 00:00:00 1970
From: David Hildenbrand
Subject: Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types
Date: Mon, 1 Oct 2018 11:13:43 +0200
Message-ID: <147d20c7-2a07-2305-9b44-76fdb735173b@redhat.com>
References: <20180928150357.12942-1-david@redhat.com>
 <5dba97a5-5a18-5df1-5493-99987679cf3a@linux.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
In-Reply-To: <5dba97a5-5a18-5df1-5493-99987679cf3a@linux.intel.com>
Content-Language: en-US
Errors-To: driverdev-devel-bounces@linuxdriverproject.org
Sender: "devel"
To: Dave Hansen, linux-mm@kvack.org
Cc: Kate Stewart, Rich Felker, linux-ia64@vger.kernel.org,
 linux-sh@vger.kernel.org, Peter Zijlstra, Benjamin Herrenschmidt,
 Balbir Singh, Heiko Carstens, Pavel Tatashin, Michal Hocko,
 Paul Mackerras, "H. Peter Anvin", Rashmica Gupta, Boris Ostrovsky,
 linux-s390@vger.kernel.org, Michael Neuling, Stephen Hemminger,
 Yoshinori Sato, Michael Ellerman, linux-acpi@vger.kernel.org,
 Ingo Molnar, xen-devel@lists.xenproject.org, Rob Herring, Len Brown
List-Id: linux-acpi@vger.kernel.org

On 28/09/2018 19:02, Dave Hansen wrote:
> It's really nice if these kinds of things are broken up.  First, replace
> the old want_memblock parameter, then add the parameter to the
> __add_page() calls.

Definitely, once we agree that it is not nuts, I will split it up for
the next version :)

> 
>> +/*
>> + * NONE:     No memory block is to be created (e.g. device memory).
>> + * NORMAL:   Memory block that represents normal (boot or hotplugged) memory
>> + *           (e.g. ACPI DIMMs) that should be onlined either automatically
>> + *           (memhp_auto_online) or manually by user space to select a
>> + *           specific zone.
>> + *           Applicable to memhp_auto_online.
>> + * STANDBY:  Memory block that represents standby memory that should only
>> + *           be onlined on demand by user space (e.g.
>> + *           standby memory on s390x), but never automatically by the kernel.
>> + *           Not applicable to memhp_auto_online.
>> + * PARAVIRT: Memory block that represents memory added by
>> + *           paravirtualized mechanisms (e.g. hyper-v, xen) that will
>> + *           always automatically get onlined. Memory will be unplugged
>> + *           using ballooning, not by relying on the MOVABLE ZONE.
>> + *           Not applicable to memhp_auto_online.
>> + */
>> +enum {
>> +	MEMORY_BLOCK_NONE,
>> +	MEMORY_BLOCK_NORMAL,
>> +	MEMORY_BLOCK_STANDBY,
>> +	MEMORY_BLOCK_PARAVIRT,
>> +};
> 
> This does not seem like the best way to expose these.
> 
> STANDBY, for instance, seems to be essentially a replacement for a check
> against running on s390 in userspace to implement a _typical_ s390
> policy.  It seems rather weird to try to make the userspace policy
> determination easier by telling userspace about the typical s390 policy
> via the kernel.

Now comes the fun part: I am working on another paravirtualized memory
hotplug mechanism for KVM guests, based on virtio ("virtio-mem"). These
devices can potentially be used concurrently with
- s390x standby memory
- DIMMs

What should a policy in user space look like when new memory gets added
- on s390x? Not onlining paravirtualized memory is very wrong.
- on e.g. x86? Onlining memory to the MOVABLE zone is very wrong.

So the type of memory is very important here to have in user space.
Relying on checks like "isS390()", "isKVMGuest()" or "isHyperVGuest()"
to decide whether to online memory and how to online memory is wrong.
Only some specific memory types (which I call "normal") are to be
handled by user space.

For the other ones, we know exactly what to do:
- standby? don't online
- paravirt? always online to normal zone

I will add some more details as reply to Michal.
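
To make the policy above a bit more concrete, here is a rough user-space
sketch. It is only an illustration under assumptions: the enum
hotplug_block_type, the function online_policy() and the expect_unplug
flag are made-up names, and how user space would learn the type of a
block is left out on purpose. The returned strings are the values
accepted by a memory block's sysfs "state" file.

```c
#include <stddef.h>

/*
 * Hypothetical user-space view of the block types discussed above.
 * There is no NONE here: without a memory block, user space has
 * nothing to look at.
 */
enum hotplug_block_type {
	BLOCK_NORMAL,
	BLOCK_STANDBY,
	BLOCK_PARAVIRT,
};

/*
 * Returns the string a policy daemon would write to the block's sysfs
 * "state" file, or NULL if user space should leave the block alone.
 */
const char *online_policy(enum hotplug_block_type type, int expect_unplug)
{
	switch (type) {
	case BLOCK_NORMAL:
		/* MOVABLE only if an unplug is expected later. */
		return expect_unplug ? "online_movable" : "online_kernel";
	case BLOCK_STANDBY:
		/* Only onlined on explicit demand, never by a generic rule. */
		return NULL;
	case BLOCK_PARAVIRT:
		/* Handled fully in the kernel, nothing to do here. */
		return NULL;
	}
	return NULL;
}
```

The point of the sketch is that the decision depends only on the block
type, not on guessing the platform or hypervisor.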
> 
> As for the OOM issues, that sounds like something we need to fix by
> refusing to do (or delaying) hot-add operations once we consume too much
> ZONE_NORMAL from memmap[]s rather than trying to indirectly tell
> userspace to hurry thing along.

That is a moving target and doing that automatically is basically
impossible. You can add a lot of memory to the movable zone and
everything is fine. Suddenly a lot of processes are started - boom.
MOVABLE should only ever be used if you expect an unplug. And for
paravirtualized devices, a "typical" unplug does not exist.

> 
> So, to my eye, we need:
> 
>  +enum {
>  +	MEMORY_BLOCK_NONE,
>  +	MEMORY_BLOCK_STANDBY, /* the default */
>  +	MEMORY_BLOCK_AUTO_ONLINE,
>  +};

auto-online is strongly misleading, that's why I called it "normal", but
I am open to suggestions. The information about devices handled fully
in the kernel ("paravirt") is key for me.

> 
> and we can probably collapse NONE into AUTO_ONLINE because userspace
> ends up doing the same thing for both: nothing.

For external reasons, yes, for internal reasons no (see hmm/device
memory). In user space, we will never end up with MEMORY_BLOCK_NONE,
because there is no memory block.

> 
>>  struct memory_block {
>>  	unsigned long start_section_nr;
>>  	unsigned long end_section_nr;
>> @@ -34,6 +58,7 @@ struct memory_block {
>>  	int (*phys_callback)(struct memory_block *);
>>  	struct device dev;
>>  	int nid;			/* NID for this memory block */
>> +	int type;			/* type of this memory block */
>>  };
> 
> Shouldn't we just be creating and using an actual named enum type?
> 

That makes sense. Thanks!
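
Something along these lines is what a named enum could look like. This
is a stand-alone sketch, not the next version of the patch: the names
enum memory_block_type, struct memory_block_sketch and
memory_block_type_name() are assumptions for illustration, and the
struct is trimmed to plain fields so it compiles outside the kernel.

```c
/* Assumed name for the enum; the values follow the quoted patch. */
enum memory_block_type {
	MEMORY_BLOCK_NONE,	/* no memory block is created */
	MEMORY_BLOCK_NORMAL,
	MEMORY_BLOCK_STANDBY,
	MEMORY_BLOCK_PARAVIRT,
};

/* Trimmed stand-in for struct memory_block, for illustration only. */
struct memory_block_sketch {
	unsigned long start_section_nr;
	unsigned long end_section_nr;
	int nid;			/* NID for this memory block */
	enum memory_block_type type;	/* typed instead of a plain int */
};

/*
 * With a named enum and no default label, the compiler can warn when
 * a newly added type is not handled in a switch like this one.
 */
const char *memory_block_type_name(enum memory_block_type type)
{
	switch (type) {
	case MEMORY_BLOCK_NONE:
		return "none";
	case MEMORY_BLOCK_NORMAL:
		return "normal";
	case MEMORY_BLOCK_STANDBY:
		return "standby";
	case MEMORY_BLOCK_PARAVIRT:
		return "paravirt";
	}
	return "unknown";
}
```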
-- 
Thanks,

David / dhildenb