From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 497A8C43441 for ; Mon, 26 Nov 2018 12:33:38 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B9F2920664 for ; Mon, 26 Nov 2018 12:33:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B9F2920664 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 433RCz6HjGzDqXB for ; Mon, 26 Nov 2018 23:33:35 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=redhat.com (client-ip=209.132.183.28; helo=mx1.redhat.com; envelope-from=david@redhat.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 433R8n50cVzDqWt for ; Mon, 26 Nov 2018 23:30:49 +1100 (AEDT) Received: from smtp.corp.redhat.com (int-mx04.intmail.prod.int.phx2.redhat.com [10.5.11.14]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 0D0B780F82; Mon, 26 Nov 2018 12:30:43 +0000 (UTC) Received: from [10.36.116.75] (ovpn-116-75.ams2.redhat.com [10.36.116.75]) by smtp.corp.redhat.com (Postfix) with ESMTP id 7FCDC5D9C6; Mon, 26 Nov 2018 12:30:32 +0000 (UTC) Subject: Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types To: =?UTF-8?Q?Michal_Such=c3=a1nek?= References: <20180928150357.12942-1-david@redhat.com> <20181123190653.6da91461@kitsune.suse.cz> From: David Hildenbrand Openpgp: preference=signencrypt Autocrypt: addr=david@redhat.com; prefer-encrypt=mutual; keydata= xsFNBFXLn5EBEAC+zYvAFJxCBY9Tr1xZgcESmxVNI/0ffzE/ZQOiHJl6mGkmA1R7/uUpiCjJ dBrn+lhhOYjjNefFQou6478faXE6o2AhmebqT4KiQoUQFV4R7y1KMEKoSyy8hQaK1umALTdL QZLQMzNE74ap+GDK0wnacPQFpcG1AE9RMq3aeErY5tujekBS32jfC/7AnH7I0v1v1TbbK3Gp XNeiN4QroO+5qaSr0ID2sz5jtBLRb15RMre27E1ImpaIv2Jw8NJgW0k/D1RyKCwaTsgRdwuK Kx/Y91XuSBdz0uOyU/S8kM1+ag0wvsGlpBVxRR/xw/E8M7TEwuCZQArqqTCmkG6HGcXFT0V9 PXFNNgV5jXMQRwU0O/ztJIQqsE5LsUomE//bLwzj9IVsaQpKDqW6TAPjcdBDPLHvriq7kGjt WhVhdl0qEYB8lkBEU7V2Yb+SYhmhpDrti9Fq1EsmhiHSkxJcGREoMK/63r9WLZYI3+4W2rAc UucZa4OT27U5ZISjNg3Ev0rxU5UH2/pT4wJCfxwocmqaRr6UYmrtZmND89X0KigoFD/XSeVv jwBRNjPAubK9/k5NoRrYqztM9W6sJqrH8+UWZ1Idd/DdmogJh0gNC0+N42Za9yBRURfIdKSb B3JfpUqcWwE7vUaYrHG1nw54pLUoPG6sAA7Mehl3nd4pZUALHwARAQABzSREYXZpZCBIaWxk ZW5icmFuZCA8ZGF2aWRAcmVkaGF0LmNvbT7CwX4EEwECACgFAljj9eoCGwMFCQlmAYAGCwkI BwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJEE3eEPcA/4Na5IIP/3T/FIQMxIfNzZshIq687qgG 8UbspuE/YSUDdv7r5szYTK6KPTlqN8NAcSfheywbuYD9A4ZeSBWD3/NAVUdrCaRP2IvFyELj xoMvfJccbq45BxzgEspg/bVahNbyuBpLBVjVWwRtFCUEXkyazksSv8pdTMAs9IucChvFmmq3 jJ2vlaz9lYt/lxN246fIVceckPMiUveimngvXZw21VOAhfQ+/sofXF8JCFv2mFcBDoa7eYob s0FLpmqFaeNRHAlzMWgSsP80qx5nWWEvRLdKWi533N2vC/EyunN3HcBwVrXH4hxRBMco3jvM m8VKLKao9wKj82qSivUnkPIwsAGNPdFoPbgghCQiBjBe6A75Z2xHFrzo7t1jg7nQfIyNC7ez MZBJ59sqA9EDMEJPlLNIeJmqslXPjmMFnE7Mby/+335WJYDulsRybN+W5rLT5aMvhC6x6POK z55fMNKrMASCzBJum2Fwjf/VnuGRYkhKCqqZ8gJ3OvmR50tInDV2jZ1DQgc3i550T5JDpToh dPBxZocIhzg+MBSRDXcJmHOx/7nQm3iQ6iLuwmXsRC6f5FbFefk9EjuTKcLMvBsEx+2DEx0E UnmJ4hVg7u1PQ+2Oy+Lh/opK/BDiqlQ8Pz2jiXv5xkECvr/3Sv59hlOCZMOaiLTTjtOIU7Tq 7ut6OL64oAq+zsFNBFXLn5EBEADn1959INH2cwYJv0tsxf5MUCghCj/CA/lc/LMthqQ773ga uB9mN+F1rE9cyyXb6jyOGn+GUjMbnq1o121Vm0+neKHUCBtHyseBfDXHA6m4B3mUTWo13nid 0e4AM71r0DS8+KYh6zvweLX/LL5kQS9GQeT+QNroXcC1NzWbitts6TZ+IrPOwT1hfB4WNC+X 2n4AzDqp3+ILiVST2DT4VBc11Gz6jijpC/KI5Al8ZDhRwG47LUiuQmt3yqrmN63V9wzaPhC+ xbwIsNZlLUvuRnmBPkTJwwrFRZvwu5GPHNndBjVpAfaSTOfppyKBTccu2AXJXWAE1Xjh6GOC 8mlFjZwLxWFqdPHR1n2aPVgoiTLk34LR/bXO+e0GpzFXT7enwyvFFFyAS0Nk1q/7EChPcbRb hJqEBpRNZemxmg55zC3GLvgLKd5A09MOM2BrMea+l0FUR+PuTenh2YmnmLRTro6eZ/qYwWkC u8FFIw4pT0OUDMyLgi+GI1aMpVogTZJ70FgV0pUAlpmrzk/bLbRkF3TwgucpyPtcpmQtTkWS gDS50QG9DR/1As3LLLcNkwJBZzBG6PWbvcOyrwMQUF1nl4SSPV0LLH63+BrrHasfJzxKXzqg rW28CTAE2x8qi7e/6M/+XXhrsMYG+uaViM7n2je3qKe7ofum3s4vq7oFCPsOgwARAQABwsFl BBgBAgAPBQJVy5+RAhsMBQkJZgGAAAoJEE3eEPcA/4NagOsP/jPoIBb/iXVbM+fmSHOjEshl KMwEl/m5iLj3iHnHPVLBUWrXPdS7iQijJA/VLxjnFknhaS60hkUNWexDMxVVP/6lbOrs4bDZ NEWDMktAeqJaFtxackPszlcpRVkAs6Msn9tu8hlvB517pyUgvuD7ZS9gGOMmYwFQDyytpepo YApVV00P0u3AaE0Cj/o71STqGJKZxcVhPaZ+LR+UCBZOyKfEyq+ZN311VpOJZ1IvTExf+S/5 lqnciDtbO3I4Wq0ArLX1gs1q1XlXLaVaA3yVqeC8E7kOchDNinD3hJS4OX0e1gdsx/e6COvy qNg5aL5n0Kl4fcVqM0LdIhsubVs4eiNCa5XMSYpXmVi3HAuFyg9dN+x8thSwI836FoMASwOl C7tHsTjnSGufB+D7F7ZBT61BffNBBIm1KdMxcxqLUVXpBQHHlGkbwI+3Ye+nE6HmZH7IwLwV W+Ajl7oYF+jeKaH4DZFtgLYGLtZ1LDwKPjX7VAsa4Yx7S5+EBAaZGxK510MjIx6SGrZWBrrV TEvdV00F2MnQoeXKzD7O4WFbL55hhyGgfWTHwZ457iN9SgYi1JLPqWkZB0JRXIEtjd4JEQcx +8Umfre0Xt4713VxMygW0PnQt5aSQdMD58jHFxTk092mU+yIHj5LeYgvwSgZN4airXk5yRXl SE+xAvmumFBY Organization: Red Hat GmbH Message-ID: Date: Mon, 26 Nov 2018 13:30:31 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0 MIME-Version: 1.0 In-Reply-To: <20181123190653.6da91461@kitsune.suse.cz> Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.14 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.27]); Mon, 26 Nov 2018 12:30:44 +0000 (UTC) X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kate Stewart , Rich Felker , linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org, Peter Zijlstra , Dave Hansen , Heiko Carstens , linux-mm@kvack.org, Michal Hocko , Paul Mackerras , "H. Peter Anvin" , Stephen Rothwell , Rashmica Gupta , "K. Y. Srinivasan" , Dan Williams , linux-s390@vger.kernel.org, Michael Neuling , Stephen Hemminger , Yoshinori Sato , linux-acpi@vger.kernel.org, Ingo Molnar , xen-devel@lists.xenproject.org, Len Brown , Pavel Tatashin , Rob Herring , "mike.travis@hpe.com" , Haiyang Zhang , =?UTF-8?Q?Jonathan_Neusch=c3=a4fer?= , Nicholas Piggin , Martin Schwidefsky , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , Mike Rapoport , Borislav Petkov , Andy Lutomirski , Boris Ostrovsky , Andrew Morton , Oscar Salvador , Juergen Gross , Tony Luck , Mathieu Malaterre , Greg Kroah-Hartman , "Rafael J. Wysocki" , linux-kernel@vger.kernel.org, Fenghua Yu , Mauricio Faria de Oliveira , Thomas Gleixner , Philippe Ombredanne , Joe Perches , devel@linuxdriverproject.org, Joonsoo Kim , linuxppc-dev@lists.ozlabs.org, "Kirill A. Shutemov" Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On 23.11.18 19:06, Michal Suchánek wrote: > On Fri, 23 Nov 2018 12:13:58 +0100 > David Hildenbrand wrote: > >> On 28.09.18 17:03, David Hildenbrand wrote: >>> How to/when to online hotplugged memory is hard to manage for >>> distributions because different memory types are to be treated differently. >>> Right now, we need complicated udev rules that e.g. check if we are >>> running on s390x, on a physical system or on a virtualized system. But >>> there is also sometimes the demand to really online memory immediately >>> while adding in the kernel and not to wait for user space to make a >>> decision. And on virtualized systems there might be different >>> requirements, depending on "how" the memory was added (and if it will >>> eventually get unplugged again - DIMM vs. paravirtualized mechanisms). >>> >>> On the one hand, we have physical systems where we sometimes >>> want to be able to unplug memory again - e.g. a DIMM - so we have to online >>> it to the MOVABLE zone optionally. That decision is usually made in user >>> space. >>> >>> On the other hand, we have memory that should never be onlined >>> automatically, only when asked for by an administrator. Such memory only >>> applies to virtualized environments like s390x, where the concept of >>> "standby" memory exists. Memory is detected and added during boot, so it >>> can be onlined when requested by the admininistrator or some tooling. >>> Only when onlining, memory will be allocated in the hypervisor. >>> >>> But then, we also have paravirtualized devices (namely xen and hyper-v >>> balloons), that hotplug memory that will never ever be removed from a >>> system right now using offline_pages/remove_memory. If at all, this memory >>> is logically unplugged and handed back to the hypervisor via ballooning. >>> >>> For paravirtualized devices it is relevant that memory is onlined as >>> quickly as possible after adding - and that it is added to the NORMAL >>> zone. Otherwise, it could happen that too much memory in a row is added >>> (but not onlined), resulting in out-of-memory conditions due to the >>> additional memory for "struct pages" and friends. MOVABLE zone as well >>> as delays might be very problematic and lead to crashes (e.g. zone >>> imbalance). >>> >>> Therefore, introduce memory block types and online memory depending on >>> it when adding the memory. Expose the memory type to user space, so user >>> space handlers can start to process only "normal" memory. Other memory >>> block types can be ignored. One thing less to worry about in user space. >>> >> >> So I was looking into alternatives. >> >> 1. Provide only "normal" and "standby" memory types to user space. This >> way user space can make smarter decisions about how to online memory. >> Not really sure if this is the right way to go. >> >> >> 2. Use device driver information (as mentioned by Michal S.). >> >> The problem right now is that there are no drivers for memory block >> devices. The "memory" subsystem has no drivers, so the KOBJ_ADD uevent >> will not contain a "DRIVER" information and we ave no idea what kind of >> memory block device we hold in our hands. >> >> $ udevadm info -q all -a /sys/devices/system/memory/memory0 >> >> looking at device '/devices/system/memory/memory0': >> KERNEL=="memory0" >> SUBSYSTEM=="memory" >> DRIVER=="" >> ATTR{online}=="1" >> ATTR{phys_device}=="0" >> ATTR{phys_index}=="00000000" >> ATTR{removable}=="0" >> ATTR{state}=="online" >> ATTR{valid_zones}=="none" >> >> >> If we would provide "fake" drivers for the memory block devices we want >> to treat in a special way in user space (e.g. standby memory on s390x), >> user space could use that information to make smarter decisions. >> >> Adding such drivers might work. My suggestion would be to let ordinary >> DIMMs be without a driver for now and only special case standby memory >> and eventually paravirtualized memory devices (XEN and Hyper-V). >> >> Any thoughts? > > If we are going to fake the driver information we may as well add the > type attribute and be done with it. > > I think the problem with the patch was more with the semantic than the > attribute itself. > > What is normal, paravirtualized, and standby memory? > > I can understand DIMM device, baloon device, or whatever mechanism for > adding memory you might have. > > I can understand "memory designated as standby by the cluster > administrator". > > However, DIMM vs baloon is orthogonal to standby and should not be > conflated into one property. > > paravirtualized means nothing at all in relationship to memory type and > the desired online policy to me. Right, so with whatever we come up, it should allow to make a decision in user space about - if memory is to be onlined automatically - to which zone memory is to be onlined The rules are encoded in user space, the type will allow to make a decision. One important part will be if the memory can eventually be offlined + removed again (DIMM style unplug) vs. memory unplug is handled balloon-style. As we learned, some use cases might require to e.g. online balloon memory to the movable zone in order to make better use of huge pages. This has to be handled in user space. I'll think about possible types. > > Lastly I would suggest if you add any property you add it to *all* > memory that is hotplugged. That way the userspace can detect if it can > rely on the information from your patch or not. Leaving some memory > untagged makes things needlessly vague. Yes, that makes sense. Thanks! > > Thanks > > Michal > -- Thanks, David / dhildenb