From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.0 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_PASS autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D0526C43441 for ; Mon, 26 Nov 2018 13:36:01 +0000 (UTC) Received: from lists.ozlabs.org (lists.ozlabs.org [203.11.71.2]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5DFB720862 for ; Mon, 26 Nov 2018 13:36:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5DFB720862 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Received: from lists.ozlabs.org (lists.ozlabs.org [IPv6:2401:3900:2:1::3]) by lists.ozlabs.org (Postfix) with ESMTP id 433Sbz31LFzDqWn for ; Tue, 27 Nov 2018 00:35:59 +1100 (AEDT) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=redhat.com Authentication-Results: lists.ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=redhat.com (client-ip=209.132.183.28; helo=mx1.redhat.com; envelope-from=david@redhat.com; receiver=) Authentication-Results: lists.ozlabs.org; dmarc=pass (p=none dis=none) header.from=redhat.com Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by lists.ozlabs.org (Postfix) with ESMTPS id 433SYN3LZSzDqW1 for ; Tue, 27 Nov 2018 00:33:43 +1100 (AEDT) Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 5A30D308ED4B; Mon, 26 Nov 2018 13:33:40 +0000 (UTC) Received: from [10.36.116.75] (ovpn-116-75.ams2.redhat.com [10.36.116.75]) by smtp.corp.redhat.com (Postfix) with ESMTP id 002645C227; Mon, 26 Nov 2018 13:33:29 +0000 (UTC) Subject: Re: [PATCH RFC] mm/memory_hotplug: Introduce memory block types From: David Hildenbrand To: =?UTF-8?Q?Michal_Such=c3=a1nek?= References: <20180928150357.12942-1-david@redhat.com> <20181123190653.6da91461@kitsune.suse.cz> Openpgp: preference=signencrypt Autocrypt: addr=david@redhat.com; prefer-encrypt=mutual; keydata= xsFNBFXLn5EBEAC+zYvAFJxCBY9Tr1xZgcESmxVNI/0ffzE/ZQOiHJl6mGkmA1R7/uUpiCjJ dBrn+lhhOYjjNefFQou6478faXE6o2AhmebqT4KiQoUQFV4R7y1KMEKoSyy8hQaK1umALTdL QZLQMzNE74ap+GDK0wnacPQFpcG1AE9RMq3aeErY5tujekBS32jfC/7AnH7I0v1v1TbbK3Gp XNeiN4QroO+5qaSr0ID2sz5jtBLRb15RMre27E1ImpaIv2Jw8NJgW0k/D1RyKCwaTsgRdwuK Kx/Y91XuSBdz0uOyU/S8kM1+ag0wvsGlpBVxRR/xw/E8M7TEwuCZQArqqTCmkG6HGcXFT0V9 PXFNNgV5jXMQRwU0O/ztJIQqsE5LsUomE//bLwzj9IVsaQpKDqW6TAPjcdBDPLHvriq7kGjt WhVhdl0qEYB8lkBEU7V2Yb+SYhmhpDrti9Fq1EsmhiHSkxJcGREoMK/63r9WLZYI3+4W2rAc UucZa4OT27U5ZISjNg3Ev0rxU5UH2/pT4wJCfxwocmqaRr6UYmrtZmND89X0KigoFD/XSeVv jwBRNjPAubK9/k5NoRrYqztM9W6sJqrH8+UWZ1Idd/DdmogJh0gNC0+N42Za9yBRURfIdKSb B3JfpUqcWwE7vUaYrHG1nw54pLUoPG6sAA7Mehl3nd4pZUALHwARAQABzSREYXZpZCBIaWxk ZW5icmFuZCA8ZGF2aWRAcmVkaGF0LmNvbT7CwX4EEwECACgFAljj9eoCGwMFCQlmAYAGCwkI BwMCBhUIAgkKCwQWAgMBAh4BAheAAAoJEE3eEPcA/4Na5IIP/3T/FIQMxIfNzZshIq687qgG 8UbspuE/YSUDdv7r5szYTK6KPTlqN8NAcSfheywbuYD9A4ZeSBWD3/NAVUdrCaRP2IvFyELj xoMvfJccbq45BxzgEspg/bVahNbyuBpLBVjVWwRtFCUEXkyazksSv8pdTMAs9IucChvFmmq3 jJ2vlaz9lYt/lxN246fIVceckPMiUveimngvXZw21VOAhfQ+/sofXF8JCFv2mFcBDoa7eYob s0FLpmqFaeNRHAlzMWgSsP80qx5nWWEvRLdKWi533N2vC/EyunN3HcBwVrXH4hxRBMco3jvM m8VKLKao9wKj82qSivUnkPIwsAGNPdFoPbgghCQiBjBe6A75Z2xHFrzo7t1jg7nQfIyNC7ez MZBJ59sqA9EDMEJPlLNIeJmqslXPjmMFnE7Mby/+335WJYDulsRybN+W5rLT5aMvhC6x6POK z55fMNKrMASCzBJum2Fwjf/VnuGRYkhKCqqZ8gJ3OvmR50tInDV2jZ1DQgc3i550T5JDpToh dPBxZocIhzg+MBSRDXcJmHOx/7nQm3iQ6iLuwmXsRC6f5FbFefk9EjuTKcLMvBsEx+2DEx0E UnmJ4hVg7u1PQ+2Oy+Lh/opK/BDiqlQ8Pz2jiXv5xkECvr/3Sv59hlOCZMOaiLTTjtOIU7Tq 7ut6OL64oAq+zsFNBFXLn5EBEADn1959INH2cwYJv0tsxf5MUCghCj/CA/lc/LMthqQ773ga uB9mN+F1rE9cyyXb6jyOGn+GUjMbnq1o121Vm0+neKHUCBtHyseBfDXHA6m4B3mUTWo13nid 0e4AM71r0DS8+KYh6zvweLX/LL5kQS9GQeT+QNroXcC1NzWbitts6TZ+IrPOwT1hfB4WNC+X 2n4AzDqp3+ILiVST2DT4VBc11Gz6jijpC/KI5Al8ZDhRwG47LUiuQmt3yqrmN63V9wzaPhC+ xbwIsNZlLUvuRnmBPkTJwwrFRZvwu5GPHNndBjVpAfaSTOfppyKBTccu2AXJXWAE1Xjh6GOC 8mlFjZwLxWFqdPHR1n2aPVgoiTLk34LR/bXO+e0GpzFXT7enwyvFFFyAS0Nk1q/7EChPcbRb hJqEBpRNZemxmg55zC3GLvgLKd5A09MOM2BrMea+l0FUR+PuTenh2YmnmLRTro6eZ/qYwWkC u8FFIw4pT0OUDMyLgi+GI1aMpVogTZJ70FgV0pUAlpmrzk/bLbRkF3TwgucpyPtcpmQtTkWS gDS50QG9DR/1As3LLLcNkwJBZzBG6PWbvcOyrwMQUF1nl4SSPV0LLH63+BrrHasfJzxKXzqg rW28CTAE2x8qi7e/6M/+XXhrsMYG+uaViM7n2je3qKe7ofum3s4vq7oFCPsOgwARAQABwsFl BBgBAgAPBQJVy5+RAhsMBQkJZgGAAAoJEE3eEPcA/4NagOsP/jPoIBb/iXVbM+fmSHOjEshl KMwEl/m5iLj3iHnHPVLBUWrXPdS7iQijJA/VLxjnFknhaS60hkUNWexDMxVVP/6lbOrs4bDZ NEWDMktAeqJaFtxackPszlcpRVkAs6Msn9tu8hlvB517pyUgvuD7ZS9gGOMmYwFQDyytpepo YApVV00P0u3AaE0Cj/o71STqGJKZxcVhPaZ+LR+UCBZOyKfEyq+ZN311VpOJZ1IvTExf+S/5 lqnciDtbO3I4Wq0ArLX1gs1q1XlXLaVaA3yVqeC8E7kOchDNinD3hJS4OX0e1gdsx/e6COvy qNg5aL5n0Kl4fcVqM0LdIhsubVs4eiNCa5XMSYpXmVi3HAuFyg9dN+x8thSwI836FoMASwOl C7tHsTjnSGufB+D7F7ZBT61BffNBBIm1KdMxcxqLUVXpBQHHlGkbwI+3Ye+nE6HmZH7IwLwV W+Ajl7oYF+jeKaH4DZFtgLYGLtZ1LDwKPjX7VAsa4Yx7S5+EBAaZGxK510MjIx6SGrZWBrrV TEvdV00F2MnQoeXKzD7O4WFbL55hhyGgfWTHwZ457iN9SgYi1JLPqWkZB0JRXIEtjd4JEQcx +8Umfre0Xt4713VxMygW0PnQt5aSQdMD58jHFxTk092mU+yIHj5LeYgvwSgZN4airXk5yRXl SE+xAvmumFBY Organization: Red Hat GmbH Message-ID: Date: Mon, 26 Nov 2018 14:33:29 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Language: en-US Content-Transfer-Encoding: 8bit X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.44]); Mon, 26 Nov 2018 13:33:41 +0000 (UTC) X-BeenThere: linuxppc-dev@lists.ozlabs.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Linux on PowerPC Developers Mail List List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Kate Stewart , Rich Felker , linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org, Peter Zijlstra , Dave Hansen , Heiko Carstens , linux-mm@kvack.org, Michal Hocko , Paul Mackerras , "H. Peter Anvin" , Stephen Rothwell , Rashmica Gupta , "K. Y. Srinivasan" , Dan Williams , linux-s390@vger.kernel.org, Michael Neuling , Stephen Hemminger , Yoshinori Sato , linux-acpi@vger.kernel.org, Ingo Molnar , xen-devel@lists.xenproject.org, Len Brown , Pavel Tatashin , Rob Herring , "mike.travis@hpe.com" , Haiyang Zhang , =?UTF-8?Q?Jonathan_Neusch=c3=a4fer?= , Nicholas Piggin , Martin Schwidefsky , =?UTF-8?B?SsOpcsO0bWUgR2xpc3Nl?= , Mike Rapoport , Borislav Petkov , Andy Lutomirski , Boris Ostrovsky , Andrew Morton , Oscar Salvador , Juergen Gross , Tony Luck , Mathieu Malaterre , Greg Kroah-Hartman , "Rafael J. Wysocki" , linux-kernel@vger.kernel.org, Fenghua Yu , Mauricio Faria de Oliveira , Thomas Gleixner , Philippe Ombredanne , Joe Perches , devel@linuxdriverproject.org, Joonsoo Kim , linuxppc-dev@lists.ozlabs.org, "Kirill A. Shutemov" Errors-To: linuxppc-dev-bounces+linuxppc-dev=archiver.kernel.org@lists.ozlabs.org Sender: "Linuxppc-dev" On 26.11.18 13:30, David Hildenbrand wrote: > On 23.11.18 19:06, Michal Suchánek wrote: >> On Fri, 23 Nov 2018 12:13:58 +0100 >> David Hildenbrand wrote: >> >>> On 28.09.18 17:03, David Hildenbrand wrote: >>>> How to/when to online hotplugged memory is hard to manage for >>>> distributions because different memory types are to be treated differently. >>>> Right now, we need complicated udev rules that e.g. check if we are >>>> running on s390x, on a physical system or on a virtualized system. But >>>> there is also sometimes the demand to really online memory immediately >>>> while adding in the kernel and not to wait for user space to make a >>>> decision. And on virtualized systems there might be different >>>> requirements, depending on "how" the memory was added (and if it will >>>> eventually get unplugged again - DIMM vs. paravirtualized mechanisms). >>>> >>>> On the one hand, we have physical systems where we sometimes >>>> want to be able to unplug memory again - e.g. a DIMM - so we have to online >>>> it to the MOVABLE zone optionally. That decision is usually made in user >>>> space. >>>> >>>> On the other hand, we have memory that should never be onlined >>>> automatically, only when asked for by an administrator. Such memory only >>>> applies to virtualized environments like s390x, where the concept of >>>> "standby" memory exists. Memory is detected and added during boot, so it >>>> can be onlined when requested by the admininistrator or some tooling. >>>> Only when onlining, memory will be allocated in the hypervisor. >>>> >>>> But then, we also have paravirtualized devices (namely xen and hyper-v >>>> balloons), that hotplug memory that will never ever be removed from a >>>> system right now using offline_pages/remove_memory. If at all, this memory >>>> is logically unplugged and handed back to the hypervisor via ballooning. >>>> >>>> For paravirtualized devices it is relevant that memory is onlined as >>>> quickly as possible after adding - and that it is added to the NORMAL >>>> zone. Otherwise, it could happen that too much memory in a row is added >>>> (but not onlined), resulting in out-of-memory conditions due to the >>>> additional memory for "struct pages" and friends. MOVABLE zone as well >>>> as delays might be very problematic and lead to crashes (e.g. zone >>>> imbalance). >>>> >>>> Therefore, introduce memory block types and online memory depending on >>>> it when adding the memory. Expose the memory type to user space, so user >>>> space handlers can start to process only "normal" memory. Other memory >>>> block types can be ignored. One thing less to worry about in user space. >>>> >>> >>> So I was looking into alternatives. >>> >>> 1. Provide only "normal" and "standby" memory types to user space. This >>> way user space can make smarter decisions about how to online memory. >>> Not really sure if this is the right way to go. >>> >>> >>> 2. Use device driver information (as mentioned by Michal S.). >>> >>> The problem right now is that there are no drivers for memory block >>> devices. The "memory" subsystem has no drivers, so the KOBJ_ADD uevent >>> will not contain a "DRIVER" information and we ave no idea what kind of >>> memory block device we hold in our hands. >>> >>> $ udevadm info -q all -a /sys/devices/system/memory/memory0 >>> >>> looking at device '/devices/system/memory/memory0': >>> KERNEL=="memory0" >>> SUBSYSTEM=="memory" >>> DRIVER=="" >>> ATTR{online}=="1" >>> ATTR{phys_device}=="0" >>> ATTR{phys_index}=="00000000" >>> ATTR{removable}=="0" >>> ATTR{state}=="online" >>> ATTR{valid_zones}=="none" >>> >>> >>> If we would provide "fake" drivers for the memory block devices we want >>> to treat in a special way in user space (e.g. standby memory on s390x), >>> user space could use that information to make smarter decisions. >>> >>> Adding such drivers might work. My suggestion would be to let ordinary >>> DIMMs be without a driver for now and only special case standby memory >>> and eventually paravirtualized memory devices (XEN and Hyper-V). >>> >>> Any thoughts? >> >> If we are going to fake the driver information we may as well add the >> type attribute and be done with it. >> >> I think the problem with the patch was more with the semantic than the >> attribute itself. >> >> What is normal, paravirtualized, and standby memory? >> >> I can understand DIMM device, baloon device, or whatever mechanism for >> adding memory you might have. >> >> I can understand "memory designated as standby by the cluster >> administrator". >> >> However, DIMM vs baloon is orthogonal to standby and should not be >> conflated into one property. >> >> paravirtualized means nothing at all in relationship to memory type and >> the desired online policy to me. > > Right, so with whatever we come up, it should allow to make a decision > in user space about > - if memory is to be onlined automatically And I will think about if we really should model standby memory. Maybe it is really better to have in user space something like (as Dan noted) if (isS390x() && type == "dimm") { /* don't online, on s390x system DIMMs are standby memory */ } The we could have in addition if (type == "balloon") { /* * Balloon will not be unplugged by offlining the whole block at * once, online as !movable. */ } But I'll have to think about the wording / types etc. (I neither like "dimm" nor "balloon"). -- Thanks, David / dhildenb