From mboxrd@z Thu Jan 1 00:00:00 1970
From: Michal Hocko
Subject: Re: [PATCH RFCv2 0/4] mm/memory_hotplug: Introduce memory block types
Date: Thu, 20 Dec 2018 14:08:32 +0100
Message-ID: <20181220130832.GH9104@dhcp22.suse.cz>
References: <20181130175922.10425-1-david@redhat.com> <1b4afb6a-5f91-407d-6e6e-6a89b8cf5d56@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
In-Reply-To: <1b4afb6a-5f91-407d-6e6e-6a89b8cf5d56@redhat.com>
Errors-To: driverdev-devel-bounces@linuxdriverproject.org
Sender: "devel"
To: David Hildenbrand
Cc: Oscar Salvador, "Rafael J. Wysocki", Rich Felker, linux-ia64@vger.kernel.org, linux-sh@vger.kernel.org, Peter Zijlstra, Benjamin Herrenschmidt, Balbir Singh, Dave Hansen, Heiko Carstens, Wei Yang, linux-mm@kvack.org, Pavel Tatashin, Arun KS, "H. Peter Anvin", Stephen Rothwell, Rashmica Gupta, Boris Ostrovsky, Paul Mackerras, Pavel Tatashin, linux-s390@vger.kernel.org, Michael Neuling, Stefano Stabellini, Dave Jiang
List-Id: linux-acpi@vger.kernel.org

On Thu 20-12-18 13:58:16, David Hildenbrand wrote:
> On 30.11.18 18:59, David Hildenbrand wrote:
> > This is the second approach, introducing more meaningful memory block
> > types and not changing online behavior in the kernel. It is based on
> > latest linux-next.
> >
> > As we found out during discussion, user space should always handle onlining
> > of memory, in any case. However, in order to make smart decisions in user
> > space about if and how to online memory, we have to export more information
> > about memory blocks. This way, we can formulate rules in user space.
> >
> > One such information is the type of memory block we are talking about.
> > This helps to answer some questions like:
> > - Does this memory block belong to a DIMM?
> > - Can this DIMM theoretically ever be unplugged again?
> > - Was this memory added by a balloon driver that will rely on balloon
> >   inflation to remove chunks of that memory again? Which zone is advised?
> > - Is this special standby memory on s390x that is usually not automatically
> >   onlined?
> >
> > And in short, it helps to answer, to some extent (excluding zone imbalances):
> > - Should I online this memory block?
> > - To which zone should I online this memory block?
> > ... of course special use cases will result in different answers. But that's
> > why user space has control of onlining memory.
> >
> > More details can be found in Patch 1 and Patch 3.
> > Tested on x86 with hotplugged DIMMs. Cross-compiled for PPC and s390x.
> >
> >
> > Example:
> > $ udevadm info -q all -a /sys/devices/system/memory/memory0
> >   KERNEL=="memory0"
> >   SUBSYSTEM=="memory"
> >   DRIVER==""
> >   ATTR{online}=="1"
> >   ATTR{phys_device}=="0"
> >   ATTR{phys_index}=="00000000"
> >   ATTR{removable}=="0"
> >   ATTR{state}=="online"
> >   ATTR{type}=="boot"
> >   ATTR{valid_zones}=="none"
> > $ udevadm info -q all -a /sys/devices/system/memory/memory90
> >   KERNEL=="memory90"
> >   SUBSYSTEM=="memory"
> >   DRIVER==""
> >   ATTR{online}=="1"
> >   ATTR{phys_device}=="0"
> >   ATTR{phys_index}=="0000005a"
> >   ATTR{removable}=="1"
> >   ATTR{state}=="online"
> >   ATTR{type}=="dimm"
> >   ATTR{valid_zones}=="Normal"
> >
> >
> > RFC -> RFCv2:
> > - Now also taking care of PPC (somehow missed it :/ )
> > - Split the series up to some degree (some ideas on how to split up patch 3
> >   would be very welcome)
> > - Introduce more memory block types. Turns out abstracting too much was
> >   rather confusing and not helpful. Properly document them.
> >
> > Notes:
> > - I wanted to convert the enum of types into a named enum but this
> >   provoked all kinds of different errors. For now, I am doing it just like
> >   the other types (e.g. online_type) we are using in that context.
> > - The "removable" property should never have been named like that.
> >   It should have been "offlinable". Can we still rename that? E.g. boot memory
> >   is sometimes marked as removable ...
> >
>
> Any feedback regarding the suggested block types would be very much
> appreciated!

I still do not like this much, to be honest. I just haven't gotten around to
thinking it through properly. My fear is that this conflates an actual API
with the current implementation and, as such, will cause problems in the
future. But I haven't really looked into your patches closely, so I might be
wrong. Anyway, I won't be able to look into it before the end of the year.
-- 
Michal Hocko
SUSE Labs
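The cover letter's central idea is that user space, not the kernel, decides whether and how to online each memory block, based on attributes exported in sysfs. The Python sketch below illustrates what such a rule might look like. It assumes the `type` attribute proposed in this RFC, which may not survive in this form given the concerns raised above. `boot` and `dimm` are taken from the example output, while `standby` and the helper names `parse_udevadm_attrs` and `choose_online_action` are hypothetical. The strings it returns (`online`, `online_movable`) are values the existing `state` attribute accepts. The sample input is an illustrative offline variant of memory90, not real output.

```python
import re
from typing import Dict, Optional

# Matches attribute lines from `udevadm info -a`, e.g.:  ATTR{type}=="dimm"
_ATTR_RE = re.compile(r'^\s*ATTR\{(\w+)\}=="(.*)"\s*$')


def parse_udevadm_attrs(text: str) -> Dict[str, str]:
    """Collect ATTR{name}=="value" pairs; KERNEL/SUBSYSTEM/DRIVER lines are skipped."""
    attrs: Dict[str, str] = {}
    for line in text.splitlines():
        match = _ATTR_RE.match(line)
        if match:
            attrs[match.group(1)] = match.group(2)
    return attrs


def choose_online_action(attrs: Dict[str, str]) -> Optional[str]:
    """Return the string a rule would write to the block's `state` file,
    or None to leave the block alone.

    Hypothetical policy: "boot" and "dimm" are the type names shown in the
    cover letter; "standby" is an assumed name for s390x standby memory.
    """
    if attrs.get("state") != "offline":
        return None  # only offline blocks are candidates for onlining
    block_type = attrs.get("type", "")
    zones = attrs.get("valid_zones", "").split()
    if block_type == "standby":
        return None  # s390x standby memory: not onlined automatically
    if block_type == "dimm" and "Movable" in zones:
        return "online_movable"  # keep a chance of unplugging the DIMM later
    return "online"  # let the kernel pick its default zone


if __name__ == "__main__":
    sample = """
  KERNEL=="memory90"
  SUBSYSTEM=="memory"
  DRIVER==""
  ATTR{online}=="0"
  ATTR{state}=="offline"
  ATTR{type}=="dimm"
  ATTR{valid_zones}=="Normal Movable"
"""
    attrs = parse_udevadm_attrs(sample)
    print(choose_online_action(attrs))  # online_movable
```

In a real setup the returned string would be written to `/sys/devices/system/memory/memoryN/state`; the sketch deliberately stops short of touching sysfs so the policy can be exercised anywhere.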