From: David Hildenbrand <david@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	"Rafael J. Wysocki" <rafael@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Leonardo Bras <leonardo@linux.ibm.com>,
	Nathan Lynch <nathanl@linux.ibm.com>,
	Allison Randal <allison@lohutok.net>,
	Nathan Fontenot <nfont@linux.vnet.ibm.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	lantianyu1986@gmail.com,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Subject: Re: [PATCH RFC v1] mm: is_mem_section_removable() overhaul
Date: Wed, 22 Jan 2020 19:15:47 +0100
Message-ID: <626d344e-8243-c161-cd07-ed1276eba73d@redhat.com>
In-Reply-To: <20200122164618.GY29276@dhcp22.suse.cz>

On 22.01.20 17:46, Michal Hocko wrote:
> On Wed 22-01-20 12:58:16, David Hildenbrand wrote:
>> On 22.01.20 11:54, David Hildenbrand wrote:
>>> On 22.01.20 11:42, Michal Hocko wrote:
>>>> On Wed 22-01-20 11:39:08, David Hildenbrand wrote:
>>>>>>>> Really, the interface is flawed and should have never been merged in the
>>>>>>>> first place. We cannot simply remove it altogether I am afraid so let's
>>>>>>>> at least remove the bogus code and pretend that the world is a better
>>>>>>>> place where everything is removable except the reality sucks...
>>>>>>>
>>>>>>> As I expressed already, the interface works as designed/documented and
>>>>>>> has been used like that for years.
>>>>>>
>>>>>> It seems we do differ in the usefulness though. Using a crappy interface
>>>>>> for years doesn't make it less crappy. I do realize we cannot remove the
>>>>>> interface but we can remove issues with the implementation and I dare to
>>>>>> say that most existing users wouldn't really notice.
>>>>>
>>>>> Well, at least powerpc-utils (the reason this interface was introduced)
>>>>> will notice, a) performance-wise and b) because more logging output will
>>>>> be generated (obviously, non-offlineable blocks will be tried for offlining).
>>>>
>>>> I would really appreciate some specific example of a real use case. I am
>>>> not familiar with powerpc-utils workflows myself.
>>>>
>>>
>>> Not an expert myself:
>>>
>>> https://github.com/ibm-power-utilities/powerpc-utils
>>>
>>> -> src/drmgr/drslot_chrp_mem.c
>>>
>>> On request to remove some memory it will
>>>
>>> a) Read "->removable" of all memory blocks ("lmb")
>>> b) Check if the request can be fulfilled using the removable blocks
>>> c) Try to offline the memory blocks one by one. If that
>>> succeeded, trigger removal of it using some hypervisor hooks.
>>>
>>> Interestingly, with "AMS ballooning", it already considers the
>>> "removable" information useless (most probably because of
>>> non-migratable balloon pages that can be offlined - I assume this is the
>>> powerpc code that I converted to proper balloon compaction just
>>> recently). a) and b) are skipped.
>>>
>>> Returning "yes" for all blocks will make them handle it just as if "AMS
>>> ballooning" were active. So any memory block will be tried. Should work,
>>> but will be slower if no ballooning is active.
>>>
>>
>> On lsmem:
>>
>> https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lgdd/lgdd_r_lsmem_cmd.html
>>
>> "
>> Removable
>>     yes if the memory range can be set offline, no if it cannot be set
>> offline. A dash (-) means that the range is already offline. The kernel
>> method that identifies removable memory ranges is heuristic and not
>> exact. Occasionally, memory ranges are falsely reported as removable or
>> falsely reported as not removable.
>> "
>>
>> Usage of lsmem paired with chmem:
>>
>> https://access.redhat.com/solutions/3937181
>>
>>
>> Especially interesting for IBM z Systems, where memory
>> onlining/offlining will trigger the actual population of memory in the
>> hypervisor. So if an admin wants to offline some memory (to give it back
>> to the hypervisor), they would use lsmem to identify such blocks first,
>> instead of trying random blocks until one offlining request succeeds.
> 
> I am sorry for being dense here but I still do not understand why s390

It's good that we talk about it :) It's hard to reconstruct actual use
cases from tools and some documentation only ...

Side note (just FYI): One difference on s390x compared to other
architectures (AFAICS) is that once memory is offline, you might not be
allowed (by the hypervisor) to online it again - because it was
effectively unplugged. Such memory is not removed via remove_memory(),
it's simply kept offline.


> and the way it does the hotremove matters here. After all, there are
> no arch-specific operations done until the memory is offlined. Also,
> randomly checking memory blocks and then hoping that the offline will
> succeed is not that much different from just trying to offline the
> block. Both have to crawl through the pfn range and bail out on
> unmovable memory.

I think in general we have two approaches to memory unplugging.

1. Know explicitly what you want to unplug (e.g., a DIMM spanning
multiple memory blocks).

2. Find random memory blocks you can offline/unplug.


For 1., I think we both agree that we don't need "removable": just try to
offline and you know whether it worked.

Now of course, for 2. you can try random blocks until you have succeeded.
From a sysadmin point of view that's very inefficient, and from a
powerpc-utils point of view it's inefficient as well.
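
Just to make the probing part concrete (a rough userspace sketch against
the sysfs memory block interface - not how drmgr or chmem are actually
structured, and the program itself is made up for illustration):

/*
 * Sketch: offline memory blocks one by one via sysfs until the requested
 * number of bytes has been offlined, without looking at "removable" first.
 * Needs root; error handling kept minimal.
 */
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define MEM_SYSFS "/sys/devices/system/memory"

/* Return 1 if /sys/devices/system/memory/<block>/state reads "online...". */
static int is_online(const char *block)
{
	char path[256], buf[32] = "";
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), MEM_SYSFS "/%s/state", block);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return 0;
	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	return n > 0 && !strncmp(buf, "online", 6);
}

/* Write "offline" to the block's state file; 0 on success. */
static int try_offline(const char *block)
{
	char path[256];
	int fd, ret;

	snprintf(path, sizeof(path), MEM_SYSFS "/%s/state", block);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	ret = (write(fd, "offline", 7) == 7) ? 0 : -1;
	close(fd);
	return ret;
}

int main(int argc, char **argv)
{
	unsigned long long block_size, wanted, done = 0;
	struct dirent *de;
	FILE *f;
	DIR *d;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <bytes-to-offline>\n", argv[0]);
		return 1;
	}
	wanted = strtoull(argv[1], NULL, 0);

	/* Memory block granularity, e.g. 0x8000000 == 128 MiB on x86-64. */
	f = fopen(MEM_SYSFS "/block_size_bytes", "r");
	if (!f || fscanf(f, "%llx", &block_size) != 1)
		return 1;
	fclose(f);

	d = opendir(MEM_SYSFS);
	if (!d)
		return 1;
	while (done < wanted && (de = readdir(d)) != NULL) {
		if (strncmp(de->d_name, "memory", 6) != 0)
			continue;
		if (!is_online(de->d_name))
			continue;
		/* Just try it; offlining fails if the block is unmovable. */
		if (try_offline(de->d_name) == 0)
			done += block_size;
	}
	closedir(d);

	printf("offlined %llu of %llu bytes\n", done, wanted);
	return done >= wanted ? 0 : 1;
}

It simply walks all memory blocks and tries to offline each online one
until enough bytes are gone; a failed write just means that block was not
offlineable right now.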

I learned just now that "chmem"[1] has a mode where you can specify a
"size" and not only a range. So a sysadmin can still control
onlining/offlining for this use case with a few commands. Other tools
(e.g., powerpc-utils) similarly have to try to offline random memory
blocks (just like chmem does).
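
For example (if I'm reading the chmem man page correctly), something like
"chmem -d 1g" should offline 1 GiB worth of memory blocks by probing them
one after another, and "chmem -e 1g" should online that amount again - the
size has to be aligned to the memory block size.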


AFAIK, once we make /sys/.../removable useless, I can see the following
changes:

1. Trying to offline a certain number of memory blocks gets slower/takes
longer/is less efficient. Might be tolerable. The tools seem to keep
working.

2. You can no longer make a rough estimate of how much memory you could
offline - before you actually try to offline it. I can only imagine that
something like this makes sense in a virtual environment (e.g., IBM z)
to balance memory between virtual machines, but I am not aware of a real
user of something like that.


So what I can do is:

a) Come up with a patch that rips that stuff out (well, I already have
that lying around)

b) Describe the existing users + changes we will see

c) CC relevant people I identify (lsmem/chmem/powerpc-utils/etc.) on the
patch to see if we are missing other use cases/users/implications.

Sounds like a plan?


[1]
https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git/tree/sys-utils/chmem.c

-- 
Thanks,

David / dhildenb

