* Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
@ 2020-09-10 10:20 David Hildenbrand
  2020-09-10 20:00 ` Dave Hansen
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-09-10 10:20 UTC (permalink / raw)
  To: Gerald Schaefer, Michal Hocko, akpm, Greg KH, Jan Höppner,
	Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

Hi everybody,

I was just exploring how /sys/devices/system/memory/memoryX/phys_device
is/was used. It's one of these interfaces that most probably never
should have been added but now we are stuck with it.

"phys_device" was used on s390x in older versions of lsmem[2]/chmem[3],
back when they were still part of s390-tools. They were later replaced
[5] by the variants in util-linux. For example, RHEL6 and RHEL7 contain
lsmem/chmem from s390-utils. RHEL8 switched to versions from util-linux
on s390x [4].

"phys_device" was added with sysfs support for memory hotplug in commit
3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions")
in 2005. It always returned 0.

s390x started returning something != 0 on some setups (if sclp.rzm is
set by HW) in 2010 via commit 57b552ba0b2f ("memory hotplug/s390: set
phys_device").

For s390x, it allowed for identifying which memory block devices belong
to the same memory increment (RZM). Only if all memory block devices
comprising a single memory increment were offline, the memory could
actually be removed in the hypervisor.

Since commit e5d709bb5fb7 ("s390/memory hotplug: provide
memory_block_size_bytes() function") in 2013 a memory block device
spans at least one memory increment - which is why the interface isn't
really helpful/used anymore (except by old lsmem/chmem tools).
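
To illustrate the grouping described above, here is a minimal userspace
sketch. It is not the code of the old lsmem/chmem tools; the function
names are made up for this example, and it assumes that "phys_device"
reads as a decimal integer and that "state" reads as "online" or
"offline".

```python
import os
from collections import defaultdict


def read_blocks(sysfs_root="/sys/devices/system/memory"):
    """Return (name, phys_device, state) for every memoryX directory."""
    blocks = []
    for name in sorted(os.listdir(sysfs_root)):
        # Skip block_size_bytes and other entries that are not memory blocks.
        if not (name.startswith("memory") and name[len("memory"):].isdigit()):
            continue
        block_dir = os.path.join(sysfs_root, name)
        with open(os.path.join(block_dir, "phys_device")) as f:
            phys_device = int(f.read().strip())  # assumed decimal
        with open(os.path.join(block_dir, "state")) as f:
            state = f.read().strip()
        blocks.append((name, phys_device, state))
    return blocks


def fully_offline_increments(blocks):
    """Return the phys_device values whose memory blocks are all offline."""
    states = defaultdict(list)
    for _name, phys_device, state in blocks:
        states[phys_device].append(state)
    return sorted(pd for pd, st in states.items()
                  if all(s == "offline" for s in st))
```

Only the increments returned by fully_offline_increments() would be
candidates for removal in the hypervisor; an increment with a single
online block is not.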

There were once RFC patches to make use of it in ACPI, but it could be
solved using different interfaces [1].


While I'd love to rip it out completely, I think it would break old
lsmem/chmem completely - and I assume that's not acceptable. I was
wondering what would be considered safe to do now/in the future:

1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on
s390x). This will make old lsmem/chmem behave differently after
switching to a new kernel, like if sclp.rzm would not be set by HW -
AFAIU, it will assume all memory is in a single memory increment. Do we
care?
2. Restrict it to s390x only. It always returned 0 on other
architectures, I was not able to find any user.

I think 2 should be safe to do (never used on other archs). I do wonder
what the feelings are about 1.

Thoughts?


[1] https://patchwork.kernel.org/patch/2163871/
[2] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/lsmem
[3] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/chmem
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1504134
[5]
https://github.com/ibm-s390-tools/s390-tools/commit/778292e771fb00cfcbd7ff6535ee3d9fde612dc5#diff-82c32a7f4c597c50db90157ed0c581b3

-- 
Thanks,

David / dhildenb



* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-10 10:20 Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? David Hildenbrand
@ 2020-09-10 20:00 ` Dave Hansen
  2020-09-10 20:31   ` David Hildenbrand
  2020-09-10 20:57 ` Dave Hansen
  2020-09-22 13:56 ` Gerald Schaefer
  2 siblings, 1 reply; 21+ messages in thread
From: Dave Hansen @ 2020-09-10 20:00 UTC (permalink / raw)
  To: David Hildenbrand, Gerald Schaefer, Michal Hocko, akpm, Greg KH,
	Jan Höppner, Heiko Carstens, linux-mm, linux-api,
	Dave Hansen, linux-kernel

On 9/10/20 3:20 AM, David Hildenbrand wrote:
> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
> is/was used. It's one of these interfaces that most probably never
> should have been added but now we are stuck with it.

While I'm all for cleanups, what specific problems is phys_device causing?

Are you hoping that we can just remove users of memoryX/* until there
are no more left, and this is the easiest place to start?


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-10 20:00 ` Dave Hansen
@ 2020-09-10 20:31   ` David Hildenbrand
  2020-09-11  7:20     ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2020-09-10 20:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: David Hildenbrand, Gerald Schaefer, Michal Hocko, akpm, Greg KH,
	Jan Höppner, Heiko Carstens, linux-mm, linux-api,
	Dave Hansen, linux-kernel



> On 10.09.2020 at 22:01, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 9/10/20 3:20 AM, David Hildenbrand wrote:
>> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
>> is/was used. It's one of these interfaces that most probably never
>> should have been added but now we are stuck with it.
> 
> While I'm all for cleanups, what specific problems is phys_device causing?
> 

Mostly stumbling over it, understanding that it is basically unused with new userspace for good reason, questioning its existence.

E.g., I am working on virtio-mem support for s390x. Displaying misleading/wrong phys_device indications isn't particularly helpful - especially once there are different ways to hotplug memory for an architecture.

> Are you hoping that we can just remove users of memoryX/* until there
> are no more left, and this is the easiest place to start?

At least reducing it to a minimum with clear semantics. Even with automatic onlining there are still reasons why we need to keep the interface for now (e.g., reloading kexec to update the kdump headers on memory hot(un)plug). But also standby memory handling on s390x requires it (->manual onlining).


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-10 10:20 Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? David Hildenbrand
  2020-09-10 20:00 ` Dave Hansen
@ 2020-09-10 20:57 ` Dave Hansen
  2020-09-22 13:56 ` Gerald Schaefer
  2 siblings, 0 replies; 21+ messages in thread
From: Dave Hansen @ 2020-09-10 20:57 UTC (permalink / raw)
  To: David Hildenbrand, Gerald Schaefer, Michal Hocko, akpm, Greg KH,
	Jan Höppner, Heiko Carstens, linux-mm, linux-api,
	Dave Hansen, linux-kernel

On 9/10/20 3:20 AM, David Hildenbrand wrote:
> While I'd love to rip it out completely, I think it would break old
> lsmem/chmem completely - and I assume that's not acceptable. I was
> wondering what would be considered safe to do now/in the future:
> 
> 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on
> s390x). This will make old lsmem/chmem behave differently after
> switching to a new kernel, like if sclp.rzm would not be set by HW -
> AFAIU, it will assume all memory is in a single memory increment. Do we
> care?
> 2. Restrict it to s390x only. It always returned 0 on other
> architectures, I was not able to find any user.

By "restrict it", do you mean just remove the sysfs file on everything
other than s390x?  That seems like a good idea, especially if we don't
have any users.  That, plus boot option or something to reenable it
would be nice if someone trips over it disappearing.

If there is a user, we stand a chance of finding them because they'll
hopefully get a good error message.  Worst case, an strace will show an
-ENOENT and should be pretty easy to track down.
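
As a sketch of how a userspace tool could cope with the file
disappearing: the helper below is hypothetical (not taken from any
existing tool) and assumes "phys_device" reads as a decimal integer. It
treats a missing attribute as "no information" instead of failing.

```python
import os


def read_phys_device(block_dir):
    """Return phys_device as an int, or None if the attribute is absent."""
    try:
        with open(os.path.join(block_dir, "phys_device")) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        # This is the ENOENT case: the kernel no longer exposes the file.
        return None
```

A tool that does not handle this case would instead surface the ENOENT
to the user, which is what makes remaining users easy to find.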


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-10 20:31   ` David Hildenbrand
@ 2020-09-11  7:20     ` Michal Hocko
  2020-09-11  8:09       ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2020-09-11  7:20 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner,
	Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On Thu 10-09-20 22:31:09, David Hildenbrand wrote:
> 
> 
> > On 10.09.2020 at 22:01, Dave Hansen <dave.hansen@intel.com> wrote:
> > 
> > On 9/10/20 3:20 AM, David Hildenbrand wrote:
> >> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
> >> is/was used. It's one of these interfaces that most probably never
> >> should have been added but now we are stuck with it.
> > 
> > While I'm all for cleanups, what specific problems is phys_device causing?
> > 
> 
> Mostly stumbling over it, understanding that it is basically unused
> with new userspace for good reason, questioning its existence.
> 
> E.g., I am working on virtio-mem support for s390x. Displaying
> misleading/wrong phys_device indications isn't particularly helpful
> - especially once there are different ways to hotplug memory for an
> architecture.
> 
> > Are you hoping that we can just remove users of memoryX/* until there
> > are no more left, and this is the easiest place to start?
> 
> At least reducing it to a minimum with clear semantics. Even with
> automatic onlining there are still reasons why we need to keep the
> interface for now (e.g., reloading kexec to update the kdump headers
> on memory hot(un)plug). But also standby memory handling on s390x
> requires it (->manual onlining).

While I agree that the existing interface is far from ideal, I am not
sure it makes much sense to invest energy into cleaning it up. We can
have a pig with a lipstick but this will not solve the underlying
problem that we have I believe. The interface doesn't scale with the
block count (especially on some platforms like ppc), it is too
inflexible (single size of the block) and many others. I believe we need
a completely new interface which would effectively deprecate the
existing one. One could still choose to use the old interface but new
usecases would use the new one ideally.

I have brought that up earlier already without much follow up
(http://lkml.kernel.org/r/20200619120704.GD12177@dhcp22.suse.cz)

-- 
Michal Hocko
SUSE Labs


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-11  7:20     ` Michal Hocko
@ 2020-09-11  8:09       ` David Hildenbrand
  2020-09-11  9:12         ` Michal Hocko
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2020-09-11  8:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner,
	Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On 11.09.20 09:20, Michal Hocko wrote:
> On Thu 10-09-20 22:31:09, David Hildenbrand wrote:
>>
>>
>>> On 10.09.2020 at 22:01, Dave Hansen <dave.hansen@intel.com> wrote:
>>>
>>> On 9/10/20 3:20 AM, David Hildenbrand wrote:
>>>> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
>>>> is/was used. It's one of these interfaces that most probably never
>>>> should have been added but now we are stuck with it.
>>>
>>> While I'm all for cleanups, what specific problems is phys_device causing?
>>>
>>
>> Mostly stumbling over it, understanding that it is basically unused
>> with new userspace for good reason, questioning its existence.
>>
>> E.g., I am working on virtio-mem support for s390x. Displaying
>> misleading/wrong phys_device indications isn't particularly helpful
>> - especially once there are different ways to hotplug memory for an
>> architecture.
>>
>>> Are you hoping that we can just remove users of memoryX/* until there
>>> are no more left, and this is the easiest place to start?
>>
>> At least reducing it to a minimum with clear semantics. Even with
>> automatic onlining there are still reasons why we need to keep the
>> interface for now (e.g., reloading kexec to update the kdump headers
>> on memory hot(un)plug). But also standby memory handling on s390x
>> requires it (->manual onlining).
> 
> While I agree that the existing interface is far from ideal, I am not
> sure it makes much sense to invest energy into cleaning it up. We can
> have a pig with a lipstick but this will not solve the underlying
> problem that we have I believe. The interface doesn't scale with the
> block count (especially on some platforms like ppc), it is too
> inflexible (single size of the block) and many others. I believe we need
> a completely new interface which would effectively deprecate the
> existing one. One could still choose to use the old interface but new
> usecases would use the new one ideally.

Even with a new interface (that does allow for variable-sized block
sizes), we will still end up with many memory block devices. It's not
the one thing that solves all our problems.

Consider two cases:

1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
online/offline the whole thing. HW can effectively only plug/unplug the
whole thing. It makes sense in some (most?) setups to represent one DIMM
as one memory block device.

2. Hot(un)plugging small memory increments. This is mostly the case in
virtualized environments - especially hyper-v balloon, xen balloon,
virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
you want at least all (16MB!) memory block devices that can get
unplugged again individually ("LMBs") as separate memory blocks. Same on
s390x on memory increment size (currently effectively the memory block
size).

In summary, larger memory block devices mostly only make sense with
DIMMs (and for boot memory in some cases). We will still end up with
many memory block devices in other configurations.

I do agree that a "disable sysfs" option is interesting - even with
memory hotplug (we mostly need a way to configure it and a way to notify
kexec-tools about memory hot(un)plug events). I am currently (once
again) looking into improving auto-onlining support in the kernel.

Having that said, I much rather want to see smaller improvements (that
can be fine-tuned individually - like allowing variable-sized memory
blocks) than doing a switch to "new shiny" and figuring out after a
while that we need "new shiny2".

I consider removing "phys_device" as one of these tunables. The question
would be how to make such sysfs changes easy to configure
("-phys_device", "+variable_sized_blocks" ...)

-- 
Thanks,

David / dhildenb



* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-11  8:09       ` David Hildenbrand
@ 2020-09-11  9:12         ` Michal Hocko
  2020-09-11 10:09           ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2020-09-11  9:12 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner,
	Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
[...]
> Consider two cases:
> 
> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
> online/offline the whole thing. HW can effectively only plug/unplug the
> whole thing. It makes sense in some (most?) setups to represent one DIMM
> as one memory block device.

Yes, for the physical hotplug it doesn't really make much sense to me to
offline portions that the HW cannot hotremove.

> 2. Hot(un)plugging small memory increments. This is mostly the case in
> virtualized environments - especially hyper-v balloon, xen balloon,
> virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
> you want at least all (16MB!) memory block devices that can get
> unplugged again individually ("LMBs") as separate memory blocks. Same on
> s390x on memory increment size (currently effectively the memory block
> size).

Yes I do recognize those usecases even though I will not pretend I
don't consider them questionable. E.g. any hotplug with a smaller granularity
than the memory model in Linux allows is just dubious. We simply cannot
implement that without a lot of wasting and then the question is what is
the real point.

> In summary, larger memory block devices mostly only make sense with
> DIMMs (and for boot memory in some cases). We will still end up with
> many memory block devices in other configurations.

And that is fine because the boot time memory is still likely the
primary source of memory. And reducing memory devices for those is a
huge improvement already (just think of a multi TB system with
gazillions of pointless memory devices).

> I do agree that a "disable sysfs" option is interesting - even with
> memory hotplug (we mostly need a way to configure it and a way to notify
> kexec-tools about memory hot(un)plug events). I am currently (once
> again) looking into improving auto-onlining support in the kernel.
> 
> Having that said, I much rather want to see smaller improvements (that
> can be fine-tuned individually - like allowing variable-sized memory
> blocks) than doing a switch to "new shiny" and figuring out after a
> while that we need "new shiny2".

There is only one certainty. Providing a long term interface with ever
growing (ab)users is a hard target. And shinyN might be needed in the
end. Who knows. My main point is that the existing interface is hitting
a wall on usecases which _do_not_care_ about memory hotplug. And that is
something we should be looking at.

> I consider removing "phys_device" as one of these tunables. The question
> would be how to make such sysfs changes easy to configure
> ("-phys_device", "+variable_sized_blocks" ...)

I am with you on that. There are more candidates in memory block
directories which have dubious value. Deprecation process is a PITA and
that's why I thought that it would make sense to focus on something that
we can mis^Wdesign with existing and forming usecases in mind that would
get rid of all the cruft that we know doesn't work (removable would
be another one).

I am definitely not going to insist and I appreciate you are trying to
clean this up. That is highly appreciated of course.
-- 
Michal Hocko
SUSE Labs


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-11  9:12         ` Michal Hocko
@ 2020-09-11 10:09           ` David Hildenbrand
  2020-09-11 19:24             ` Dave Hansen
  2020-09-14 11:24             ` Michal Hocko
  0 siblings, 2 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-09-11 10:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner,
	Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On 11.09.20 11:12, Michal Hocko wrote:
> On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
> [...]
>> Consider two cases:
>>
>> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
>> online/offline the whole thing. HW can effectively only plug/unplug the
>> whole thing. It makes sense in some (most?) setups to represent one DIMM
>> as one memory block device.
> 
> Yes, for the physical hotplug it doesn't really make much sense to me to
> offline portions that the HW cannot hotremove.

I've seen people offline parts of memory to simulate systems with less
RAM and people offline parts of memory on demand to save energy
(poweroff banks). People won't stop being creative with what we provided
to them :D

> 
>> 2. Hot(un)plugging small memory increments. This is mostly the case in
>> virtualized environments - especially hyper-v balloon, xen balloon,
>> virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
>> you want at least all (16MB!) memory block devices that can get
>> unplugged again individually ("LMBs") as separate memory blocks. Same on
>> s390x on memory increment size (currently effectively the memory block
>> size).
> 
> Yes I do recognize those usecases even though I will not pretend I
> don't consider them questionable. E.g. any hotplug with a smaller granularity
> than the memory model in Linux allows is just dubious. We simply cannot
> implement that without a lot of wasting and then the question is what is
> the real point.

Having the section size as small as possible in these environments is
most certainly preferable, to clean up metadata where possible.
Otherwise, hot(un)plugging smaller granularity behaves more like memory
ballooning (and I think I don't have to tell you that ballooning is used
excessively even though it wastes memory on metadata ;) ). Anyhow,
that's another discussion.

> 
>> In summary, larger memory block devices mostly only make sense with
>> DIMMs (and for boot memory in some cases). We will still end up with
>> many memory block devices in other configurations.
> 
> And that is fine because the boot time memory is still likely the
> primary source of memory. And reducing memory devices for those is a
> huge improvement already (just think of a multi TB system with
> gazillions of pointless memory devices).

Agreed. On my workstation (64GB - 4x16GB DIMMs if I recall correctly) I
end up with

$ cat /sys/devices/system/memory/block_size_bytes
8000000
$ ls /sys/devices/system/memory/ | grep memory | wc -l
512
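
As a quick sanity check of the two values above: block_size_bytes is
printed in hexadecimal without a "0x" prefix, so 8000000 means 128 MiB
per memory block, and 512 such blocks cover 64 GiB. The helper name
below is made up for this sketch.

```python
def parse_block_size(text):
    """Parse block_size_bytes, which sysfs prints as hex without 0x."""
    return int(text.strip(), 16)


block_size = parse_block_size("8000000\n")
num_blocks = 512
total = block_size * num_blocks

print(block_size // 2**20, "MiB per memory block")
print(total // 2**30, "GiB covered by all memory blocks")
```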

$ cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
  000a0000-000bffff : PCI Bus 0000:00
  000c0000-000dffff : PCI Bus 0000:00
    000c0000-000cf1ff : Video ROM
  000f0000-000fffff : System ROM
00100000-09dfffff : System RAM
09e00000-09ffffff : Reserved
0a000000-0a1fffff : System RAM
0a200000-0a20ffff : ACPI Non-volatile Storage
0a210000-b70fe017 : System RAM
b70fe018-b7117c57 : System RAM
b7117c58-b7118017 : System RAM
b7118018-b7129057 : System RAM
b7129058-b826cfff : System RAM
b826d000-b82c3fff : Reserved
b82c4000-b8d52fff : System RAM
b8d53000-b8d53fff : Reserved
b8d54000-bc67cfff : System RAM
bc67d000-bca26fff : Reserved
bca27000-bca73fff : ACPI Tables
bca74000-bd103fff : ACPI Non-volatile Storage
bd104000-bddfefff : Reserved
bddff000-beffffff : System RAM
bf000000-bfffffff : Reserved
[ PCI stuff ]
100000000-103f2fffff : System RAM
  d9f000000-d9fe00d90 : Kernel code
  da0000000-da07f9fff : Kernel rodata
  da0800000-da0a59e3f : Kernel data
  da110c000-da15fffff : Kernel bss
103f300000-10503fffff : Reserved


If we'd want to create a separate device during boot for each "System
RAM" resource, I am having a hard time figuring out the actual devices
(4 DIMMs). For memory hotplug it's a lot easier (e.g., separate
add_memory() calls). Of course, my workstation most probably doesn't
support DIMM hot(un)plug, so the BIOS might do strange things.
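
To make the mismatch concrete, the following sketch extracts the
top-level "System RAM" resources from /proc/iomem-style text. The
function name is hypothetical; the parsing assumes the "start-end : name"
layout shown above, with nested resources indented.

```python
def top_level_system_ram(iomem_text):
    """Return (start, end) for each top-level "System RAM" resource."""
    ranges = []
    for line in iomem_text.splitlines():
        # Indented lines are child resources; lines without " : " are
        # not resource entries at all (e.g. an elision marker).
        if line[:1].isspace() or " : " not in line:
            continue
        span, name = line.split(" : ", 1)
        if name.strip() != "System RAM":
            continue
        start, end = (int(x, 16) for x in span.split("-"))
        ranges.append((start, end))
    return ranges


sample = """\
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
  000a0000-000bffff : PCI Bus 0000:00
100000000-103f2fffff : System RAM
  d9f000000-d9fe00d90 : Kernel code
"""
for start, end in top_level_system_ram(sample):
    print(f"{start:#x}-{end:#x}: {(end - start + 1) // 2**20} MiB")
```

Run against the full listing, this yields many more ranges than the four
DIMMs, and nothing in the output says which range belongs to which DIMM.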

Also, I do wonder how hard the BIOS might mess up a DIMM configuration
(e820 map, resulting in "System RAM" resources) after hotplug, when
rebooting - or after kexec.

On bare metal, people expect that DIMMs that were hotplugged can be
hotunplugged again after reboot (of course, taking care of ZONE_MOVABLE,
which is a pain). As discussed under QEMU that's easier, because we get
separate add_memory() calls for all DIMMs from ACPI code. How stuff
behaves on bare metal is still a head-scratcher - if we can rely on
separate "System RAM" instances to cover separate DIMMs, or if DIMMs
might get merged/split/EFI allocations ...

Maybe we can derive the actual DIMMs from some ACPI tables (SRAT?),
instead of relying on e820/"System RAM resources" - I have no clue.

>> I do agree that a "disable sysfs" option is interesting - even with
>> memory hotplug (we mostly need a way to configure it and a way to notify
>> kexec-tools about memory hot(un)plug events). I am currently (once
>> again) looking into improving auto-onlining support in the kernel.
>>
>> Having that said, I much rather want to see smaller improvements (that
>> can be fine-tuned individually - like allowing variable-sized memory
>> blocks) than doing a switch to "new shiny" and figuring out after a
>> while that we need "new shiny2".
> 
> There is only one certainty. Providing a long term interface with ever
> growing (ab)users is a hard target. And shinyN might be needed in the
> end. Who knows. My main point is that the existing interface is hitting
> a wall on usecases which _do_not_care_ about memory hotplug. And that is
> something we should be looking at.

Agreed. I can see 3 scenarios

a) no memory hotplug support, no sysfs.
b) memory hotplug support, no sysfs
c) memory hotplug support, sysfs

Starting with a) and c) is the easiest way to go.

> 
>> I consider removing "phys_device" as one of these tunables. The question
>> would be how to make such sysfs changes easy to configure
>> ("-phys_device", "+variable_sized_blocks" ...)
> 
> I am with you on that. There are more candidates in memory block
> directories which have dubious value. Deprecation process is a PITA and
> that's why I thought that it would make sense to focus on something that
> we can mis^Wdesign with existing and forming usecases in mind that would
> get rid of all the cruft that we know doesn't work (removable would
> be another one).

Yeah, "phys_index" is also dubious. Simply providing a memory range
would have been much cleaner. Lesson learned :)

-- 
Thanks,

David / dhildenb



* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-11 10:09           ` David Hildenbrand
@ 2020-09-11 19:24             ` Dave Hansen
  2020-09-11 19:35               ` Luck, Tony
  2020-09-14 11:24             ` Michal Hocko
  1 sibling, 1 reply; 21+ messages in thread
From: Dave Hansen @ 2020-09-11 19:24 UTC (permalink / raw)
  To: David Hildenbrand, Michal Hocko
  Cc: Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens,
	linux-mm, linux-api, Dave Hansen, linux-kernel, Luck, Tony

On 9/11/20 3:09 AM, David Hildenbrand wrote:
> Maybe we can derive the actual DIMMs from some ACPI tables (SRAT?),
> instead of relying on e820/"System RAM resources" - I have no clue.

It's actually really hard to map a DIMM to a physical address.
Interleaving can mean that one page actually spans a bunch of DIMMs.
For NVDIMMs, the interleaving is configurable and different namespaces
on the system can have different interleaving properties.

The EDAC drivers do the physical address to DIMM lookups, but they're
quite messy.  There isn't a simple table for it IIRC.  *But* this turns
out not to be a problem for memory hotplug because if you're
interleaving, you can't just remove one DIMM in an interleave set anyway.

Right now, I think we just depend on ACPI to _request_ hot remove in a
size which will allow the hardware to be removed.

Anyway, I just wanted to point out the M:N relationship between pages
and DIMMs.
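
A deliberately simplified toy model of that M:N relationship: assume
consecutive 64-byte cache lines rotate round-robin over the DIMMs. Real
memory controllers use channels, ranks and hash functions, so this is
only an illustration, and all names and constants here are assumptions.

```python
CACHE_LINE = 64    # bytes, assumed
PAGE_SIZE = 4096   # bytes, assumed


def dimms_touched_by_page(page_addr, num_dimms):
    """Toy model: cache line N of physical memory lives on DIMM N % num_dimms."""
    return {((page_addr + off) // CACHE_LINE) % num_dimms
            for off in range(0, PAGE_SIZE, CACHE_LINE)}


# With 4 interleaved DIMMs, a single 4 KiB page touches all of them.
print(sorted(dimms_touched_by_page(0x1000, 4)))
```

In this model no single DIMM can be pulled without losing part of every
page, which is why an interleave set can only be removed as a whole.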

Maybe we should start with an airing of grievances against the old
interfaces and then start coming up with the requirements for a new one.
I'll start a list in a Google Doc unless someone has a better idea.


* RE: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-11 19:24             ` Dave Hansen
@ 2020-09-11 19:35               ` Luck, Tony
  2020-09-11 19:56                 ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: Luck, Tony @ 2020-09-11 19:35 UTC (permalink / raw)
  To: Hansen, Dave, David Hildenbrand, Michal Hocko
  Cc: Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens,
	linux-mm, linux-api, Dave Hansen, linux-kernel

> It's actually really hard to map a DIMM to a physical address.
> Interleaving can mean that one page actually spans a bunch of DIMMs.

Heh! If NUMA mode is turned off your single page may have cache lines
from *every* DIMM in the system. Even with NUMA turned on the page
will have cache lines from every DIMM on the socket.

-Tony


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-11 19:35               ` Luck, Tony
@ 2020-09-11 19:56                 ` David Hildenbrand
  2020-09-11 20:09                   ` Luck, Tony
  0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2020-09-11 19:56 UTC (permalink / raw)
  To: Luck, Tony
  Cc: Hansen, Dave, David Hildenbrand, Michal Hocko, Gerald Schaefer,
	akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm,
	linux-api, Dave Hansen, linux-kernel



> On 11.09.2020 at 21:36, Luck, Tony <tony.luck@intel.com> wrote:
> 
> 
>> 
>> It's actually really hard to map a DIMM to a physical address.
>> Interleaving can mean that one page actually spans a bunch of DIMMs.
> 
> Heh! If NUMA mode is turned off your single page may have cache lines
> from *every* DIMM in the system. Even with NUMA turned on the page
> will have cache lines from every DIMM on the socket.
> 

Thanks Dave and Tony, that's valuable information!

How would it behave after hotplugging a single DIMM - I assume a single page will only be mapped to that DIMM (otherwise a lot of stuff would have to be moved around). Would the mapping change after a reboot - especially, can a DIMM that could get hotunplugged before suddenly no longer be hotunplugged individually?

> -Tony



* RE: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-11 19:56                 ` David Hildenbrand
@ 2020-09-11 20:09                   ` Luck, Tony
  2020-09-11 20:49                     ` David Hildenbrand
  0 siblings, 1 reply; 21+ messages in thread
From: Luck, Tony @ 2020-09-11 20:09 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Hansen, Dave, Michal Hocko, Gerald Schaefer, akpm, Greg KH,
	Jan Höppner, Heiko Carstens, linux-mm, linux-api,
	Dave Hansen, linux-kernel

> How would it behave after hotplugging a single DIMM - I assume a single page will only be mapped to that DIMM (otherwise a lot of stuff would have to be moved around). Would the mapping change after a reboot - especially, can a DIMM that could get hotunplugged before suddenly no longer be hotunplugged individually?


We don't currently have any platforms that would allow hot adding at the DIMM level.
The Brickland generation of E7 Xeon servers (Ivybridge, Haswell, Broadwell) allowed
for hot plugging a riser card that contained up to 12 DIMMs.

If you did add memory it would have to appear at the top of the system physical
address space. No interleave (unless you added more than one DIMM in a single
operation).  After a reboot the system would likely shuffle things around and
interleave.

-Tony


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-11 20:09                   ` Luck, Tony
@ 2020-09-11 20:49                     ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-09-11 20:49 UTC (permalink / raw)
  To: Luck, Tony
  Cc: David Hildenbrand, Hansen, Dave, Michal Hocko, Gerald Schaefer,
	akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm,
	linux-api, Dave Hansen, linux-kernel



> On 11.09.2020 at 22:09, Luck, Tony <tony.luck@intel.com> wrote:
> 
> 
>> 
>> How would it behave after hotplugging a single DIMM - I assume a single page will only be mapped to that DIMM (otherwise a lot of stuff would have to be moved around). Would the mapping change after a reboot - especially, can a DIMM that could get hotunplugged before suddenly no longer be hotunplugged individually?
> 
> 
> We don't currently have any platforms that would allow hot adding at the DIMM level.
> The Brickland generation of E7 Xeon servers (Ivybridge, Haswell, Broadwell) allowed
> for hot plugging a riser card that contained up to 12 DIMMs.
> 
> If you did add memory it would have to appear at the top of the system physical
> address space. No interleave (unless you added more than one DIMM in a single
> operation).  After a reboot the system would likely shuffle things around and
> interleave.
> 

Thanks a lot - so I'm really spoiled by hot(un)plug capabilities in virtualized environments :D

> -Tony



* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-11 10:09           ` David Hildenbrand
  2020-09-11 19:24             ` Dave Hansen
@ 2020-09-14 11:24             ` Michal Hocko
  2020-09-14 12:14               ` David Hildenbrand
  1 sibling, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2020-09-14 11:24 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner,
	Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On Fri 11-09-20 12:09:52, David Hildenbrand wrote:
> On 11.09.20 11:12, Michal Hocko wrote:
> > On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
> > [...]
> >> Consider two cases:
> >>
> >> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
> >> online/offline the whole thing. HW can effectively only plug/unplug the
> >> whole thing. It makes sense in some (most?) setups to represent one DIMM
> >> as one memory block device.
> > 
> > Yes, for the physical hotplug it doesn't really make much sense to me to
> > offline portions that the HW cannot hotremove.
> 
> I've seen people offline parts of memory to simulate systems with less
> RAM and people offline parts of memory on demand to save energy
> (poweroff banks). People won't stop being creative with what we provided
> to them :D

Heh, I have seen people shooting their foot for fun. But more seriously,
I do understand different usecases and we shouldn't cut them off from their
toys.
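
For readers unfamiliar with the mechanism discussed here, the following is a minimal sketch of how a memory block is typically offlined or onlined from user space through the per-block sysfs "state" file. The helper names and the "root" parameter are assumptions made for illustration; the sketch only handles the plain "online" and "offline" values, and on a real system the write needs root privileges and can fail (for example if the block cannot be offlined).

```python
from pathlib import Path

# Default location of the memory block devices in sysfs.
SYSFS_MEMORY = Path("/sys/devices/system/memory")


def get_block_state(block_id: int, root: Path = SYSFS_MEMORY) -> str:
    """Return the current contents of memory<block_id>/state, stripped."""
    return (root / f"memory{block_id}" / "state").read_text().strip()


def set_block_state(block_id: int, state: str, root: Path = SYSFS_MEMORY) -> None:
    """Request 'online' or 'offline' for one memory block.

    Hypothetical helper: it deliberately rejects anything except the two
    plain states, and it lets OSError from the write propagate.
    """
    if state not in ("online", "offline"):
        raise ValueError(f"unsupported state: {state!r}")
    (root / f"memory{block_id}" / "state").write_text(state)
```

Passing a different "root" makes it possible to try the helpers against a scratch directory instead of the live sysfs tree.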

> >> 2. Hot(un)plugging small memory increments. This is mostly the case in
> >> virtualized environments - especially hyper-v balloon, xen balloon,
> >> virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
> >> you want at least all (16MB!) memory block devices that can get
> >> unplugged again individually ("LMBs") as separate memory blocks. Same on
> >> s390x on memory increment size (currently effectively the memory block
> >> size).
> > 
> > Yes I do recognize those usecases even though I will not pretend I
> > don't consider them questionable. E.g. any hotplug with a smaller granularity
> > than the memory model in Linux allows is just dubious. We simply cannot
> > implement that without a lot of waste and then the question is what is
> > the real point.
> 
> Having the section size as small as possible in these environments is
> most certainly preferable, to clean up metadata where possible.

There is a certain line that is hard to maintain. I consider a section
to be the smallest granularity that makes sense to support. Current
section sizing makes sense from the VMEMMAP point of view. If there are
strong reasons to allow smaller ones then I believe this should be a
compile-time option.
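
A rough back-of-the-envelope sketch of the VMEMMAP argument, under assumed numbers: 128 MiB sections, 4 KiB base pages and a 64-byte struct page are typical x86-64 values chosen for illustration, not figures stated in this thread, and other architectures differ.

```python
MiB = 1 << 20


def memmap_bytes_per_section(section_bytes: int, page_bytes: int,
                             struct_page_bytes: int) -> int:
    """Size of the struct page array needed to describe one memory section."""
    pages_per_section = section_bytes // page_bytes
    return pages_per_section * struct_page_bytes


# Assumed x86-64 values: 128 MiB section, 4 KiB pages, 64-byte struct page.
x86_64_memmap = memmap_bytes_per_section(128 * MiB, 4096, 64)

# 32768 pages * 64 bytes = 2 MiB of metadata per section.
assert x86_64_memmap == 2 * MiB
```

With these assumed values the metadata for one section is exactly 2 MiB, the size of one huge-page mapping on that architecture, which is one way to read the remark that the current section size fits the vmemmap layout. A smaller section would describe its pages with less than one such mapping.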

> Otherwise, hot(un)plugging smaller granularity behaves more like memory
> ballooning (and I think I don't have to tell you that ballooning is used
> excessively even though it wastes memory on metadata ;) ). Anyhow,
> that's another discussion.

Yeah, I am aware of that. And honestly subsection offlining makes very
little sense to me. It was hard to argue against that for nvdimm
usecases where we simply had to work around the reality where devices
couldn't have been aligned properly. I do not think we want to claim
support for general hotplug though.

[...]

> > There is only one certainty. Providing a long term interface with ever
> > growing (ab)users is a hard target. And shinyN might be needed in the
> > end. Who knows. My main point is that the existing interface is hitting
> > a wall on usecases which _do_not_care_ about memory hotplug. And that is
> > something we should be looking at.
> 
> Agreed. I can see 3 scenarios
> 
> a) no memory hotplug support, no sysfs.
> b) memory hotplug support, no sysfs
> c) memory hotplug support, sysfs
> 
> Starting with a) and c) is the easiest way to go.

Yes, the first and the simplest way would be to provide
memory_hotplug=[disabled|v1]

where disabled would be no sysfs interface, v1 would be the existing
infrastructure. I would hope to land with v2 in the future which would
provide a new interface.
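
To make the proposal concrete, here is a small sketch of how such a switch could be interpreted from a kernel command line string. Note that memory_hotplug= is only the parameter suggested above, not an existing kernel option; the function name, the default of "v1" when the parameter is absent, and the "last occurrence wins" rule are all assumptions for illustration.

```python
# Values from the proposal above; "v2" is intentionally not accepted yet.
ALLOWED_MODES = ("disabled", "v1")


def parse_memory_hotplug(cmdline: str, default: str = "v1") -> str:
    """Return the mode selected by a hypothetical memory_hotplug= option.

    Assumptions: an absent option means `default`, and if the option is
    given several times the last occurrence wins.
    """
    mode = default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "memory_hotplug" and sep:
            if value not in ALLOWED_MODES:
                raise ValueError(f"unknown memory_hotplug mode: {value!r}")
            mode = value
    return mode
```

For example, parse_memory_hotplug("root=/dev/sda1 memory_hotplug=disabled quiet") selects "disabled", which in the proposal would mean no sysfs interface at all.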

-- 
Michal Hocko
SUSE Labs


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-14 11:24             ` Michal Hocko
@ 2020-09-14 12:14               ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-09-14 12:14 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner,
	Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

>> Otherwise, hot(un)plugging smaller granularity behaves more like memory
>> ballooning (and I think I don't have to tell you that ballooning is used
>> excessively even though it wastes memory on metadata ;) ). Anyhow,
>> that's another discussion.
> 
> Yeah, I am aware of that. And honestly subsection offlining makes very
> little sense to me. It was hard to argue against that for nvdimm
> usecases where we simply had to work around the reality where devices
> couldn't have been aligned properly. I do not think we want to claim
> support for general hotplug though.

Totally agree, I also don't want to see actual sub-section
onlining/offlining in the core (e.g., virtio-mem emulates that on top,
but it behaves a lot more like memory ballooning).

> 
> [...]
> 
>>> There is only one certainty. Providing a long term interface with ever
>>> growing (ab)users is a hard target. And shinyN might be needed in the
>>> end. Who knows. My main point is that the existing interface is hitting
>>> a wall on usecases which _do_not_care_ about memory hotplug. And that is
>>> something we should be looking at.
>>
>> Agreed. I can see 3 scenarios
>>
>> a) no memory hotplug support, no sysfs.
>> b) memory hotplug support, no sysfs
>> c) memory hotplug support, sysfs
>>
>> Starting with a) and c) is the easiest way to go.
> 
> Yes, the first and the simplest way would be to provide
> memory_hotplug=[disabled|v1]
> 
> where disabled would be no sysfs interface, v1 would be the existing
> infrastructure. I would hope to land with v2 in the future which would
> provide a new interface.
> 

Agreed.

-- 
Thanks,

David / dhildenb



* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-10 10:20 Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? David Hildenbrand
  2020-09-10 20:00 ` Dave Hansen
  2020-09-10 20:57 ` Dave Hansen
@ 2020-09-22 13:56 ` Gerald Schaefer
  2020-09-25 14:49   ` David Hildenbrand
  2 siblings, 1 reply; 21+ messages in thread
From: Gerald Schaefer @ 2020-09-22 13:56 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Michal Hocko, akpm, Greg KH, Jan Höppner, Heiko Carstens,
	linux-mm, linux-api, Dave Hansen, linux-kernel

On Thu, 10 Sep 2020 12:20:34 +0200
David Hildenbrand <david@redhat.com> wrote:

> Hi everybody,
> 
> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
> is/was used. It's one of these interfaces that most probably never
> should have been added but now we are stuck with it.
> 
> "phys_device" was used on s390x in older versions of lsmem[2]/chmem[3],
> back when they were still part of s390-tools. They were later replaced
> [5] by the variants in util-linux. For example, RHEL6 and RHEL7 contain
> lsmem/chmem from s390-utils. RHEL8 switched to versions from util-linux
> on s390x [4].
> 
> "phys_device" was added with sysfs support for memory hotplug in commit
> 3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions")
> in 2005. It always returned 0.
> 
> s390x started returning something != 0 on some setups (if sclp.rzm is
> set by HW) in 2010 via commit 57b552ba0b2f ("memory hotplug/s390: set
> phys_device").
> 
> For s390x, it allowed for identifying which memory block devices belong
> to the same memory increment (RZM). Only if all memory block devices
> comprising a single memory increment were offline, the memory could
> actually be removed in the hypervisor.
> 
> Since commit e5d709bb5fb7 ("s390/memory hotplug: provide
> memory_block_size_bytes() function") in 2013 a memory block device
> spans at least one memory increment - which is why the interface isn't
> really helpful/used anymore (except by old lsmem/chmem tools).

Correct, so I do not see any problem for s390 with removing / changing
that for the upstream kernel. BTW, that commit also gave some relief
on the scaling issue, at least for s390. With increasing total memory
size, we also have increasing increment and thus memory block size.

Of course, that also has some limitations, IIRC max. 1 GB increment
size, but still better than the 256 MB default size.
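
To illustrate the scaling relief mentioned above, a quick arithmetic sketch: the number of memory block devices is the total memory divided by the block size, rounded up. The 1 TiB total is an arbitrary figure picked for the example; the 256 MB and 1 GB block sizes are the ones quoted above.

```python
MiB = 1 << 20
GiB = 1 << 30
TiB = 1 << 40


def memory_block_count(total_bytes: int, block_bytes: int) -> int:
    """Number of memory block devices needed to cover total_bytes."""
    # Ceiling division without floating point.
    return -(-total_bytes // block_bytes)


# Illustrative 1 TiB system (assumed size, not taken from the thread).
blocks_default = memory_block_count(1 * TiB, 256 * MiB)  # 4096 blocks
blocks_large = memory_block_count(1 * TiB, 1 * GiB)      # 1024 blocks
```

So for the same amount of memory, going from 256 MB to 1 GB blocks cuts the number of memoryX directories in sysfs by a factor of four.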

> 
> There were once RFC patches to make use of it in ACPI, but it could be
> solved using different interfaces [1].
> 
> 
> While I'd love to rip it out completely, I think it would break old
> lsmem/chmem completely - and I assume that's not acceptable. I was
> wondering what would be considered safe to do now/in the future:
> 
> 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on
> s390x). This will make old lsmem/chmem behave differently after
> switching to a new kernel, like if sclp.rzm would not be set by HW -
> AFAIU, it will assume all memory is in a single memory increment. Do we
> care?

No, at least not until that kernel change would be backported to some
old distribution level where we still use lsmem/chmem from s390-tools.
Given that this is just some clean-up w/o any functional benefit, and
hopefully w/o any negative impact, I think we can safely assume that no
distributor will do that "just for fun".

Even if there would be good reasons for backports, then I guess we also
have good reasons for backporting / switching to the util-linux version
of lsmem / chmem for such distribution levels. Alternatively, adjust the
s390-tools lsmem / chmem there.

But I would rather "rip it out completely" than just return 0. You'd
need some lsmem / chmem changes anyway, at least in case this would
ever be backported.
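
As a sketch of what such a tool change could look like, the following groups memory blocks by their phys_device value and falls back to "every block is its own group" when the attribute does not exist. This is not the lsmem / chmem code; the function names and the fallback policy are assumptions for illustration, and the attribute is assumed to contain a decimal integer.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import List, Optional

SYSFS_MEMORY = Path("/sys/devices/system/memory")


def read_phys_device(block_dir: Path) -> Optional[int]:
    """Return the phys_device value of one block, or None if it is absent."""
    try:
        return int((block_dir / "phys_device").read_text().strip())
    except FileNotFoundError:
        return None


def group_blocks(root: Path = SYSFS_MEMORY) -> List[List[int]]:
    """Group memory block ids that share a phys_device value.

    Blocks without the attribute are treated as standalone groups
    (assumed fallback for kernels where it has been removed).
    """
    by_phys = defaultdict(list)
    standalone = []
    for entry in root.iterdir():
        match = re.fullmatch(r"memory(\d+)", entry.name)
        if match is None:
            continue  # skip non-block entries such as block_size_bytes
        block_id = int(match.group(1))
        phys = read_phys_device(entry)
        if phys is None:
            standalone.append([block_id])
        else:
            by_phys[phys].append(block_id)
    return sorted([sorted(ids) for ids in by_phys.values()] + standalone)
```

With this kind of fallback, a tool would keep working whether the attribute returns a real increment id, always returns 0, or is gone entirely, although in the "always 0" case it would see all memory as a single group, as described above.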

> 2. Restrict it to s390x only. It always returned 0 on other
> architectures, I was not able to find any user.
> 
> I think 2 should be safe to do (never used on other archs). I do wonder
> what the feelings are about 1.

Please don't add any s390-specific workarounds here, that does not
really sound like a clean-up, rather the opposite.

That being said, I do not really see the benefit of this change at
all. As Michal mentioned, there really should be some more fundamental
change. And from the rest of this thread, it also seems that phys_device
usage might not be the biggest issue here.


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-22 13:56 ` Gerald Schaefer
@ 2020-09-25 14:49   ` David Hildenbrand
  2020-09-25 15:00     ` Greg KH
  2020-09-25 15:39     ` Michal Hocko
  0 siblings, 2 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-09-25 14:49 UTC (permalink / raw)
  To: Gerald Schaefer
  Cc: Michal Hocko, akpm, Greg KH, Jan Höppner, Heiko Carstens,
	linux-mm, linux-api, Dave Hansen, linux-kernel

>> There were once RFC patches to make use of it in ACPI, but it could be
>> solved using different interfaces [1].
>>
>>
>> While I'd love to rip it out completely, I think it would break old
>> lsmem/chmem completely - and I assume that's not acceptable. I was
>> wondering what would be considered safe to do now/in the future:
>>
>> 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on
>> s390x). This will make old lsmem/chmem behave differently after
>> switching to a new kernel, like if sclp.rzm would not be set by HW -
>> AFAIU, it will assume all memory is in a single memory increment. Do we
>> care?
> 
> No, at least not until that kernel change would be backported to some
> old distribution level where we still use lsmem/chmem from s390-tools.
> Given that this is just some clean-up w/o any functional benefit, and
> hopefully w/o any negative impact, I think we can safely assume that no
> distributor will do that "just for fun".
> 
> Even if there would be good reasons for backports, then I guess we also
> have good reasons for backporting / switching to the util-linux version
> of lsmem / chmem for such distribution levels. Alternatively, adjust the
> s390-tools lsmem / chmem there.
> 
> But I would rather "rip it out completely" than just return 0. You'd
> need some lsmem / chmem changes anyway, at least in case this would
> ever be backported.

Thanks for your input Gerald.

So unless people would be running shiny new kernels on older
distributions it shouldn't be a problem (and I don't think we care too
much about something like that). I don't expect something like that to
get backported - there is absolutely no reason to do so IMHO.

> 
>> 2. Restrict it to s390x only. It always returned 0 on other
>> architectures, I was not able to find any user.
>>
>> I think 2 should be safe to do (never used on other archs). I do wonder
>> what the feelings are about 1.
> 
> Please don't add any s390-specific workarounds here, that does not
> really sound like a clean-up, rather the opposite.

People seem to have different opinions here. I'm happy as long as we can
get rid of it (either now, or in the future with a new model).

> 
> That being said, I do not really see the benefit of this change at
> all. As Michal mentioned, there really should be some more fundamental
> change. And from the rest of this thread, it also seems that phys_device
> usage might not be the biggest issue here.
> 

As I already expressed, I am more of a friend of small, incremental
changes than having a single big world switch where everything will be
shiny and perfect.

(Deprecating it now - in any way - stops any new users from appearing,
both in the kernel and in user space, eventually making the big
world switch later a little easier because there is one less thing that
has to vanish.)

-- 
Thanks,

David / dhildenb



* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-25 14:49   ` David Hildenbrand
@ 2020-09-25 15:00     ` Greg KH
  2020-09-25 15:05       ` David Hildenbrand
  2020-09-25 15:39     ` Michal Hocko
  1 sibling, 1 reply; 21+ messages in thread
From: Greg KH @ 2020-09-25 15:00 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Gerald Schaefer, Michal Hocko, akpm, Jan Höppner,
	Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On Fri, Sep 25, 2020 at 04:49:28PM +0200, David Hildenbrand wrote:
> >> There were once RFC patches to make use of it in ACPI, but it could be
> >> solved using different interfaces [1].
> >>
> >>
> >> While I'd love to rip it out completely, I think it would break old
> >> lsmem/chmem completely - and I assume that's not acceptable. I was
> >> wondering what would be considered safe to do now/in the future:
> >>
> >> 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on
> >> s390x). This will make old lsmem/chmem behave differently after
> >> switching to a new kernel, like if sclp.rzm would not be set by HW -
> >> AFAIU, it will assume all memory is in a single memory increment. Do we
> >> care?
> > 
> > No, at least not until that kernel change would be backported to some
> > old distribution level where we still use lsmem/chmem from s390-tools.
> > Given that this is just some clean-up w/o any functional benefit, and
> > hopefully w/o any negative impact, I think we can safely assume that no
> > distributor will do that "just for fun".
> > 
> > Even if there would be good reasons for backports, then I guess we also
> > have good reasons for backporting / switching to the util-linux version
> > of lsmem / chmem for such distribution levels. Alternatively, adjust the
> > s390-tools lsmem / chmem there.
> > 
> > But I would rather "rip it out completely" than just return 0. You'd
> > need some lsmem / chmem changes anyway, at least in case this would
> > ever be backported.
> 
> Thanks for your input Gerald.
> 
> So unless people would be running shiny new kernels on older
> distributions it shouldn't be a problem (and I don't think we care too
> much about something like that). I don't expect something like that to
> get backported - there is absolutely no reason to do so IMHO.

We do care about this. Andrew used to have an old Fedora 9 box or
something like that, with which he tortured many of us via bug reports
when we broke it :)

So watch out, people keep old userspace around for much longer than you
can possibly imagine because they don't like having their use-cases in
userspace change, and we have made the guarantee to them that they _CAN_
trust us to not break things in userspace.

It's a slow age-out, but watch out, you might have to revert things...

good luck!

greg k-h


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-25 15:00     ` Greg KH
@ 2020-09-25 15:05       ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-09-25 15:05 UTC (permalink / raw)
  To: Greg KH
  Cc: Gerald Schaefer, Michal Hocko, akpm, Jan Höppner,
	Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

> 
> It's a slow age-out, but watch out, you might have to revert things...
> 
> good luck!

Yeah, I always liked playing with fire ;)

Thanks for the insights Greg!

-- 
Thanks,

David / dhildenb



* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-25 14:49   ` David Hildenbrand
  2020-09-25 15:00     ` Greg KH
@ 2020-09-25 15:39     ` Michal Hocko
  2020-09-25 15:47       ` David Hildenbrand
  1 sibling, 1 reply; 21+ messages in thread
From: Michal Hocko @ 2020-09-25 15:39 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens,
	linux-mm, linux-api, Dave Hansen, linux-kernel

On Fri 25-09-20 16:49:28, David Hildenbrand wrote:
> >> There were once RFC patches to make use of it in ACPI, but it could be
> >> solved using different interfaces [1].
> >>
> >>
> >> While I'd love to rip it out completely, I think it would break old
> >> lsmem/chmem completely - and I assume that's not acceptable. I was
> >> wondering what would be considered safe to do now/in the future:
> >>
> >> 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on
> >> s390x). This will make old lsmem/chmem behave differently after
> >> switching to a new kernel, like if sclp.rzm would not be set by HW -
> >> AFAIU, it will assume all memory is in a single memory increment. Do we
> >> care?
> > 
> > No, at least not until that kernel change would be backported to some
> > old distribution level where we still use lsmem/chmem from s390-tools.
> > Given that this is just some clean-up w/o any functional benefit, and
> > hopefully w/o any negative impact, I think we can safely assume that no
> > distributor will do that "just for fun".
> > 
> > Even if there would be good reasons for backports, then I guess we also
> > have good reasons for backporting / switching to the util-linux version
> > of lsmem / chmem for such distribution levels. Alternatively, adjust the
> > s390-tools lsmem / chmem there.
> > 
> > But I would rather "rip it out completely" than just return 0. You'd
> > need some lsmem / chmem changes anyway, at least in case this would
> > ever be backported.
> 
> Thanks for your input Gerald.
> 
> So unless people would be running shiny new kernels on older
> distributions it shouldn't be a problem (and I don't think we care too
> much about something like that). I don't expect something like that to
> get backported - there is absolutely no reason to do so IMHO.

Ohh, there are many people running current Linus tree on an older
distribution. Including me.

> >> 2. Restrict it to s390x only. It always returned 0 on other
> >> architectures, I was not able to find any user.
> >>
> >> I think 2 should be safe to do (never used on other archs). I do wonder
> >> what the feelings are about 1.
> > 
> > Please don't add any s390-specific workarounds here, that does not
> > really sound like a clean-up, rather the opposite.
> 
> People seem to have different opinions here. I'm happy as long as we can
> get rid of it (either now, or in the future with a new model).
> 
> > 
> > That being said, I do not really see the benefit of this change at
> > all. As Michal mentioned, there really should be some more fundamental
> > change. And from the rest of this thread, it also seems that phys_device
> > usage might not be the biggest issue here.
> > 
> 
> As I already expressed, I am more of a friend of small, incremental
> changes than having a single big world switch where everything will be
> shiny and perfect.
> 
> (Deprecating it now - in any way - stops any new users from appearing,
> both in the kernel and in user space, eventually making the big
> world switch later a little easier because there is one less thing that
> has to vanish.)

Realistically people do not care about deprecation all that much. They
simply use whatever they can find or somebody will show them. Really,
deprecation has never really worked. The only thing that worked was to
remove the functionality and then wait for somebody to complain and
revert or somehow allow the functionality without necessity to alter the
userspace.

As much as I would like to remove as much crud as possible I strongly
suspect that the existing hotplug interface is just a lost cause and it
is not the best use of time to put lipstick on a pig. Even if
we remove this particular interface we are not going to get rid of a lot
of code, nor will we gain any more sensible semantics, right?
-- 
Michal Hocko
SUSE Labs


* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
  2020-09-25 15:39     ` Michal Hocko
@ 2020-09-25 15:47       ` David Hildenbrand
  0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2020-09-25 15:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens,
	linux-mm, linux-api, Dave Hansen, linux-kernel

>>>> 2. Restrict it to s390x only. It always returned 0 on other
>>>> architectures, I was not able to find any user.
>>>>
>>>> I think 2 should be safe to do (never used on other archs). I do wonder
>>>> what the feelings are about 1.
>>>
>>> Please don't add any s390-specific workarounds here, that does not
>>> really sound like a clean-up, rather the opposite.
>>
>> People seem to have different opinions here. I'm happy as long as we can
>> get rid of it (either now, or in the future with a new model).
>>
>>>
>>> That being said, I do not really see the benefit of this change at
>>> all. As Michal mentioned, there really should be some more fundamental
>>> change. And from the rest of this thread, it also seems that phys_device
>>> usage might not be the biggest issue here.
>>>
>>
>> As I already expressed, I am more of a friend of small, incremental
>> changes than having a single big world switch where everything will be
>> shiny and perfect.
>>
>> (Deprecating it now - in any way - stops any new users from appearing,
>> both in the kernel and in user space, eventually making the big
>> world switch later a little easier because there is one less thing that
>> has to vanish.)
>
> Realistically people do not care about deprecation all that much. They
> simply use whatever they can find or somebody will show them. Really,
> deprecation has never really worked. The only thing that worked was to
> remove the functionality and then wait for somebody to complain and
> revert or somehow allow the functionality without necessity to alter the
> userspace.

Mainframe people are usually ... more conservative (well, they focus on
stability and pay a lot of money for that - including HW). :)

What they would lose here is s390x lsmem/chmem functionality, used to
manage standby memory (under LPAR and z/VM, if enabled) - with the old
tools. I have the feeling that this would be acceptable (I never had
access to an LPAR that allowed for it ...), but yeah, you never know.

> 
> As much as I would like to remove as much crud as possible I strongly
> suspect that the existing hotplug interface is just a lost cause and it
> is not the best use of time to put lipstick on a pig. Even if
> we remove this particular interface we are not going to get rid of a lot
> of code, nor will we gain any more sensible semantics, right?
> 

Excluding some documentation

 drivers/base/memory.c        | 29 -----------------------------
 drivers/s390/char/sclp_cmd.c |  7 -------
 include/linux/memory.h       |  2 --
 3 files changed, 38 deletions(-)

Seems like this is the only way to deprecate. (I mean I can add comments
in the code, but as you say, doesn't stop new user space users from
showing up)

-- 
Thanks,

David / dhildenb


