* Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: David Hildenbrand @ 2020-09-10 10:20 UTC
To: Gerald Schaefer, Michal Hocko, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

Hi everybody,

I was just exploring how /sys/devices/system/memory/memoryX/phys_device
is/was used. It's one of these interfaces that most probably never
should have been added but now we are stuck with it.

"phys_device" was used on s390x in older versions of lsmem[2]/chmem[3],
back when they were still part of s390-tools. They were later replaced
[5] by the variants in util-linux. For example, RHEL6 and RHEL7 contain
lsmem/chmem from s390-utils. RHEL8 switched to versions from util-linux
on s390x [4].

"phys_device" was added with sysfs support for memory hotplug in commit
3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions")
in 2005. It always returned 0.

s390x started returning something != 0 on some setups (if sclp.rzm is
set by HW) in 2010 via commit 57b552ba0b2f ("memory hotplug/s390: set
phys_device").

For s390x, it allowed for identifying which memory block devices belong
to the same memory increment (RZM). Only if all memory block devices
comprising a single memory increment were offline, the memory could
actually be removed in the hypervisor.

Since commit e5d709bb5fb7 ("s390/memory hotplug: provide
memory_block_size_bytes() function") in 2013 a memory block device
spans at least one memory increment - which is why the interface isn't
really helpful/used anymore (except by old lsmem/chmem tools).

There were once RFC patches to make use of it in ACPI, but it could be
solved using different interfaces [1].

While I'd love to rip it out completely, I think it would break old
lsmem/chmem completely - and I assume that's not acceptable.

I was wondering what would be considered safe to do now/in the future:

1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on
   s390x). This will make old lsmem/chmem behave differently after
   switching to a new kernel, like if sclp.rzm would not be set by HW -
   AFAIU, it will assume all memory is in a single memory increment. Do
   we care?
2. Restrict it to s390x only. It always returned 0 on other
   architectures, I was not able to find any user.

I think 2 should be safe to do (never used on other archs). I do wonder
what the feelings are about 1.

Thoughts?

[1] https://patchwork.kernel.org/patch/2163871/
[2] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/lsmem
[3] https://github.com/ibm-s390-tools/s390-tools/blob/v2.1.0/zconf/chmem
[4] https://bugzilla.redhat.com/show_bug.cgi?id=1504134
[5] https://github.com/ibm-s390-tools/s390-tools/commit/778292e771fb00cfcbd7ff6535ee3d9fde612dc5#diff-82c32a7f4c597c50db90157ed0c581b3

--
Thanks,

David / dhildenb
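To make the discussion above more concrete, here is a minimal sketch of how a userspace tool might consume the attribute: it reads memoryX/phys_device for every memory block and groups the blocks by the value found. This is an illustration only, not the s390-tools or util-linux implementation. The names group_blocks_by_phys_device, sysfs_root and demo are made up for the example, and the attribute value is deliberately kept as an opaque string so that no assumption about its number format is needed.

```python
import os
import re
import tempfile
from collections import defaultdict


def group_blocks_by_phys_device(sysfs_root="/sys/devices/system/memory"):
    """Map each phys_device value (kept as an opaque string) to block ids."""
    groups = defaultdict(list)
    for entry in os.listdir(sysfs_root):
        match = re.fullmatch(r"memory(\d+)", entry)
        if match is None:
            continue  # skip non-block entries such as block_size_bytes
        path = os.path.join(sysfs_root, entry, "phys_device")
        try:
            with open(path) as f:
                value = f.read().strip()
        except OSError:
            continue  # attribute not present; nothing to group
        groups[value].append(int(match.group(1)))
    return {value: sorted(ids) for value, ids in groups.items()}


def demo():
    """Build a fake sysfs tree so the sketch runs on any machine."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: blocks 0 and 1 share one value, block 2 differs.
        for block_id, phys_device in [(0, "0"), (1, "0"), (2, "1")]:
            block_dir = os.path.join(root, f"memory{block_id}")
            os.mkdir(block_dir)
            with open(os.path.join(block_dir, "phys_device"), "w") as f:
                f.write(phys_device + "\n")
        # A non-block entry that the scan must ignore.
        with open(os.path.join(root, "block_size_bytes"), "w") as f:
            f.write("8000000\n")
        return group_blocks_by_phys_device(root)


if __name__ == "__main__":
    print(demo())
```

With option 1 (always return 0), such a tool would see every block in a single group, which matches the "all memory is in a single memory increment" behaviour described in the mail.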
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: Dave Hansen @ 2020-09-10 20:00 UTC
To: David Hildenbrand, Gerald Schaefer, Michal Hocko, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On 9/10/20 3:20 AM, David Hildenbrand wrote:
> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
> is/was used. It's one of these interfaces that most probably never
> should have been added but now we are stuck with it.

While I'm all for cleanups, what specific problems is phys_device causing?

Are you hoping that we can just remove users of memoryX/* until there
are no more left, and this is the easiest place to start?
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: David Hildenbrand @ 2020-09-10 20:31 UTC
To: Dave Hansen
Cc: David Hildenbrand, Gerald Schaefer, Michal Hocko, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

> On 10.09.2020 at 22:01, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 9/10/20 3:20 AM, David Hildenbrand wrote:
>> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
>> is/was used. It's one of these interfaces that most probably never
>> should have been added but now we are stuck with it.
>
> While I'm all for cleanups, what specific problems is phys_device causing?
>

Mostly stumbling over it, understanding that it is basically unused
with new userspace for good reason, questioning its existence.

E.g., I am working on virtio-mem support for s390x. Displaying
misleading/wrong phys_device indications isn't particularly helpful
- especially once there are different ways to hotplug memory for an
architecture.

> Are you hoping that we can just remove users of memoryX/* until there
> are no more left, and this is the easiest place to start?

At least reducing it to a minimum with clear semantics. Even with
automatic onlining there are still reasons why we need to keep the
interface for now (e.g., reloading kexec to update the kdump headers
on memory hot(un)plug). But also standby memory handling on s390x
requires it (->manual onlining).
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: Michal Hocko @ 2020-09-11 7:20 UTC
To: David Hildenbrand
Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On Thu 10-09-20 22:31:09, David Hildenbrand wrote:
>
> > On 10.09.2020 at 22:01, Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 9/10/20 3:20 AM, David Hildenbrand wrote:
> >> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
> >> is/was used. It's one of these interfaces that most probably never
> >> should have been added but now we are stuck with it.
> >
> > While I'm all for cleanups, what specific problems is phys_device causing?
> >
>
> Mostly stumbling over it, understanding that it is basically unused
> with new userspace for good reason, questioning its existence.
>
> E.g., I am working on virtio-mem support for s390x. Displaying
> misleading/wrong phys_device indications isn't particularly helpful
> - especially once there are different ways to hotplug memory for an
> architecture.
>
> > Are you hoping that we can just remove users of memoryX/* until there
> > are no more left, and this is the easiest place to start?
>
> At least reducing it to a minimum with clear semantics. Even with
> automatic onlining there are still reasons why we need to keep the
> interface for now (e.g., reloading kexec to update the kdump headers
> on memory hot(un)plug). But also standby memory handling on s390x
> requires it (->manual onlining).

While I agree that the existing interface is far from ideal, I am not
sure it makes much sense to invest energy into cleaning it up. We can
have a pig with a lipstick but this will not solve the underlying
problem that we have, I believe. The interface doesn't scale with the
block count (especially on some platforms like ppc), it is too
inflexible (single size of the block) and many others. I believe we need
a completely new interface which would effectively deprecate the
existing one. One could still choose to use the old interface but new
usecases would use the new one ideally.

I have brought that up earlier already without much follow up
(http://lkml.kernel.org/r/20200619120704.GD12177@dhcp22.suse.cz)

--
Michal Hocko
SUSE Labs
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: David Hildenbrand @ 2020-09-11 8:09 UTC
To: Michal Hocko
Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On 11.09.20 09:20, Michal Hocko wrote:
> On Thu 10-09-20 22:31:09, David Hildenbrand wrote:
>>
>>> On 10.09.2020 at 22:01, Dave Hansen <dave.hansen@intel.com> wrote:
>>>
>>> On 9/10/20 3:20 AM, David Hildenbrand wrote:
>>>> I was just exploring how /sys/devices/system/memory/memoryX/phys_device
>>>> is/was used. It's one of these interfaces that most probably never
>>>> should have been added but now we are stuck with it.
>>>
>>> While I'm all for cleanups, what specific problems is phys_device causing?
>>>
>>
>> Mostly stumbling over it, understanding that it is basically unused
>> with new userspace for good reason, questioning its existence.
>>
>> E.g., I am working on virtio-mem support for s390x. Displaying
>> misleading/wrong phys_device indications isn't particularly helpful
>> - especially once there are different ways to hotplug memory for an
>> architecture.
>>
>>> Are you hoping that we can just remove users of memoryX/* until there
>>> are no more left, and this is the easiest place to start?
>>
>> At least reducing it to a minimum with clear semantics. Even with
>> automatic onlining there are still reasons why we need to keep the
>> interface for now (e.g., reloading kexec to update the kdump headers
>> on memory hot(un)plug). But also standby memory handling on s390x
>> requires it (->manual onlining).
>
> While I agree that the existing interface is far from ideal, I am not
> sure it makes much sense to invest energy into cleaning it up. We can
> have a pig with a lipstick but this will not solve the underlying
> problem that we have, I believe. The interface doesn't scale with the
> block count (especially on some platforms like ppc), it is too
> inflexible (single size of the block) and many others. I believe we need
> a completely new interface which would effectively deprecate the
> existing one. One could still choose to use the old interface but new
> usecases would use the new one ideally.

Even with a new interface (that does allow for variable-sized block
sizes), we will still end up with many memory block devices. It's not
the one thing that solves all our problems.

Consider two cases:

1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
online/offline the whole thing. HW can effectively only plug/unplug the
whole thing. It makes sense in some (most?) setups to represent one DIMM
as one memory block device.

2. Hot(un)plugging small memory increments. This is mostly the case in
virtualized environments - especially hyper-v balloon, xen balloon,
virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
you want at least all (16MB!) memory block devices that can get
unplugged again individually ("LMBs") as separate memory blocks. Same on
s390x on memory increment size (currently effectively the memory block
size).

In summary, larger memory block devices mostly only make sense with
DIMMs (and for boot memory in some cases). We will still end up with
many memory block devices in other configurations.

I do agree that a "disable sysfs" option is interesting - even with
memory hotplug (we mostly need a way to configure it and a way to notify
kexec-tools about memory hot(un)plug events). I am currently (once
again) looking into improving auto-onlining support in the kernel.

Having that said, I much rather want to see smaller improvements (that
can be fine-tuned individually - like allowing variable-sized memory
blocks) than doing a switch to "new shiny" and figuring out after a
while that we need "new shiny2".

I consider removing "phys_device" as one of these tunables. The question
would be how to make such sysfs changes easy to configure
("-phys_device", "+variable_sized_blocks" ...)

--
Thanks,

David / dhildenb
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: Michal Hocko @ 2020-09-11 9:12 UTC
To: David Hildenbrand
Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
[...]
> Consider two cases:
>
> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
> online/offline the whole thing. HW can effectively only plug/unplug the
> whole thing. It makes sense in some (most?) setups to represent one DIMM
> as one memory block device.

Yes, for the physical hotplug it doesn't really make much sense to me to
offline portions that the HW cannot hotremove.

> 2. Hot(un)plugging small memory increments. This is mostly the case in
> virtualized environments - especially hyper-v balloon, xen balloon,
> virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
> you want at least all (16MB!) memory block devices that can get
> unplugged again individually ("LMBs") as separate memory blocks. Same on
> s390x on memory increment size (currently effectively the memory block
> size).

Yes I do recognize those usecases even though I will not pretend I
consider it questionable. E.g. any hotplug with a smaller granularity
than the memory model in Linux allows is just dubious. We simply cannot
implement that without a lot of wasting and then the question is what is
the real point.

> In summary, larger memory block devices mostly only make sense with
> DIMMs (and for boot memory in some cases). We will still end up with
> many memory block devices in other configurations.

And that is fine because the boot time memory is still likely the
primary source of memory. And reducing memory devices for those is a
huge improvement already (just think of a multi TB system with
gazillions of pointless memory devices).

> I do agree that a "disable sysfs" option is interesting - even with
> memory hotplug (we mostly need a way to configure it and a way to notify
> kexec-tools about memory hot(un)plug events). I am currently (once
> again) looking into improving auto-onlining support in the kernel.
>
> Having that said, I much rather want to see smaller improvements (that
> can be fine-tuned individually - like allowing variable-sized memory
> blocks) than doing a switch to "new shiny" and figuring out after a
> while that we need "new shiny2".

There is only one certainty. Providing a long term interface with ever
growing (ab)users is a hard target. And shinyN might be needed in the
end. Who knows. My main point is that the existing interface is hitting
a wall on usecases which _do_not_care_ about memory hotplug. And that is
something we should be looking at.

> I consider removing "phys_device" as one of these tunables. The question
> would be how to make such sysfs changes easy to configure
> ("-phys_device", "+variable_sized_blocks" ...)

I am with you on that. There are more candidates in memory block
directories which have dubious value. Deprecation process is a PITA and
that's why I thought that it would make sense to focus on something that
we can mis^Wdesign with existing and forming usecases in mind that would
get rid of all the cruft that we know it doesn't work (removable would
be another one).

I am definitely not going to insist and I appreciate you are trying to
clean this up. That is highly appreciated of course.

--
Michal Hocko
SUSE Labs
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: David Hildenbrand @ 2020-09-11 10:09 UTC
To: Michal Hocko
Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

On 11.09.20 11:12, Michal Hocko wrote:
> On Fri 11-09-20 10:09:07, David Hildenbrand wrote:
> [...]
>> Consider two cases:
>>
>> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to
>> online/offline the whole thing. HW can effectively only plug/unplug the
>> whole thing. It makes sense in some (most?) setups to represent one DIMM
>> as one memory block device.
>
> Yes, for the physical hotplug it doesn't really make much sense to me to
> offline portions that the HW cannot hotremove.

I've seen people offline parts of memory to simulate systems with less
RAM and people offline parts of memory on demand to save energy
(poweroff banks). People won't stop being creative with what we provided
to them :D

>
>> 2. Hot(un)plugging small memory increments. This is mostly the case in
>> virtualized environments - especially hyper-v balloon, xen balloon,
>> virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC,
>> you want at least all (16MB!) memory block devices that can get
>> unplugged again individually ("LMBs") as separate memory blocks. Same on
>> s390x on memory increment size (currently effectively the memory block
>> size).
>
> Yes I do recognize those usecases even though I will not pretend I
> consider it questionable. E.g. any hotplug with a smaller granularity
> than the memory model in Linux allows is just dubious. We simply cannot
> implement that without a lot of wasting and then the question is what is
> the real point.

Having the section size as small as possible in these environments is
most certainly preferable, to clean up metadata where possible.
Otherwise, hot(un)plugging smaller granularity behaves more like memory
ballooning (and I think I don't have to tell you that ballooning is used
excessively even though it wastes memory on metadata ;) ). Anyhow,
that's another discussion.

>
>> In summary, larger memory block devices mostly only make sense with
>> DIMMs (and for boot memory in some cases). We will still end up with
>> many memory block devices in other configurations.
>
> And that is fine because the boot time memory is still likely the
> primary source of memory. And reducing memory devices for those is a
> huge improvement already (just think of a multi TB system with
> gazillions of pointless memory devices).

Agreed. On my workstation (64GB - 4x16GB DIMMs if I recall correctly) I
end up with

$ cat /sys/devices/system/memory/block_size_bytes
8000000
$ ls /sys/devices/system/memory/ | grep memory | wc -l
512
$ cat /proc/iomem
00000000-00000fff : Reserved
00001000-0009ffff : System RAM
000a0000-000fffff : Reserved
  000a0000-000bffff : PCI Bus 0000:00
  000c0000-000dffff : PCI Bus 0000:00
    000c0000-000cf1ff : Video ROM
  000f0000-000fffff : System ROM
00100000-09dfffff : System RAM
09e00000-09ffffff : Reserved
0a000000-0a1fffff : System RAM
0a200000-0a20ffff : ACPI Non-volatile Storage
0a210000-b70fe017 : System RAM
b70fe018-b7117c57 : System RAM
b7117c58-b7118017 : System RAM
b7118018-b7129057 : System RAM
b7129058-b826cfff : System RAM
b826d000-b82c3fff : Reserved
b82c4000-b8d52fff : System RAM
b8d53000-b8d53fff : Reserved
b8d54000-bc67cfff : System RAM
bc67d000-bca26fff : Reserved
bca27000-bca73fff : ACPI Tables
bca74000-bd103fff : ACPI Non-volatile Storage
bd104000-bddfefff : Reserved
bddff000-beffffff : System RAM
bf000000-bfffffff : Reserved
[ PCI stuff ]
100000000-103f2fffff : System RAM
  d9f000000-d9fe00d90 : Kernel code
  da0000000-da07f9fff : Kernel rodata
  da0800000-da0a59e3f : Kernel data
  da110c000-da15fffff : Kernel bss
103f300000-10503fffff : Reserved

If we'd want to create a separate device during boot for each "System
RAM" resource, I am having a hard time figuring out the actual devices
(4 DIMMs). For memory hotplug it's a lot easier (e.g., separate
add_memory() calls). Of course, my workstation most probably doesn't
support DIMM hot(un)plug, so the BIOS might do strange things.

Also, I do wonder how hard the BIOS might mess up a DIMM configuration
(e820 map, resulting in "System RAM" resources) after hotplug, when
rebooting - or after kexec. On bare metal, people expect that DIMMs that
were hotplugged can be hotunplugged again after reboot (of course,
taking care of ZONE_MOVABLE, which is a pain).

As discussed, under QEMU that's easier, because we get separate
add_memory() calls for all DIMMs from ACPI code. How stuff behaves on
bare metal is still a head-scratcher - if we can rely on separate
"System RAM" instances to cover separate DIMMs, or if DIMMs might get
merged/split/EFI allocations ...

Maybe we can derive the actual DIMMs from some ACPI tables (SRAT?),
instead of relying on e820/"System RAM resources" - I have no clue.

>> I do agree that a "disable sysfs" option is interesting - even with
>> memory hotplug (we mostly need a way to configure it and a way to notify
>> kexec-tools about memory hot(un)plug events). I am currently (once
>> again) looking into improving auto-onlining support in the kernel.
>>
>> Having that said, I much rather want to see smaller improvements (that
>> can be fine-tuned individually - like allowing variable-sized memory
>> blocks) than doing a switch to "new shiny" and figuring out after a
>> while that we need "new shiny2".
>
> There is only one certainty. Providing a long term interface with ever
> growing (ab)users is a hard target. And shinyN might be needed in the
> end. Who knows. My main point is that the existing interface is hitting
> a wall on usecases which _do_not_care_ about memory hotplug. And that is
> something we should be looking at.

Agreed. I can see 3 scenarios

a) no memory hotplug support, no sysfs.
b) memory hotplug support, no sysfs
c) memory hotplug support, sysfs

Starting with a) and c) is the easiest way to go.

>
>> I consider removing "phys_device" as one of these tunables. The question
>> would be how to make such sysfs changes easy to configure
>> ("-phys_device", "+variable_sized_blocks" ...)
>
> I am with you on that. There are more candidates in memory block
> directories which have dubious value. Deprecation process is a PITA and
> that's why I thought that it would make sense to focus on something that
> we can mis^Wdesign with existing and forming usecases in mind that would
> get rid of all the cruft that we know it doesn't work (removable would
> be another one).

Yeah, "phys_index" is also dubious. Simply providing a memory range
would have been much cleaner. Lesson learned :)

--
Thanks,

David / dhildenb
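The workstation numbers quoted in the mail above can be checked with a little arithmetic. The sketch below assumes that block_size_bytes is printed in hexadecimal without a 0x prefix and that all 512 counted sysfs entries are memory block devices. The function names are made up for the example, and the 4 TiB machine is purely hypothetical.

```python
def memory_block_summary(block_size_hex, block_count):
    """Return (block size in MiB, memory covered by all blocks in GiB)."""
    # block_size_bytes is printed in hexadecimal without a 0x prefix.
    block_size = int(block_size_hex, 16)
    return block_size // 2**20, block_size * block_count // 2**30


def blocks_needed(total_bytes, block_size_bytes):
    """Number of fixed-size memory blocks needed to cover total_bytes."""
    return -(-total_bytes // block_size_bytes)  # ceiling division


if __name__ == "__main__":
    size_mib, covered_gib = memory_block_summary("8000000", 512)
    print(f"{size_mib} MiB per block, {covered_gib} GiB covered")
    # Hypothetical 4 TiB machine, assuming the block size stayed at 128 MiB
    # (this may not hold on large systems).
    print(blocks_needed(4 * 2**40, 128 * 2**20), "blocks for 4 TiB")
```

So 0x8000000 bytes is 128 MiB per block, and 512 such blocks span 64 GiB, which matches the 64GB workstation. Under the stated assumption, a multi TB system would end up with tens of thousands of memory block devices, which is the scaling concern raised in the thread.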
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: Dave Hansen @ 2020-09-11 19:24 UTC
To: David Hildenbrand, Michal Hocko
Cc: Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel, Luck, Tony

On 9/11/20 3:09 AM, David Hildenbrand wrote:
> Maybe we can derive the actual DIMMs from some ACPI tables (SRAT?),
> instead of relying on e820/"System RAM resources" - I have no clue.

It's actually really hard to map a DIMM to a physical address.
Interleaving can mean that one page actually spans a bunch of DIMMs.
For NVDIMMs, the interleaving is configurable and different namespaces
on the system can have different interleaving properties. The EDAC
drivers do the physical address to DIMM lookups, but they're quite
messy. There isn't a simple table for it IIRC.

*But* this turns out not to be a problem for memory hotplug because if
you're interleaving, you can't just remove one DIMM in an interleave set
anyway. Right now, I think we just depend on ACPI to _request_ hot
remove in a size which will allow the hardware to be removed.

Anyway, I just wanted to point out the M:N relationship between pages
and DIMMs.

Maybe we should start with an airing of grievances against the old
interfaces and then start coming up with the requirements for a new one.
I'll start a list in a Google Doc unless someone has a better idea.
* RE: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: Luck, Tony @ 2020-09-11 19:35 UTC
To: Hansen, Dave, David Hildenbrand, Michal Hocko
Cc: Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

> It's actually really hard to map a DIMM to a physical address.
> Interleaving can mean that one page actually spans a bunch of DIMMs.

Heh! If NUMA mode is turned off your single page may have cache lines
from *every* DIMM in the system. Even with NUMA turned on the page
will have cache lines from every DIMM on the socket.

-Tony
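The M:N relationship between pages and DIMMs described above can be illustrated with a toy model. The sketch assumes a plain round-robin interleave in which consecutive chunks of physical address space go to consecutive DIMMs. Real memory controllers interleave across channels and ranks and may hash address bits, so this is only a simplified picture and not how any particular platform decodes addresses. The constants and the function name are chosen for the example.

```python
PAGE_SIZE = 4096   # bytes, typical base page size
CACHE_LINE = 64    # bytes, typical cache line size


def dimms_touched_by_page(page_addr, num_dimms, granularity=CACHE_LINE):
    """Toy round-robin interleave.

    Consecutive `granularity`-sized chunks of the physical address space
    are assigned to consecutive DIMMs. Returns the sorted list of DIMM
    indices that hold at least one chunk of the page at `page_addr`.
    """
    return sorted({
        (addr // granularity) % num_dimms
        for addr in range(page_addr, page_addr + PAGE_SIZE, granularity)
    })


if __name__ == "__main__":
    # Cache-line interleave over 8 DIMMs: the page is spread over all of them.
    print(dimms_touched_by_page(0x1000, num_dimms=8))
    # Page-granular placement: the same page sits on exactly one DIMM.
    print(dimms_touched_by_page(0x1000, num_dimms=8, granularity=PAGE_SIZE))
```

In this model a single page maps to every DIMM in the interleave set once the interleave granularity is smaller than the page, which is why removing one DIMM out of such a set is not possible.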
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: David Hildenbrand @ 2020-09-11 19:56 UTC
To: Luck, Tony
Cc: Hansen, Dave, David Hildenbrand, Michal Hocko, Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

> On 11.09.2020 at 21:36, Luck, Tony <tony.luck@intel.com> wrote:
>
>>
>> It's actually really hard to map a DIMM to a physical address.
>> Interleaving can mean that one page actually spans a bunch of DIMMs.
>
> Heh! If NUMA mode is turned off your single page may have cache lines
> from *every* DIMM in the system. Even with NUMA turned on the page
> will have cache lines from every DIMM on the socket.
>

Thanks Dave and Tony, that's valuable information!

How would it behave after hotplugging a single DIMM - I assume a single
page will only be mapped to that DIMM (otherwise a lot of stuff would
have to be moved around). Would the mapping change after a reboot -
especially, can a DIMM that could get hotunplugged before suddenly no
longer be hotunplugged individually?

> -Tony
* RE: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: Luck, Tony @ 2020-09-11 20:09 UTC
To: David Hildenbrand
Cc: Hansen, Dave, Michal Hocko, Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

> How would it behave after hotplugging a single DIMM - I assume a single
> page will only be mapped to that DIMM (otherwise a lot of stuff would
> have to be moved around). Would the mapping change after a reboot -
> especially, can a DIMM that could get hotunplugged before suddenly no
> longer be hotunplugged individually?

We don't currently have any platforms that would allow hot adding at the
DIMM level. The Brickland generation of E7 Xeon servers (Ivybridge,
Haswell, Broadwell) allowed for hot plugging a riser card that contained
up to 12 DIMMs.

If you did add memory it would have to appear at the top of the system
physical address space. No interleave (unless you added more than one
DIMM in a single operation). After a reboot the system would likely
shuffle things around and interleave.

-Tony
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ?
From: David Hildenbrand @ 2020-09-11 20:49 UTC
To: Luck, Tony
Cc: David Hildenbrand, Hansen, Dave, Michal Hocko, Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel

> On 11.09.2020 at 22:09, Luck, Tony <tony.luck@intel.com> wrote:
>
>>
>> How would it behave after hotplugging a single DIMM - I assume a single
>> page will only be mapped to that DIMM (otherwise a lot of stuff would
>> have to be moved around). Would the mapping change after a reboot -
>> especially, can a DIMM that could get hotunplugged before suddenly no
>> longer be hotunplugged individually?
>
> We don't currently have any platforms that would allow hot adding at the
> DIMM level. The Brickland generation of E7 Xeon servers (Ivybridge,
> Haswell, Broadwell) allowed for hot plugging a riser card that contained
> up to 12 DIMMs.
>
> If you did add memory it would have to appear at the top of the system
> physical address space. No interleave (unless you added more than one
> DIMM in a single operation). After a reboot the system would likely
> shuffle things around and interleave.
>

Thanks a lot - so I'm really spoiled by hot(un)plug capabilities in
virtualized environments :D

> -Tony
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? 2020-09-11 10:09 ` David Hildenbrand 2020-09-11 19:24 ` Dave Hansen @ 2020-09-14 11:24 ` Michal Hocko 2020-09-14 12:14 ` David Hildenbrand 1 sibling, 1 reply; 21+ messages in thread From: Michal Hocko @ 2020-09-14 11:24 UTC (permalink / raw) To: David Hildenbrand Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel On Fri 11-09-20 12:09:52, David Hildenbrand wrote: > On 11.09.20 11:12, Michal Hocko wrote: > > On Fri 11-09-20 10:09:07, David Hildenbrand wrote: > > [...] > >> Consider two cases: > >> > >> 1. Hot(un)plugging huge DIMMs: many (not all!) use cases want to > >> online/offline the whole thing. HW can effectively only plug/unplug the > >> whole thing. It makes sense in some (most?) setups to represent one DIMM > >> as one memory block device. > > > > Yes, for the physical hotplug it doesn't really make much sense to me to > > offline portions that the HW cannot hotremove. > > I've seen people offline parts of memory to simulate systems with less > RAM and people offline parts of memory on demand to save energy > (poweroff banks). People won't stop being creative with what we provided > to them :D Heh, I have seen people shooting their foot for fun. But more seriously, I do undestand different usecases and we shouldn't cut them off their toys. > >> 2. Hot(un)plugging small memory increments. This is mostly the case in > >> virtualized environments - especially hyper-v balloon, xen balloon, > >> virtio-mem and (drumroll) ppc dlpar and s390x standby memory. On PPC, > >> you want at least all (16MB!) memory block devices that can get > >> unplugged again individually ("LMBs") as separate memory blocks. Same on > >> s390x on memory increment size (currently effectively the memory block > >> size). > > > > Yes I do recognize those usecase even though I will not pretend I > > consider it quesitonable. E.g. 
any hotplug with a smaller granularity > > than the memory model in Linus allows is just dubious. We simply cannot > > implement that without a lot of wasting and then the question is what is > > the real point. > > Having the section size as small as possible in these environments is > most certainly preferable, to clean up metadata where possible. There is a certain line that is hard to maintain. I consider a section to be the smallest granularity that makes sense to support. Current section sizing makes sense from the VMEMMAP point of view. If there are strong reasons to allow smaller once then I belive this should be compile time option. > Otherwise, hot(un)plugging smaller granularity behaves more like memory > ballooning (and I think I don't have to tell you that ballooning is used > excessively even though it wastes memory on metadata ;) ). Anyhow, > that's another discussion. Yeah, I am aware of that. And honestly subsection offlining makes very little sense to me. It was hard to argue against that for nvdimm usecases where we simply had to workaround the reality where devices couldn't have been aligned properly. I do not think we want to claim a support for general hotplug though. [...] > > There is only one certainty. Providing a long term interface with ever > > growing (ab)users is a hard target. And shinyN might be needed in the > > end. Who knows. My main point is that the existing interface is hitting > > a wall on usecases which _do_not_care_ about memory hotplug. And that is > > something we should be looking at. > > Agreed. I can see 3 scenarios > > a) no memory hotplug support, no sysfs. > b) memory hotplug support, no sysfs > c) memory hotplug support, sysfs > > Starting with a) and c) is the easiest way to go. Yes, the first and the simplest way would be to provide memory_hotplug=[disabled|v1] where disabled would be no sysfs interface, v1 would be the existing infrastructure. 
I would hope to land with v2 in a future which would provide a new interface. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 21+ messages in thread
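[Editorial sketch, not part of the original message] The memory_hotplug=[disabled|v1] option above is only a proposal made in this thread. The following is a minimal Python sketch of how such a token might be picked out of a kernel command line string. The function name, the default of "v1" when the option is absent, and the "last occurrence wins" rule are all assumptions made for illustration; nothing here describes an existing kernel parameter.

```python
# Hypothetical: "memory_hotplug=" is the option proposed in this thread,
# not a documented kernel parameter. Defaults and error handling are assumptions.
ALLOWED_MODES = ("disabled", "v1")


def memory_hotplug_mode(cmdline: str, default: str = "v1") -> str:
    """Return the requested mode from a command line string such as /proc/cmdline."""
    mode = default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key != "memory_hotplug" or not sep:
            continue  # unrelated token, or the key given without a value
        if value not in ALLOWED_MODES:
            raise ValueError(f"unsupported memory_hotplug mode: {value!r}")
        mode = value  # assumption: the last occurrence wins
    return mode
```

A later "v2" interface, as hoped for above, would simply extend the allowed set in a sketch like this.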
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? 2020-09-14 11:24 ` Michal Hocko @ 2020-09-14 12:14 ` David Hildenbrand 0 siblings, 0 replies; 21+ messages in thread From: David Hildenbrand @ 2020-09-14 12:14 UTC (permalink / raw) To: Michal Hocko Cc: Dave Hansen, Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel >> Otherwise, hot(un)plugging smaller granularity behaves more like memory >> ballooning (and I think I don't have to tell you that ballooning is used >> excessively even though it wastes memory on metadata ;) ). Anyhow, >> that's another discussion. > > Yeah, I am aware of that. And honestly subsection offlining makes very > little sense to me. It was hard to argue against that for nvdimm > usecases where we simply had to workaround the reality where devices > couldn't have been aligned properly. I do not think we want to claim a > support for general hotplug though. Totally agree, I also don't want to see actual sub-section onlining/offlining in the core (e.g., virtio-mem emulates that on top, but it behaves a lot more like memory ballooning). > > [...] > >>> There is only one certainty. Providing a long term interface with ever >>> growing (ab)users is a hard target. And shinyN might be needed in the >>> end. Who knows. My main point is that the existing interface is hitting >>> a wall on usecases which _do_not_care_ about memory hotplug. And that is >>> something we should be looking at. >> >> Agreed. I can see 3 scenarios >> >> a) no memory hotplug support, no sysfs. >> b) memory hotplug support, no sysfs >> c) memory hotplug support, sysfs >> >> Starting with a) and c) is the easiest way to go. > > Yes, the first and the simplest way would be to provide > memory_hotplug=[disabled|v1] > > where disabled would be no sysfs interface, v1 would be the existing > infrastructure. I would hope to land with v2 in a future which would > provide a new interface. > Agreed. 
-- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? 2020-09-10 10:20 Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? David Hildenbrand 2020-09-10 20:00 ` Dave Hansen @ 2020-09-10 20:57 ` Dave Hansen 2020-09-22 13:56 ` Gerald Schaefer 2 siblings, 0 replies; 21+ messages in thread From: Dave Hansen @ 2020-09-10 20:57 UTC (permalink / raw) To: David Hildenbrand, Gerald Schaefer, Michal Hocko, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel On 9/10/20 3:20 AM, David Hildenbrand wrote: > While I'd love to rip it out completely, I think it would break old > lsmem/chmem completely - and I assume that's not acceptable. I was > wondering what would be considered safe to do now/in the future: > > 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on > s390x). This will make old lsmem/chmem behave differently after > switching to a new kernel, like if sclp.rzm would not be set by HW - > AFAIU, it will assume all memory is in a single memory increment. Do we > care? > 2. Restrict it to s390x only. It always returned 0 on other > architectures, I was not able to find any user. By "restrict it", do you mean just remove the sysfs file on everything other than s390x? That seems like a good idea, especially if we don't have any users. That, plus boot option or something to reenable it would be nice if someone trips over it disappearing. If there is a user, we stand a chance of finding them because they'll hopefully get a good error message. Worst case, an strace will show an -ENOENT and should be pretty easy to track down. ^ permalink raw reply [flat|nested] 21+ messages in thread
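[Editorial sketch, not part of the original message] If the sysfs file were removed on some architectures as discussed above, a user space reader would see ENOENT when opening it. The following minimal Python sketch shows one way a tool could tolerate that. The function name is hypothetical, and it assumes the attribute holds a single decimal integer; the directory is passed in so the function can be pointed at /sys/devices/system/memory/memoryX or at a test directory.

```python
import os
from typing import Optional


def read_phys_device(block_dir: str) -> Optional[int]:
    """Read phys_device from one memory block directory.

    Returns None when the attribute does not exist (open fails with ENOENT),
    so callers can tell "file removed" apart from a real value of 0.
    Assumption: the file contains one decimal integer.
    """
    path = os.path.join(block_dir, "phys_device")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return None
```

A caller that previously treated a missing value as fatal could fall back to "no increment information" when None is returned.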
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? 2020-09-10 10:20 Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? David Hildenbrand 2020-09-10 20:00 ` Dave Hansen 2020-09-10 20:57 ` Dave Hansen @ 2020-09-22 13:56 ` Gerald Schaefer 2020-09-25 14:49 ` David Hildenbrand 2 siblings, 1 reply; 21+ messages in thread From: Gerald Schaefer @ 2020-09-22 13:56 UTC (permalink / raw) To: David Hildenbrand Cc: Michal Hocko, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel On Thu, 10 Sep 2020 12:20:34 +0200 David Hildenbrand <david@redhat.com> wrote: > Hi everybody, > > I was just exploring how /sys/devices/system/memory/memoryX/phys_device > is/was used. It's one of these interfaces that most probably never > should have been added but now we are stuck with it. > > "phys_device" was used on s390x in older versions of lsmem[2]/chmem[3], > back when they were still part of s390x-tools. They were later replaced > [5] by the variants in linux-utils. For example, RHEL6 and RHEL7 contain > lsmem/chmem from s390-utils. RHEL8 switched to versions from util-linux > on s390x [4]. > > "phys_device" was added with sysfs support for memory hotplug in commit > 3947be1969a9 ("[PATCH] memory hotplug: sysfs and add/remove functions") > in 2005. It always returned 0. > > s390x started returning something != 0 on some setups (if sclp.rzm is > set by HW) in 2010 via commit 57b552ba0b2f("memory hotplug/s390: set > phys_device"). > > For s390x, it allowed for identifying which memory block devices belong > to the same memory increment (RZM). Only if all memory block devices > comprising a single memory increment were offline, the memory could > actually be removed in the hypervisor. 
> > Since commit e5d709bb5fb7 ("s390/memory hotplug: provide > memory_block_size_bytes() function") in 2013 a memory block devices > spans at least one memory increment - which is why the interface isn't > really helpful/used anymore (except by old lsmem/chmem tools). Correct, so I do not see any problem for s390 with removing / changing that for the upstream kernel. BTW, that commit also gave some relief on the scaling issue, at least for s390. With increasing total memory size, we also have increasing increment and thus memory block size. Of course, that also has some limitations, IIRC max. 1 GB increment size, but still better than the 256 MB default size. > > There were once RFC patches to make use of it in ACPI, but it could be > solved using different interfaces [1]. > > > While I'd love to rip it out completely, I think it would break old > lsmem/chmem completely - and I assume that's not acceptable. I was > wondering what would be considered safe to do now/in the future: > > 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on > s390x). This will make old lsmem/chmem behave differently after > switching to a new kernel, like if sclp.rzm would not be set by HW - > AFAIU, it will assume all memory is in a single memory increment. Do we > care? No, at least not until that kernel change would be backported to some old distribution level where we still use lsmem/chmem from s390-tools. Given that this is just some clean-up w/o any functional benefit, and hopefully w/o any negative impact, I think we can safely assume that no distributor will do that "just for fun". Even if there would be good reasons for backports, then I guess we also have good reasons for backporting / switching to the util-linux version of lsmem / chmem for such distribution levels. Alternatively, adjust the s390-tools lsmem / chmem there. But I would rather "rip it out completely" than just return 0. 
You'd need some lsmem / chmem changes anyway, at least in case this would ever be backported. > 2. Restrict it to s390x only. It always returned 0 on other > architectures, I was not able to find any user. > > I think 2 should be safe to do (never used on other archs). I do wonder > what the feelings are about 1. Please don't add any s390-specific workarounds here, that does not really sound like a clean-up, rather the opposite. That being said, I do not really see the benefit of this change at all. As Michal mentioned, there really should be some more fundamental change. And from the rest of this thread, it also seems that phys_device usage might not be the biggest issue here. ^ permalink raw reply [flat|nested] 21+ messages in thread
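[Editorial sketch, not part of the original message] The quoted description says that phys_device let tools find the memory block devices that belong to the same memory increment, and that an increment could only be removed once all of its blocks were offline. A minimal Python sketch of that grouping logic follows. It is not the code of the old lsmem/chmem tools; the function name and the (name, phys_device, state) tuple layout are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def removable_increments(blocks: Iterable[Tuple[str, int, str]]) -> List[int]:
    """Return the phys_device ids whose memory blocks are all offline.

    Each entry in blocks is (block name, phys_device value, state), where
    state is the content of the block's sysfs "state" attribute.
    """
    states_by_increment: Dict[int, List[str]] = defaultdict(list)
    for _name, phys_device, state in blocks:
        states_by_increment[phys_device].append(state)
    # An increment qualifies only if every block in it is offline.
    return sorted(
        dev
        for dev, states in states_by_increment.items()
        if all(state == "offline" for state in states)
    )
```

If phys_device always returned 0, every block would land in one group, which matches the thread's expectation that old tools would then treat all memory as a single increment.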
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? 2020-09-22 13:56 ` Gerald Schaefer @ 2020-09-25 14:49 ` David Hildenbrand 2020-09-25 15:00 ` Greg KH 2020-09-25 15:39 ` Michal Hocko 0 siblings, 2 replies; 21+ messages in thread From: David Hildenbrand @ 2020-09-25 14:49 UTC (permalink / raw) To: Gerald Schaefer Cc: Michal Hocko, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel >> There were once RFC patches to make use of it in ACPI, but it could be >> solved using different interfaces [1]. >> >> >> While I'd love to rip it out completely, I think it would break old >> lsmem/chmem completely - and I assume that's not acceptable. I was >> wondering what would be considered safe to do now/in the future: >> >> 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on >> s390x). This will make old lsmem/chmem behave differently after >> switching to a new kernel, like if sclp.rzm would not be set by HW - >> AFAIU, it will assume all memory is in a single memory increment. Do we >> care? > > No, at least not until that kernel change would be backported to some > old distribution level where we still use lsmem/chmem from s390-tools. > Given that this is just some clean-up w/o any functional benefit, and > hopefully w/o any negative impact, I think we can safely assume that no > distributor will do that "just for fun". > > Even if there would be good reasons for backports, then I guess we also > have good reasons for backporting / switching to the util-linux version > of lsmem / chmem for such distribution levels. Alternatively, adjust the > s390-tools lsmem / chmem there. > > But I would rather "rip it out completely" than just return 0. You'd > need some lsmem / chmem changes anyway, at least in case this would > ever be backported. Thanks for your input Gerald. 
So unless people would be running shiny new kernels on older distributions it shouldn't be a problem (and I don't think we care too much about something like that). I don't expect something like that to get backported - there is absolutely no reason to do so IMHO. > >> 2. Restrict it to s390x only. It always returned 0 on other >> architectures, I was not able to find any user. >> >> I think 2 should be safe to do (never used on other archs). I do wonder >> what the feelings are about 1. > > Please don't add any s390-specific workarounds here, that does not > really sound like a clean-up, rather the opposite. People seem to have different opinions here. I'm happy as long as we can get rid of it (either now, or in the future with a new model). > > That being said, I do not really see the benefit of this change at > all. As Michal mentioned, there really should be some more fundamental > change. And from the rest of this thread, it also seems that phys_device > usage might not be the biggest issue here. > As I already expressed, I am more of a friend of small, incremental changes than having a single big world switch where everything will be shiny and perfect. (Deprecating it now - in any way - stops any new users from appearing - both, in the kernel and from user space - eventually making the big world switch later a little easier because there is one thing less that vanished) -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? 2020-09-25 14:49 ` David Hildenbrand @ 2020-09-25 15:00 ` Greg KH 2020-09-25 15:05 ` David Hildenbrand 2020-09-25 15:39 ` Michal Hocko 1 sibling, 1 reply; 21+ messages in thread From: Greg KH @ 2020-09-25 15:00 UTC (permalink / raw) To: David Hildenbrand Cc: Gerald Schaefer, Michal Hocko, akpm, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel On Fri, Sep 25, 2020 at 04:49:28PM +0200, David Hildenbrand wrote: > >> There were once RFC patches to make use of it in ACPI, but it could be > >> solved using different interfaces [1]. > >> > >> > >> While I'd love to rip it out completely, I think it would break old > >> lsmem/chmem completely - and I assume that's not acceptable. I was > >> wondering what would be considered safe to do now/in the future: > >> > >> 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on > >> s390x). This will make old lsmem/chmem behave differently after > >> switching to a new kernel, like if sclp.rzm would not be set by HW - > >> AFAIU, it will assume all memory is in a single memory increment. Do we > >> care? > > > > No, at least not until that kernel change would be backported to some > > old distribution level where we still use lsmem/chmem from s390-tools. > > Given that this is just some clean-up w/o any functional benefit, and > > hopefully w/o any negative impact, I think we can safely assume that no > > distributor will do that "just for fun". > > > > Even if there would be good reasons for backports, then I guess we also > > have good reasons for backporting / switching to the util-linux version > > of lsmem / chmem for such distribution levels. Alternatively, adjust the > > s390-tools lsmem / chmem there. > > > > But I would rather "rip it out completely" than just return 0. You'd > > need some lsmem / chmem changes anyway, at least in case this would > > ever be backported. > > Thanks for your input Gerald. 
> > So unless people would be running shiny new kernels on older > distributions it shouldn't be a problem (and I don't think we care too > much about something like that). I don't expect something like that to > get backported - there is absolutely no reason to do so IMHO. We do care about this. Andrew used to have an old Fedora 9 box or something like that, that he tortured many of us with bug reports when we broke it :) So watch out, people keep old userspace around for much longer than you can possibly imagine because they don't like having their use-cases in userspace change, and we have made the guarantee to them that they _CAN_ trust us to not break things in userspace. It's a slow age-out, but watch out, you might have to revert things... good luck! greg k-h ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? 2020-09-25 15:00 ` Greg KH @ 2020-09-25 15:05 ` David Hildenbrand 0 siblings, 0 replies; 21+ messages in thread From: David Hildenbrand @ 2020-09-25 15:05 UTC (permalink / raw) To: Greg KH Cc: Gerald Schaefer, Michal Hocko, akpm, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel > > It's a slow age-out, but watch out, you might have to revert things... > > good luck! Yeah, I always liked playing with fire ;) Thanks for the insights Greg! -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? 2020-09-25 14:49 ` David Hildenbrand 2020-09-25 15:00 ` Greg KH @ 2020-09-25 15:39 ` Michal Hocko 2020-09-25 15:47 ` David Hildenbrand 1 sibling, 1 reply; 21+ messages in thread From: Michal Hocko @ 2020-09-25 15:39 UTC (permalink / raw) To: David Hildenbrand Cc: Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel On Fri 25-09-20 16:49:28, David Hildenbrand wrote: > >> There were once RFC patches to make use of it in ACPI, but it could be > >> solved using different interfaces [1]. > >> > >> > >> While I'd love to rip it out completely, I think it would break old > >> lsmem/chmem completely - and I assume that's not acceptable. I was > >> wondering what would be considered safe to do now/in the future: > >> > >> 1. Make it always return 0 (just as if "sclp.rzm" would be set to 0 on > >> s390x). This will make old lsmem/chmem behave differently after > >> switching to a new kernel, like if sclp.rzm would not be set by HW - > >> AFAIU, it will assume all memory is in a single memory increment. Do we > >> care? > > > > No, at least not until that kernel change would be backported to some > > old distribution level where we still use lsmem/chmem from s390-tools. > > Given that this is just some clean-up w/o any functional benefit, and > > hopefully w/o any negative impact, I think we can safely assume that no > > distributor will do that "just for fun". > > > > Even if there would be good reasons for backports, then I guess we also > > have good reasons for backporting / switching to the util-linux version > > of lsmem / chmem for such distribution levels. Alternatively, adjust the > > s390-tools lsmem / chmem there. > > > > But I would rather "rip it out completely" than just return 0. You'd > > need some lsmem / chmem changes anyway, at least in case this would > > ever be backported. > > Thanks for your input Gerald. 
> > So unless people would be running shiny new kernels on older > distributions it shouldn't be a problem (and I don't think we care too > much about something like that). I don't expect something like that to > get backported - there is absolutely no reason to do so IMHO. Ohh, there are many people running current Linus tree on an older distribution. Including me. > >> 2. Restrict it to s390x only. It always returned 0 on other > >> architectures, I was not able to find any user. > >> > >> I think 2 should be safe to do (never used on other archs). I do wonder > >> what the feelings are about 1. > > > > Please don't add any s390-specific workarounds here, that does not > > really sound like a clean-up, rather the opposite. > > People seem to have different opinions here. I'm happy as long as we can > get rid of it (either now, or in the future with a new model). > > > > > That being said, I do not really see the benefit of this change at > > all. As Michal mentioned, there really should be some more fundamental > > change. And from the rest of this thread, it also seems that phys_device > > usage might not be the biggest issue here. > > > > As I already expressed, I am more of a friend of small, incremental > changes than having a single big world switch where everything will be > shiny and perfect. > > (Deprecating it now - in any way - stops any new users from appearing - > both, in the kernel and from user space - eventually making the big > world switch later a little easier because there is one thing less that > vanished) Realistically people do not care about deprecation all that much. They simply use whatever they can find or somebody will show them. Really, deprecation has never really worked. The only thing that worked was to remove the functionality and then wait for somebody to complain and revert or somehow allow the functionality without necessity to alter the userspace. 
As much as I would like to remove as much crud as possible I strongly suspect that the existing hotplug interface is just a lost cause and it isn't the best use of time to put lipstick on a pig. Even if we remove this particular interface we are not going to get rid of a lot of code or we won't gain any more sensible semantics, right? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Ways to deprecate /sys/devices/system/memory/memoryX/phys_device ? 2020-09-25 15:39 ` Michal Hocko @ 2020-09-25 15:47 ` David Hildenbrand 0 siblings, 0 replies; 21+ messages in thread From: David Hildenbrand @ 2020-09-25 15:47 UTC (permalink / raw) To: Michal Hocko Cc: Gerald Schaefer, akpm, Greg KH, Jan Höppner, Heiko Carstens, linux-mm, linux-api, Dave Hansen, linux-kernel >>>> 2. Restrict it to s390x only. It always returned 0 on other >>>> architectures, I was not able to find any user. >>>> >>>> I think 2 should be safe to do (never used on other archs). I do wonder >>>> what the feelings are about 1. >>> >>> Please don't add any s390-specific workarounds here, that does not >>> really sound like a clean-up, rather the opposite. >> >> People seem to have different opinions here. I'm happy as long as we can >> get rid of it (either now, or in the future with a new model). >> >>> >>> That being said, I do not really see the benefit of this change at >>> all. As Michal mentioned, there really should be some more fundamental >>> change. And from the rest of this thread, it also seems that phys_device >>> usage might not be the biggest issue here. >>> >> >> As I already expressed, I am more of a friend of small, incremental >> changes than having a single big world switch where everything will be >> shiny and perfect. >> >> (Deprecating it now - in any way - stops any new users from appearing - >> both, in the kernel and from user space - eventually making the big >> world switch later a little easier because there is one thing less that >> vanished) > > Realistically people do not care about deprecation all that much. They > simply use whatever they can find or somebody will show them. Really, > deprecation has never really worked. The only thing that worked was to > remove the functionality and then wait for somebody to complain and > revert or somehow allow the functionality without necessity to alter the > userspace. Mainframe people are usually ... 
more conservative (well, they focus on stability and pay a lot of money for that - including HW). :) What they would lose here is s390x lsmem/chmem functionality, used to manage standby memory (under LPAR and z/VM, if enabled) - with the old tools. I have the feeling that this would be acceptable (I never had access to an LPAR that allowed for it ...), but yeah, you never know. > > As much as I would like to remove as much crud as possible I strongly > suspect that the existing hotplug interface is just a lost cause and it > isn't the best use of time to put lipstick on a pig. Even if > we remove this particular interface we are not going to get rid of a lot > of code or we won't gain any more sensible semantics, right? > Excluding some documentation

 drivers/base/memory.c        | 29 -----------------------------
 drivers/s390/char/sclp_cmd.c |  7 -------
 include/linux/memory.h       |  2 --
 3 files changed, 38 deletions(-)

Seems like this is the only way to deprecate. (I mean I can add comments in the code, but as you say, doesn't stop new user space users from showing up) -- Thanks, David / dhildenb ^ permalink raw reply [flat|nested] 21+ messages in thread