[v2] mm/memory-hotplug: Add sysfs hot-remove trigger
diff mbox series

Message ID 49ef5e6c12f5ede189419d4dcced5dc04957c34d.1549906631.git.robin.murphy@arm.com
State In Next
Commit 81af9e67b6c2d43bd20c84df77087218b40d7937
Headers show
Series
  • [v2] mm/memory-hotplug: Add sysfs hot-remove trigger
Related show

Commit Message

Robin Murphy Feb. 11, 2019, 5:50 p.m. UTC
ARCH_MEMORY_PROBE is a useful thing for testing and debugging hotplug,
but being able to exercise the (arguably trickier) hot-remove path would
be even more useful. Extend the feature to allow removal of offline
sections to be triggered manually to aid development.

Since process dictates the new sysfs entry be documented, let's also
document the existing probe entry to match - better 13-and-a-half years
late than never, as they say...

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2: Use is_memblock_offlined() helper, write up documentation

 .../ABI/testing/sysfs-devices-memory          | 25 +++++++++++
 drivers/base/memory.c                         | 42 ++++++++++++++++++-
 2 files changed, 66 insertions(+), 1 deletion(-)

Comments

Anshuman Khandual Feb. 12, 2019, 3:20 a.m. UTC | #1
On 02/11/2019 11:20 PM, Robin Murphy wrote:
> ARCH_MEMORY_PROBE is a useful thing for testing and debugging hotplug,
> but being able to exercise the (arguably trickier) hot-remove path would
> be even more useful. Extend the feature to allow removal of offline
> sections to be triggered manually to aid development.
> 
> Since process dictates the new sysfs entry be documented, let's also
> document the existing probe entry to match - better 13-and-a-half years
> late than never, as they say...
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 
> v2: Use is_memblock_offlined() helper, write up documentation
> 
>  .../ABI/testing/sysfs-devices-memory          | 25 +++++++++++
>  drivers/base/memory.c                         | 42 ++++++++++++++++++-
>  2 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
> index deef3b5723cf..02a4250964e0 100644
> --- a/Documentation/ABI/testing/sysfs-devices-memory
> +++ b/Documentation/ABI/testing/sysfs-devices-memory
> @@ -91,3 +91,28 @@ Description:
>  		memory section directory.  For example, the following symbolic
>  		link is created for memory section 9 on node0.
>  		/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
> +
> +What:		/sys/devices/system/memory/probe
> +Date:		October 2005
> +Contact:	Linux Memory Management list <linux-mm@kvack.org>
> +Description:
> +		The file /sys/devices/system/memory/probe is write-only, and
> +		when written will simulate a physical hot-add of a memory

Small nit. It does not 'simulate' but really does add the memory block into
the memblock, buddy allocator and so on.

> +		section at the given address. For example, assuming a section
> +		of unused memory exists at physical address 0x80000000, it can
> +		be introduced to the kernel with the following command:
> +		# echo 0x80000000 > /sys/devices/system/memory/probe
> +Users:		Memory hotplug testing and development
> +
> +What:		/sys/devices/system/memory/memoryX/remove
> +Date:		February 2019
> +Contact:	Linux Memory Management list <linux-mm@kvack.org>
> +Description:
> +		The file /sys/devices/system/memory/memoryX/remove is
> +		write-only, and when written with a boolean 'true' value will
> +		simulate a physical hot-remove of that memory section. For

Same here.
> +		example, assuming a 1GB section size, the section added by the
> +		above "probe" example could be removed again with the following

There is no need to mention specific memory block sizes like 1G in documentation
for this generic interface which would work for all possible sizes.
Michal Hocko Feb. 12, 2019, 8:33 a.m. UTC | #2
On Mon 11-02-19 17:50:46, Robin Murphy wrote:
> ARCH_MEMORY_PROBE is a useful thing for testing and debugging hotplug,
> but being able to exercise the (arguably trickier) hot-remove path would
> be even more useful. Extend the feature to allow removal of offline
> sections to be triggered manually to aid development.
> 
> Since process dictates the new sysfs entry be documented, let's also
> document the existing probe entry to match - better 13-and-a-half years
> late than never, as they say...

The probe sysfs is quite dubious already TBH. Apart from testing, is
anybody using it for something real? Do we need to keep an API for
something testing only? Why isn't a customer testing module enough for
such a purpose?

In other words, why do we have to add an API that has to be maintained
for ever for a testing only purpose?

Besides that, what is the reason to use __remove_memory rather than the
exported remove_memory which does an additional locking. Also, we do
trust root to do sane things but are we sure that the current BUG-land
mines in the hotplug code are ready enough to be exported for playing?

> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 
> v2: Use is_memblock_offlined() helper, write up documentation
> 
>  .../ABI/testing/sysfs-devices-memory          | 25 +++++++++++
>  drivers/base/memory.c                         | 42 ++++++++++++++++++-
>  2 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
> index deef3b5723cf..02a4250964e0 100644
> --- a/Documentation/ABI/testing/sysfs-devices-memory
> +++ b/Documentation/ABI/testing/sysfs-devices-memory
> @@ -91,3 +91,28 @@ Description:
>  		memory section directory.  For example, the following symbolic
>  		link is created for memory section 9 on node0.
>  		/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
> +
> +What:		/sys/devices/system/memory/probe
> +Date:		October 2005
> +Contact:	Linux Memory Management list <linux-mm@kvack.org>
> +Description:
> +		The file /sys/devices/system/memory/probe is write-only, and
> +		when written will simulate a physical hot-add of a memory
> +		section at the given address. For example, assuming a section
> +		of unused memory exists at physical address 0x80000000, it can
> +		be introduced to the kernel with the following command:
> +		# echo 0x80000000 > /sys/devices/system/memory/probe
> +Users:		Memory hotplug testing and development
> +
> +What:		/sys/devices/system/memory/memoryX/remove
> +Date:		February 2019
> +Contact:	Linux Memory Management list <linux-mm@kvack.org>
> +Description:
> +		The file /sys/devices/system/memory/memoryX/remove is
> +		write-only, and when written with a boolean 'true' value will
> +		simulate a physical hot-remove of that memory section. For
> +		example, assuming a 1GB section size, the section added by the
> +		above "probe" example could be removed again with the following
> +		command:
> +		# echo 1 > /sys/devices/system/memory/memory2/remove
> +Users:		Memory hotplug testing and development
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 048cbf7d5233..1ba9d1a6ba5e 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -521,7 +521,44 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
>  }
>  
>  static DEVICE_ATTR_WO(probe);
> -#endif
> +
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
> +			    const char *buf, size_t count)
> +{
> +	struct memory_block *mem = to_memory_block(dev);
> +	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
> +	bool remove;
> +	int ret;
> +
> +	ret = kstrtobool(buf, &remove);
> +	if (ret)
> +		return ret;
> +	if (!remove)
> +		return count;
> +
> +	if (!is_memblock_offlined(mem))
> +		return -EBUSY;
> +
> +	ret = lock_device_hotplug_sysfs();
> +	if (ret)
> +		return ret;
> +
> +	if (device_remove_file_self(dev, attr)) {
> +		__remove_memory(pfn_to_nid(start_pfn), PFN_PHYS(start_pfn),
> +				MIN_MEMORY_BLOCK_SIZE * sections_per_block);
> +		ret = count;
> +	} else {
> +		ret = -EBUSY;
> +	}
> +
> +	unlock_device_hotplug();
> +	return ret;
> +}
> +
> +static DEVICE_ATTR_WO(remove);
> +#endif /* CONFIG_MEMORY_HOTREMOVE */
> +#endif /* CONFIG_ARCH_MEMORY_PROBE */
>  
>  #ifdef CONFIG_MEMORY_FAILURE
>  /*
> @@ -615,6 +652,9 @@ static struct attribute *memory_memblk_attrs[] = {
>  	&dev_attr_removable.attr,
>  #ifdef CONFIG_MEMORY_HOTREMOVE
>  	&dev_attr_valid_zones.attr,
> +#ifdef CONFIG_ARCH_MEMORY_PROBE
> +	&dev_attr_remove.attr,
> +#endif
>  #endif
>  	NULL
>  };
> -- 
> 2.20.1.dirty
Robin Murphy Feb. 12, 2019, 2:54 p.m. UTC | #3
On 12/02/2019 08:33, Michal Hocko wrote:
> On Mon 11-02-19 17:50:46, Robin Murphy wrote:
>> ARCH_MEMORY_PROBE is a useful thing for testing and debugging hotplug,
>> but being able to exercise the (arguably trickier) hot-remove path would
>> be even more useful. Extend the feature to allow removal of offline
>> sections to be triggered manually to aid development.
>>
>> Since process dictates the new sysfs entry be documented, let's also
>> document the existing probe entry to match - better 13-and-a-half years
>> late than never, as they say...
> 
> The probe sysfs is quite dubious already TBH. Apart from testing, is
> anybody using it for something real? Do we need to keep an API for
> something testing only? Why isn't a customer testing module enough for
> such a purpose?

 From the arm64 angle, beyond "conventional" servers where we can 
hopefully assume ACPI, I can imagine there being embedded/HPC setups 
(not all as esoteric as that distributed-memory dRedBox thing), as well 
as virtual machines, that are DT-based with minimal runtime firmware. 
I'm none too keen on the idea either, but if such systems want to 
support physical hotplug then driving it from userspace might be the 
only reasonable approach. I'm just loath to actually document it as 
anything other than a developer feature so as not to give the impression 
that I consider it anything other than a last resort for production use. 
I do note that my x86 distro kernel has ARCH_MEMORY_PROBE enabled 
despite it being "for testing".

> In other words, why do we have to add an API that has to be maintained
> for ever for a testing only purpose?

There's already half the API being maintained, though, so adding the 
corresponding other half alongside it doesn't seem like that great an 
overhead, regardless of how it ends up getting used. Ultimately, though, 
it's a patch I wrote because I needed it, and if everyone else is 
adamant that it's not useful enough then fair enough - it's at least in 
the list archives now so I can sleep happy that I've done my 
"contributing back" bit as best I could :)

> Besides that, what is the reason to use __remove_memory rather than the
> exported remove_memory which does an additional locking.

For the same reason that probe uses __add_memory() rather than 
add_memory() - I can't claim to understand *exactly* why 
lock_device_hotplug_sysfs() does what it does compared to 
lock_device_hotplug() (even after reading 5e33bc4165f3), but I can only 
assume it's safest to be consistent with the other attributes here.

> Also, we do
> trust root to do sane things but are we sure that the current BUG-land
> mines in the hotplug code are ready enough to be exported for playing?

Well, the point of this particular implementation as opposed to other 
approaches is that it's impossible by construction to even attempt to 
remove something which isn't an exact, valid memory_block. AFAICS that 
should make it at least as robust as any other hot-remove caller.

Robin.

>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>
>> v2: Use is_memblock_offlined() helper, write up documentation
>>
>>   .../ABI/testing/sysfs-devices-memory          | 25 +++++++++++
>>   drivers/base/memory.c                         | 42 ++++++++++++++++++-
>>   2 files changed, 66 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
>> index deef3b5723cf..02a4250964e0 100644
>> --- a/Documentation/ABI/testing/sysfs-devices-memory
>> +++ b/Documentation/ABI/testing/sysfs-devices-memory
>> @@ -91,3 +91,28 @@ Description:
>>   		memory section directory.  For example, the following symbolic
>>   		link is created for memory section 9 on node0.
>>   		/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
>> +
>> +What:		/sys/devices/system/memory/probe
>> +Date:		October 2005
>> +Contact:	Linux Memory Management list <linux-mm@kvack.org>
>> +Description:
>> +		The file /sys/devices/system/memory/probe is write-only, and
>> +		when written will simulate a physical hot-add of a memory
>> +		section at the given address. For example, assuming a section
>> +		of unused memory exists at physical address 0x80000000, it can
>> +		be introduced to the kernel with the following command:
>> +		# echo 0x80000000 > /sys/devices/system/memory/probe
>> +Users:		Memory hotplug testing and development
>> +
>> +What:		/sys/devices/system/memory/memoryX/remove
>> +Date:		February 2019
>> +Contact:	Linux Memory Management list <linux-mm@kvack.org>
>> +Description:
>> +		The file /sys/devices/system/memory/memoryX/remove is
>> +		write-only, and when written with a boolean 'true' value will
>> +		simulate a physical hot-remove of that memory section. For
>> +		example, assuming a 1GB section size, the section added by the
>> +		above "probe" example could be removed again with the following
>> +		command:
>> +		# echo 1 > /sys/devices/system/memory/memory2/remove
>> +Users:		Memory hotplug testing and development
>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>> index 048cbf7d5233..1ba9d1a6ba5e 100644
>> --- a/drivers/base/memory.c
>> +++ b/drivers/base/memory.c
>> @@ -521,7 +521,44 @@ static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
>>   }
>>   
>>   static DEVICE_ATTR_WO(probe);
>> -#endif
>> +
>> +#ifdef CONFIG_MEMORY_HOTREMOVE
>> +static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
>> +			    const char *buf, size_t count)
>> +{
>> +	struct memory_block *mem = to_memory_block(dev);
>> +	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
>> +	bool remove;
>> +	int ret;
>> +
>> +	ret = kstrtobool(buf, &remove);
>> +	if (ret)
>> +		return ret;
>> +	if (!remove)
>> +		return count;
>> +
>> +	if (!is_memblock_offlined(mem))
>> +		return -EBUSY;
>> +
>> +	ret = lock_device_hotplug_sysfs();
>> +	if (ret)
>> +		return ret;
>> +
>> +	if (device_remove_file_self(dev, attr)) {
>> +		__remove_memory(pfn_to_nid(start_pfn), PFN_PHYS(start_pfn),
>> +				MIN_MEMORY_BLOCK_SIZE * sections_per_block);
>> +		ret = count;
>> +	} else {
>> +		ret = -EBUSY;
>> +	}
>> +
>> +	unlock_device_hotplug();
>> +	return ret;
>> +}
>> +
>> +static DEVICE_ATTR_WO(remove);
>> +#endif /* CONFIG_MEMORY_HOTREMOVE */
>> +#endif /* CONFIG_ARCH_MEMORY_PROBE */
>>   
>>   #ifdef CONFIG_MEMORY_FAILURE
>>   /*
>> @@ -615,6 +652,9 @@ static struct attribute *memory_memblk_attrs[] = {
>>   	&dev_attr_removable.attr,
>>   #ifdef CONFIG_MEMORY_HOTREMOVE
>>   	&dev_attr_valid_zones.attr,
>> +#ifdef CONFIG_ARCH_MEMORY_PROBE
>> +	&dev_attr_remove.attr,
>> +#endif
>>   #endif
>>   	NULL
>>   };
>> -- 
>> 2.20.1.dirty
>
Michal Hocko Feb. 12, 2019, 3:11 p.m. UTC | #4
On Tue 12-02-19 14:54:36, Robin Murphy wrote:
> On 12/02/2019 08:33, Michal Hocko wrote:
> > On Mon 11-02-19 17:50:46, Robin Murphy wrote:
> > > ARCH_MEMORY_PROBE is a useful thing for testing and debugging hotplug,
> > > but being able to exercise the (arguably trickier) hot-remove path would
> > > be even more useful. Extend the feature to allow removal of offline
> > > sections to be triggered manually to aid development.
> > > 
> > > Since process dictates the new sysfs entry be documented, let's also
> > > document the existing probe entry to match - better 13-and-a-half years
> > > late than never, as they say...
> > 
> > The probe sysfs is quite dubious already TBH. Apart from testing, is
> > anybody using it for something real? Do we need to keep an API for
> > something testing only? Why isn't a customer testing module enough for
> > such a purpose?
> 
> From the arm64 angle, beyond "conventional" servers where we can hopefully
> assume ACPI, I can imagine there being embedded/HPC setups (not all as
> esoteric as that distributed-memory dRedBox thing), as well as virtual
> machines, that are DT-based with minimal runtime firmware. I'm none too keen
> on the idea either, but if such systems want to support physical hotplug
> then driving it from userspace might be the only reasonable approach. I'm
> just loath to actually document it as anything other than a developer
> feature so as not to give the impression that I consider it anything other
> than a last resort for production use.

This doesn't sound convicing to add an user API.

> I do note that my x86 distro kernel
> has ARCH_MEMORY_PROBE enabled despite it being "for testing".

Yeah, there have been mistakes done in the API land & hotplug in the
past.

> > In other words, why do we have to add an API that has to be maintained
> > for ever for a testing only purpose?
> 
> There's already half the API being maintained, though, so adding the
> corresponding other half alongside it doesn't seem like that great an
> overhead, regardless of how it ends up getting used.

As already said above. The hotplug user API is not something to follow
for the future development. So no, we are half broken let's continue is
not a reasonable argument.

> Ultimately, though,
> it's a patch I wrote because I needed it, and if everyone else is adamant
> that it's not useful enough then fair enough - it's at least in the list
> archives now so I can sleep happy that I've done my "contributing back" bit
> as best I could :)

I am not saing this is not useful. It is. But I do not think we want to
make it an official api without a strong usecase. And then we should
think twice to make the api both useable and reasonable. A kernel module
for playing sounds like more than sufficient.
David Hildenbrand Feb. 25, 2019, 9:14 p.m. UTC | #5
On 12.02.19 16:11, Michal Hocko wrote:
> On Tue 12-02-19 14:54:36, Robin Murphy wrote:
>> On 12/02/2019 08:33, Michal Hocko wrote:
>>> On Mon 11-02-19 17:50:46, Robin Murphy wrote:
>>>> ARCH_MEMORY_PROBE is a useful thing for testing and debugging hotplug,
>>>> but being able to exercise the (arguably trickier) hot-remove path would
>>>> be even more useful. Extend the feature to allow removal of offline
>>>> sections to be triggered manually to aid development.
>>>>
>>>> Since process dictates the new sysfs entry be documented, let's also
>>>> document the existing probe entry to match - better 13-and-a-half years
>>>> late than never, as they say...
>>>
>>> The probe sysfs is quite dubious already TBH. Apart from testing, is
>>> anybody using it for something real? Do we need to keep an API for
>>> something testing only? Why isn't a customer testing module enough for
>>> such a purpose?
>>
>> From the arm64 angle, beyond "conventional" servers where we can hopefully
>> assume ACPI, I can imagine there being embedded/HPC setups (not all as
>> esoteric as that distributed-memory dRedBox thing), as well as virtual
>> machines, that are DT-based with minimal runtime firmware. I'm none too keen
>> on the idea either, but if such systems want to support physical hotplug
>> then driving it from userspace might be the only reasonable approach. I'm
>> just loath to actually document it as anything other than a developer
>> feature so as not to give the impression that I consider it anything other
>> than a last resort for production use.
> 
> This doesn't sound convicing to add an user API.
> 
>> I do note that my x86 distro kernel
>> has ARCH_MEMORY_PROBE enabled despite it being "for testing".
> 
> Yeah, there have been mistakes done in the API land & hotplug in the
> past.
> 
>>> In other words, why do we have to add an API that has to be maintained
>>> for ever for a testing only purpose?
>>
>> There's already half the API being maintained, though, so adding the
>> corresponding other half alongside it doesn't seem like that great an
>> overhead, regardless of how it ends up getting used.
> 
> As already said above. The hotplug user API is not something to follow
> for the future development. So no, we are half broken let's continue is
> not a reasonable argument.
> 
>> Ultimately, though,
>> it's a patch I wrote because I needed it, and if everyone else is adamant
>> that it's not useful enough then fair enough - it's at least in the list
>> archives now so I can sleep happy that I've done my "contributing back" bit
>> as best I could :)
> 
> I am not saing this is not useful. It is. But I do not think we want to
> make it an official api without a strong usecase. And then we should
> think twice to make the api both useable and reasonable. A kernel module
> for playing sounds like more than sufficient.
> 

I'm late for the party, I consider this very useful for testing, but I
agree that this should not be an official API.

The memory API is already very messed up. We have the "removable"
attribute which does not mean that memory is removable. It means that
memory can be offlined and eventually "unplugged/removed" if the HW
supports it (e.g. a DIMM, otherwise it will never go).

You would be introducing "remove", which would sometimes not work when
"removable" indicates "true" (because your API only works if memory has
already been offlined). I would much rather want to see some of the mess
get cleaned up than new stuff getting added that might not be needed
besides for testing. Yes, not your fault, but an API that keeps getting
more confusing.

I am really starting to strongly dislike the "removable" attribute.
Robin Murphy Feb. 26, 2019, 3:12 p.m. UTC | #6
On 25/02/2019 21:14, David Hildenbrand wrote:
> On 12.02.19 16:11, Michal Hocko wrote:
>> On Tue 12-02-19 14:54:36, Robin Murphy wrote:
>>> On 12/02/2019 08:33, Michal Hocko wrote:
>>>> On Mon 11-02-19 17:50:46, Robin Murphy wrote:
>>>>> ARCH_MEMORY_PROBE is a useful thing for testing and debugging hotplug,
>>>>> but being able to exercise the (arguably trickier) hot-remove path would
>>>>> be even more useful. Extend the feature to allow removal of offline
>>>>> sections to be triggered manually to aid development.
>>>>>
>>>>> Since process dictates the new sysfs entry be documented, let's also
>>>>> document the existing probe entry to match - better 13-and-a-half years
>>>>> late than never, as they say...
>>>>
>>>> The probe sysfs is quite dubious already TBH. Apart from testing, is
>>>> anybody using it for something real? Do we need to keep an API for
>>>> something testing only? Why isn't a customer testing module enough for
>>>> such a purpose?
>>>
>>>  From the arm64 angle, beyond "conventional" servers where we can hopefully
>>> assume ACPI, I can imagine there being embedded/HPC setups (not all as
>>> esoteric as that distributed-memory dRedBox thing), as well as virtual
>>> machines, that are DT-based with minimal runtime firmware. I'm none too keen
>>> on the idea either, but if such systems want to support physical hotplug
>>> then driving it from userspace might be the only reasonable approach. I'm
>>> just loath to actually document it as anything other than a developer
>>> feature so as not to give the impression that I consider it anything other
>>> than a last resort for production use.
>>
>> This doesn't sound convicing to add an user API.
>>
>>> I do note that my x86 distro kernel
>>> has ARCH_MEMORY_PROBE enabled despite it being "for testing".
>>
>> Yeah, there have been mistakes done in the API land & hotplug in the
>> past.
>>
>>>> In other words, why do we have to add an API that has to be maintained
>>>> for ever for a testing only purpose?
>>>
>>> There's already half the API being maintained, though, so adding the
>>> corresponding other half alongside it doesn't seem like that great an
>>> overhead, regardless of how it ends up getting used.
>>
>> As already said above. The hotplug user API is not something to follow
>> for the future development. So no, we are half broken let's continue is
>> not a reasonable argument.
>>
>>> Ultimately, though,
>>> it's a patch I wrote because I needed it, and if everyone else is adamant
>>> that it's not useful enough then fair enough - it's at least in the list
>>> archives now so I can sleep happy that I've done my "contributing back" bit
>>> as best I could :)
>>
>> I am not saing this is not useful. It is. But I do not think we want to
>> make it an official api without a strong usecase. And then we should
>> think twice to make the api both useable and reasonable. A kernel module
>> for playing sounds like more than sufficient.
>>
> 
> I'm late for the party, I consider this very useful for testing, but I
> agree that this should not be an official API.
> 
> The memory API is already very messed up. We have the "removable"
> attribute which does not mean that memory is removable. It means that
> memory can be offlined and eventually "unplugged/removed" if the HW
> supports it (e.g. a DIMM, otherwise it will never go).
> 
> You would be introducing "remove", which would sometimes not work when
> "removable" indicates "true" (because your API only works if memory has
> already been offlined). I would much rather want to see some of the mess
> get cleaned up than new stuff getting added that might not be needed
> besides for testing. Yes, not your fault, but an API that keeps getting
> more confusing.

OK, I guess Andrew should probably drop this patch from -next - I'll add 
my own self-nack if it helps :)

The back of my mind is still ticking over trying to think up a really 
nice design for a self-contained debugfs or module-parameter interface 
completely independent of ARCH_MEMORY_PROBE - I'll probably keep using 
this hack locally to finish off the arm64 hot-remove stuff, but once 
that's done (or if inspiration strikes in the meantime) then I'll try to 
come back with a prototype of the developer interface that I'd find most 
useful.

> I am really starting to strongly dislike the "removable" attribute.

Yeah, I think in general we could do with a new term for boolean-like 
things which have values of "no" and "maybe" - or "yes" and "maybe" in 
the case of security vulnerabilities :)

Robin.
Michal Hocko Feb. 26, 2019, 3:18 p.m. UTC | #7
On Tue 26-02-19 15:12:40, Robin Murphy wrote:
[...]
> The back of my mind is still ticking over trying to think up a really nice
> design for a self-contained debugfs or module-parameter interface completely
> independent of ARCH_MEMORY_PROBE - I'll probably keep using this hack
> locally to finish off the arm64 hot-remove stuff, but once that's done (or
> if inspiration strikes in the meantime) then I'll try to come back with a
> prototype of the developer interface that I'd find most useful.

Would it make more sense to add a module to the testing machinery?

Patch
diff mbox series

diff --git a/Documentation/ABI/testing/sysfs-devices-memory b/Documentation/ABI/testing/sysfs-devices-memory
index deef3b5723cf..02a4250964e0 100644
--- a/Documentation/ABI/testing/sysfs-devices-memory
+++ b/Documentation/ABI/testing/sysfs-devices-memory
@@ -91,3 +91,28 @@  Description:
 		memory section directory.  For example, the following symbolic
 		link is created for memory section 9 on node0.
 		/sys/devices/system/node/node0/memory9 -> ../../memory/memory9
+
+What:		/sys/devices/system/memory/probe
+Date:		October 2005
+Contact:	Linux Memory Management list <linux-mm@kvack.org>
+Description:
+		The file /sys/devices/system/memory/probe is write-only, and
+		when written will simulate a physical hot-add of a memory
+		section at the given address. For example, assuming a section
+		of unused memory exists at physical address 0x80000000, it can
+		be introduced to the kernel with the following command:
+		# echo 0x80000000 > /sys/devices/system/memory/probe
+Users:		Memory hotplug testing and development
+
+What:		/sys/devices/system/memory/memoryX/remove
+Date:		February 2019
+Contact:	Linux Memory Management list <linux-mm@kvack.org>
+Description:
+		The file /sys/devices/system/memory/memoryX/remove is
+		write-only, and when written with a boolean 'true' value will
+		simulate a physical hot-remove of that memory section. For
+		example, assuming a 1GB section size, the section added by the
+		above "probe" example could be removed again with the following
+		command:
+		# echo 1 > /sys/devices/system/memory/memory2/remove
+Users:		Memory hotplug testing and development
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 048cbf7d5233..1ba9d1a6ba5e 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -521,7 +521,44 @@  static ssize_t probe_store(struct device *dev, struct device_attribute *attr,
 }
 
 static DEVICE_ATTR_WO(probe);
-#endif
+
+#ifdef CONFIG_MEMORY_HOTREMOVE
+static ssize_t remove_store(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count)
+{
+	struct memory_block *mem = to_memory_block(dev);
+	unsigned long start_pfn = section_nr_to_pfn(mem->start_section_nr);
+	bool remove;
+	int ret;
+
+	ret = kstrtobool(buf, &remove);
+	if (ret)
+		return ret;
+	if (!remove)
+		return count;
+
+	if (!is_memblock_offlined(mem))
+		return -EBUSY;
+
+	ret = lock_device_hotplug_sysfs();
+	if (ret)
+		return ret;
+
+	if (device_remove_file_self(dev, attr)) {
+		__remove_memory(pfn_to_nid(start_pfn), PFN_PHYS(start_pfn),
+				MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+		ret = count;
+	} else {
+		ret = -EBUSY;
+	}
+
+	unlock_device_hotplug();
+	return ret;
+}
+
+static DEVICE_ATTR_WO(remove);
+#endif /* CONFIG_MEMORY_HOTREMOVE */
+#endif /* CONFIG_ARCH_MEMORY_PROBE */
 
 #ifdef CONFIG_MEMORY_FAILURE
 /*
@@ -615,6 +652,9 @@  static struct attribute *memory_memblk_attrs[] = {
 	&dev_attr_removable.attr,
 #ifdef CONFIG_MEMORY_HOTREMOVE
 	&dev_attr_valid_zones.attr,
+#ifdef CONFIG_ARCH_MEMORY_PROBE
+	&dev_attr_remove.attr,
+#endif
 #endif
 	NULL
 };