linux-kernel.vger.kernel.org archive mirror
* memory hotplug and force_remove
@ 2017-03-20 19:29 Michal Hocko
  2017-03-20 21:24 ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-03-20 19:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, Jiri Kosina, joeyli, linux-mm, LKML, linux-api

Hi Rafael,
we have been chasing the following BUG() triggering during the memory
hotremove (remove_memory):
	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
				check_memblock_offlined_cb);
	if (ret)
		BUG();

and it took a while to learn that the issue is caused by
/sys/firmware/acpi/hotplug/force_remove being enabled. I was really
surprised to see such an option because, at least for memory hotplug,
it cannot work at all. Memory offlining fails when the memory is still
in use. Even if we do not BUG() here, enforcing the hotremove operation
will lead to problematic behavior later, like a crash or silent memory
corruption if the memory gets onlined back and reused by somebody else.
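
To make the failing check concrete, here is a minimal userspace sketch of
its shape. It is not the kernel code: struct model_block and all model_*
names are made up for illustration, and only the control flow (walk a
range, fail if any block is still online) is taken from the snippet above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory block; not the kernel's struct. */
struct model_block {
	bool online;
};

typedef int (*model_block_cb)(const struct model_block *blk, void *arg);

/* Call cb on every block; stop at and return the first non-zero result. */
static int model_walk_range(const struct model_block *blks, size_t n,
			    void *arg, model_block_cb cb)
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret = cb(&blks[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Non-zero means "this block is still online". */
static int model_check_offlined(const struct model_block *blk, void *arg)
{
	(void)arg;
	return blk->online ? -1 : 0;
}

/* true: removal would be safe; false: the point where the kernel would BUG(). */
static bool model_remove_is_safe(const struct model_block *blks, size_t n)
{
	return model_walk_range(blks, n, NULL, model_check_offlined) == 0;
}
```

With even one block still online, model_remove_is_safe() returns false,
which corresponds to the BUG() path in remove_memory().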

I am wondering what was the motivation for introducing this behavior and
whether there is a way to disallow it for memory hotplug. Or maybe drop
it completely. What would break in such a case?

Thanks!
-- 
Michal Hocko
SUSE Labs


* Re: memory hotplug and force_remove
  2017-03-20 19:29 memory hotplug and force_remove Michal Hocko
@ 2017-03-20 21:24 ` Rafael J. Wysocki
  2017-03-21 16:13   ` joeyli
  2017-03-28  7:58   ` Michal Hocko
  0 siblings, 2 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2017-03-20 21:24 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Toshi Kani, Jiri Kosina, joeyli, linux-mm, LKML, linux-api

On Monday, March 20, 2017 03:29:39 PM Michal Hocko wrote:
> Hi Rafael,

Hi,

> we have been chasing the following BUG() triggering during the memory
> hotremove (remove_memory):
> 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> 				check_memblock_offlined_cb);
> 	if (ret)
> 		BUG();
> 
> and it took a while to learn that the issue is caused by
> /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> surprised to see such an option because at least for the memory hotplug
> it cannot work at all. Memory hotplug fails when the memory is still
> in use. Even if we do not BUG() here enforcing the hotplug operation
> will lead to problematic behavior later like crash or a silent memory
> corruption if the memory gets onlined back and reused by somebody else.
> 
> I am wondering what was the motivation for introducing this behavior and
> whether there is a way to disallow it for memory hotplug. Or maybe drop
> it completely. What would break in such a case?

Honestly, I don't remember off the top of my head and I haven't looked at
that code for several months.

I need some time to recall that.

Thanks,
Rafael


* Re: memory hotplug and force_remove
  2017-03-20 21:24 ` Rafael J. Wysocki
@ 2017-03-21 16:13   ` joeyli
  2017-03-28  7:58   ` Michal Hocko
  1 sibling, 0 replies; 16+ messages in thread
From: joeyli @ 2017-03-21 16:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Michal Hocko, Toshi Kani, Jiri Kosina, linux-mm, LKML, linux-api

On Mon, Mar 20, 2017 at 10:24:42PM +0100, Rafael J. Wysocki wrote:
> On Monday, March 20, 2017 03:29:39 PM Michal Hocko wrote:
> > Hi Rafael,
> 
> Hi,
> 
> > we have been chasing the following BUG() triggering during the memory
> > hotremove (remove_memory):
> > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > 				check_memblock_offlined_cb);
> > 	if (ret)
> > 		BUG();
> > 
> > and it took a while to learn that the issue is caused by
> > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > surprised to see such an option because at least for the memory hotplug
> > it cannot work at all. Memory hotplug fails when the memory is still
> > in use. Even if we do not BUG() here enforcing the hotplug operation
> > will lead to problematic behavior later like crash or a silent memory
> > corruption if the memory gets onlined back and reused by somebody else.
> > 
> > I am wondering what was the motivation for introducing this behavior and
> > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > it completely. What would break in such a case?
> 
> Honestly, I don't remember from the top of my head and I haven't looked at
> that code for several months.
> 
> I need some time to recall that.
>

IMHO, in the second offline pass in acpi_scan_try_to_offline(), when the
force_remove flag is enabled, offline is still run on the parent device even
if offlining one of the child devices failed. And the error from
acpi_bus_offline() is not returned to the caller.

	errdev = NULL;
	acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX, 
			    NULL, acpi_bus_offline, (void *)true,
			    (void **)&errdev);
	if (!errdev || acpi_force_hot_remove)                 
		acpi_bus_offline(handle, 0, (void *)true, 
				 (void **)&errdev);

In this situation, the parent device or some child device may not really
have been offlined successfully, but the caller, acpi_scan_hot_remove(),
doesn't know that. This then causes the later acpi_bus_trim() step to fail.

acpi_bus_trim()
	-> handler->detach()
		-> acpi_memory_device_remove()
			-> remove_memory() -> BUG()  

because some memory was never really offlined.
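
The decision made in that second pass can be modelled in a few lines of
userspace C. This is only a sketch of the two conditions quoted above:
model_second_pass() and its parameters are hypothetical, and -1 is a
stand-in error value, not necessarily what the kernel returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * child_failed:     a child device could not be offlined (errdev != NULL)
 * force:            models acpi_force_hot_remove
 * parent_attempted: set to true if the parent offline is still attempted
 * Returns what the caller sees: 0 for "success", -1 for a reported failure.
 */
static int model_second_pass(bool child_failed, bool force,
			     bool *parent_attempted)
{
	assert(parent_attempted != NULL);

	/* Mirrors: if (!errdev || acpi_force_hot_remove) offline the parent */
	*parent_attempted = !child_failed || force;

	/* The failure is only reported when force_remove is off. */
	if (child_failed && !force)
		return -1;

	return 0;
}
```

With child_failed and force both true, the parent offline is still
attempted and the caller gets 0, so it has no way to learn that something
is still online.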

Thanks a lot!
Joey Lee


* Re: memory hotplug and force_remove
  2017-03-20 21:24 ` Rafael J. Wysocki
  2017-03-21 16:13   ` joeyli
@ 2017-03-28  7:58   ` Michal Hocko
  2017-03-28 15:22     ` Rafael J. Wysocki
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-03-28  7:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, Jiri Kosina, joeyli, linux-mm, LKML, linux-api

On Mon 20-03-17 22:24:42, Rafael J. Wysocki wrote:
> On Monday, March 20, 2017 03:29:39 PM Michal Hocko wrote:
> > Hi Rafael,
> 
> Hi,
> 
> > we have been chasing the following BUG() triggering during the memory
> > hotremove (remove_memory):
> > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > 				check_memblock_offlined_cb);
> > 	if (ret)
> > 		BUG();
> > 
> > and it took a while to learn that the issue is caused by
> > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > surprised to see such an option because at least for the memory hotplug
> > it cannot work at all. Memory hotplug fails when the memory is still
> > in use. Even if we do not BUG() here enforcing the hotplug operation
> > will lead to problematic behavior later like crash or a silent memory
> > corruption if the memory gets onlined back and reused by somebody else.
> > 
> > I am wondering what was the motivation for introducing this behavior and
> > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > it completely. What would break in such a case?
> 
> Honestly, I don't remember from the top of my head and I haven't looked at
> that code for several months.
> 
> I need some time to recall that.

Did you have any chance to look into this?
-- 
Michal Hocko
SUSE Labs


* Re: memory hotplug and force_remove
  2017-03-28  7:58   ` Michal Hocko
@ 2017-03-28 15:22     ` Rafael J. Wysocki
  2017-03-30  8:47       ` Jiri Kosina
  2017-03-31  8:30       ` Michal Hocko
  0 siblings, 2 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2017-03-28 15:22 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Toshi Kani, Jiri Kosina, joeyli, linux-mm, LKML, linux-api

On Tuesday, March 28, 2017 09:58:08 AM Michal Hocko wrote:
> On Mon 20-03-17 22:24:42, Rafael J. Wysocki wrote:
> > On Monday, March 20, 2017 03:29:39 PM Michal Hocko wrote:
> > > Hi Rafael,
> > 
> > Hi,
> > 
> > > we have been chasing the following BUG() triggering during the memory
> > > hotremove (remove_memory):
> > > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > > 				check_memblock_offlined_cb);
> > > 	if (ret)
> > > 		BUG();
> > > 
> > > and it took a while to learn that the issue is caused by
> > > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > > surprised to see such an option because at least for the memory hotplug
> > > it cannot work at all. Memory hotplug fails when the memory is still
> > > in use. Even if we do not BUG() here enforcing the hotplug operation
> > > will lead to problematic behavior later like crash or a silent memory
> > > corruption if the memory gets onlined back and reused by somebody else.
> > > 
> > > I am wondering what was the motivation for introducing this behavior and
> > > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > > it completely. What would break in such a case?
> > 
> > Honestly, I don't remember from the top of my head and I haven't looked at
> > that code for several months.
> > 
> > I need some time to recall that.
> 
> Did you have any chance to look into this?

Well, yes.

It looks like that was added for some people who depended on the old behavior
at that time.

I guess we can try to drop it and see what happens. :-)

Thanks,
Rafael


* Re: memory hotplug and force_remove
  2017-03-28 15:22     ` Rafael J. Wysocki
@ 2017-03-30  8:47       ` Jiri Kosina
  2017-03-30 16:20         ` Michal Hocko
  2017-03-31  8:30       ` Michal Hocko
  1 sibling, 1 reply; 16+ messages in thread
From: Jiri Kosina @ 2017-03-30  8:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Michal Hocko, Toshi Kani, joeyli, linux-mm, LKML, linux-api

On Tue, 28 Mar 2017, Rafael J. Wysocki wrote:

> > > > we have been chasing the following BUG() triggering during the memory
> > > > hotremove (remove_memory):
> > > > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > > > 				check_memblock_offlined_cb);
> > > > 	if (ret)
> > > > 		BUG();
> > > > 
> > > > and it took a while to learn that the issue is caused by
> > > > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > > > surprised to see such an option because at least for the memory hotplug
> > > > it cannot work at all. Memory hotplug fails when the memory is still
> > > > in use. Even if we do not BUG() here enforcing the hotplug operation
> > > > will lead to problematic behavior later like crash or a silent memory
> > > > corruption if the memory gets onlined back and reused by somebody else.
> > > > 
> > > > I am wondering what was the motivation for introducing this behavior and
> > > > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > > > it completely. What would break in such a case?
> > > 
> > > Honestly, I don't remember from the top of my head and I haven't looked at
> > > that code for several months.
> > > 
> > > I need some time to recall that.
> > 
> > Did you have any chance to look into this?
> 
> Well, yes.
> 
> It looks like that was added for some people who depended on the old behavior
> at that time.
> 
> I guess we can try to drop it and see what happens. :-)

I'd agree with that; at the same time, udev rule should be submitted to 
systemd folks though. I don't think there is anything existing in this 
area yet (neither do distros ship their own udev rules for this AFAIK).

Thanks,

-- 
Jiri Kosina
SUSE Labs


* Re: memory hotplug and force_remove
  2017-03-30  8:47       ` Jiri Kosina
@ 2017-03-30 16:20         ` Michal Hocko
  2017-03-30 16:57           ` joeyli
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-03-30 16:20 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Rafael J. Wysocki, Toshi Kani, joeyli, linux-mm, LKML, linux-api

On Thu 30-03-17 10:47:52, Jiri Kosina wrote:
> On Tue, 28 Mar 2017, Rafael J. Wysocki wrote:
> 
> > > > > we have been chasing the following BUG() triggering during the memory
> > > > > hotremove (remove_memory):
> > > > > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > > > > 				check_memblock_offlined_cb);
> > > > > 	if (ret)
> > > > > 		BUG();
> > > > > 
> > > > > and it took a while to learn that the issue is caused by
> > > > > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > > > > surprised to see such an option because at least for the memory hotplug
> > > > > it cannot work at all. Memory hotplug fails when the memory is still
> > > > > in use. Even if we do not BUG() here enforcing the hotplug operation
> > > > > will lead to problematic behavior later like crash or a silent memory
> > > > > corruption if the memory gets onlined back and reused by somebody else.
> > > > > 
> > > > > I am wondering what was the motivation for introducing this behavior and
> > > > > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > > > > it completely. What would break in such a case?
> > > > 
> > > > Honestly, I don't remember from the top of my head and I haven't looked at
> > > > that code for several months.
> > > > 
> > > > I need some time to recall that.
> > > 
> > > Did you have any chance to look into this?
> > 
> > Well, yes.
> > 
> > It looks like that was added for some people who depended on the old behavior
> > at that time.
> > 
> > I guess we can try to drop it and see what happens. :-)
> 
> I'd agree with that; at the same time, udev rule should be submitted to 
> systemd folks though. I don't think there is anything existing in this 
> area yet (neither do distros ship their own udev rules for this AFAIK).

Another option would be keeping the force_remove knob but making the code
error-handling aware. In other words, rather than ignoring the offline
error, simply propagate it up the chain and do not consider the device
offline. Would that be acceptable?
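
As a rough illustration of the difference, here is a userspace sketch that
contrasts the two error-handling policies. The function names and the
results array (one offline return code per device) are hypothetical; this
is not a proposed kernel change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Behavior as described in this thread: force hides every failure. */
static int model_offline_all_ignoring(const int *results, size_t n, bool force)
{
	assert(results != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (results[i] < 0 && !force)
			return results[i];
	}
	return 0;
}

/* Alternative: the first failure is always handed back to the caller. */
static int model_offline_all_propagating(const int *results, size_t n,
					 bool force)
{
	(void)force;	/* the knob no longer changes error handling */
	assert(results != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (results[i] < 0)
			return results[i];
	}
	return 0;
}
```

In the second variant the caller sees the failure regardless of the knob,
so it can refuse to go on to the remove stage.
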
-- 
Michal Hocko
SUSE Labs


* Re: memory hotplug and force_remove
  2017-03-30 16:20         ` Michal Hocko
@ 2017-03-30 16:57           ` joeyli
  2017-03-30 20:15             ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: joeyli @ 2017-03-30 16:57 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Jiri Kosina, Rafael J. Wysocki, Toshi Kani, linux-mm, LKML, linux-api

On Thu, Mar 30, 2017 at 06:20:34PM +0200, Michal Hocko wrote:
> On Thu 30-03-17 10:47:52, Jiri Kosina wrote:
> > On Tue, 28 Mar 2017, Rafael J. Wysocki wrote:
> > 
> > > > > > we have been chasing the following BUG() triggering during the memory
> > > > > > hotremove (remove_memory):
> > > > > > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > > > > > 				check_memblock_offlined_cb);
> > > > > > 	if (ret)
> > > > > > 		BUG();
> > > > > > 
> > > > > > and it took a while to learn that the issue is caused by
> > > > > > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > > > > > surprised to see such an option because at least for the memory hotplug
> > > > > > it cannot work at all. Memory hotplug fails when the memory is still
> > > > > > in use. Even if we do not BUG() here enforcing the hotplug operation
> > > > > > will lead to problematic behavior later like crash or a silent memory
> > > > > > corruption if the memory gets onlined back and reused by somebody else.
> > > > > > 
> > > > > > I am wondering what was the motivation for introducing this behavior and
> > > > > > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > > > > > it completely. What would break in such a case?
> > > > > 
> > > > > Honestly, I don't remember from the top of my head and I haven't looked at
> > > > > that code for several months.
> > > > > 
> > > > > I need some time to recall that.
> > > > 
> > > > Did you have any chance to look into this?
> > > 
> > > Well, yes.
> > > 
> > > It looks like that was added for some people who depended on the old behavior
> > > at that time.
> > > 
> > > I guess we can try to drop it and see what happens. :-)
> > 
> > I'd agree with that; at the same time, udev rule should be submitted to 
> > systemd folks though. I don't think there is anything existing in this 
> > area yet (neither do distros ship their own udev rules for this AFAIK).
> 
> Another option would be keeping the force_remove knob but making the code
> error-handling aware. In other words, rather than ignoring the offline
> error, simply propagate it up the chain and do not consider the device
> offline. Would that be acceptable?

Then the only difference from normal mode would be that force_remove mode
doesn't send out a uevent for a not-yet-offline container.

I vote to remove force_remove, not just because it ignores offline errors
but also because it is a global ACPI knob that affects all container
devices in the system.

Thanks a lot!
Joey Lee


* Re: memory hotplug and force_remove
  2017-03-30 16:57           ` joeyli
@ 2017-03-30 20:15             ` Rafael J. Wysocki
  2017-03-31  0:00               ` joeyli
  0 siblings, 1 reply; 16+ messages in thread
From: Rafael J. Wysocki @ 2017-03-30 20:15 UTC (permalink / raw)
  To: joeyli, Michal Hocko; +Cc: Jiri Kosina, linux-mm, LKML, linux-api

On Friday, March 31, 2017 12:57:29 AM joeyli wrote:
> On Thu, Mar 30, 2017 at 06:20:34PM +0200, Michal Hocko wrote:
> > On Thu 30-03-17 10:47:52, Jiri Kosina wrote:
> > > On Tue, 28 Mar 2017, Rafael J. Wysocki wrote:
> > > 
> > > > > > > we have been chasing the following BUG() triggering during the memory
> > > > > > > hotremove (remove_memory):
> > > > > > > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > > > > > > 				check_memblock_offlined_cb);
> > > > > > > 	if (ret)
> > > > > > > 		BUG();
> > > > > > > 
> > > > > > > and it took a while to learn that the issue is caused by
> > > > > > > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > > > > > > surprised to see such an option because at least for the memory hotplug
> > > > > > > it cannot work at all. Memory hotplug fails when the memory is still
> > > > > > > in use. Even if we do not BUG() here enforcing the hotplug operation
> > > > > > > will lead to problematic behavior later like crash or a silent memory
> > > > > > > corruption if the memory gets onlined back and reused by somebody else.
> > > > > > > 
> > > > > > > I am wondering what was the motivation for introducing this behavior and
> > > > > > > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > > > > > > it completely. What would break in such a case?
> > > > > > 
> > > > > > Honestly, I don't remember from the top of my head and I haven't looked at
> > > > > > that code for several months.
> > > > > > 
> > > > > > I need some time to recall that.
> > > > > 
> > > > > Did you have any chance to look into this?
> > > > 
> > > > Well, yes.
> > > > 
> > > > It looks like that was added for some people who depended on the old behavior
> > > > at that time.
> > > > 
> > > > I guess we can try to drop it and see what happens. :-)
> > > 
> > > I'd agree with that; at the same time, udev rule should be submitted to 
> > > systemd folks though. I don't think there is anything existing in this 
> > > area yet (neither do distros ship their own udev rules for this AFAIK).
> > 
> > Another option would be keeping the force_remove knob but making the code
> > error-handling aware. In other words, rather than ignoring the offline
> > error, simply propagate it up the chain and do not consider the device
> > offline. Would that be acceptable?
> 
> Then the only difference from normal mode would be that force_remove mode
> doesn't send out a uevent for a not-yet-offline container.

Which would be rather confusing.

The whole point of the thing was the "remove no matter what" behavior and
there's not much point in keeping it around without that.

Thanks,
Rafael


* Re: memory hotplug and force_remove
  2017-03-30 20:15             ` Rafael J. Wysocki
@ 2017-03-31  0:00               ` joeyli
  0 siblings, 0 replies; 16+ messages in thread
From: joeyli @ 2017-03-31  0:00 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Michal Hocko, Jiri Kosina, linux-mm, LKML, linux-api

On Thu, Mar 30, 2017 at 10:15:04PM +0200, Rafael J. Wysocki wrote:
> On Friday, March 31, 2017 12:57:29 AM joeyli wrote:
> > On Thu, Mar 30, 2017 at 06:20:34PM +0200, Michal Hocko wrote:
> > > On Thu 30-03-17 10:47:52, Jiri Kosina wrote:
> > > > On Tue, 28 Mar 2017, Rafael J. Wysocki wrote:
> > > > 
> > > > > > > > we have been chasing the following BUG() triggering during the memory
> > > > > > > > hotremove (remove_memory):
> > > > > > > > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > > > > > > > 				check_memblock_offlined_cb);
> > > > > > > > 	if (ret)
> > > > > > > > 		BUG();
> > > > > > > > 
> > > > > > > > and it took a while to learn that the issue is caused by
> > > > > > > > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > > > > > > > surprised to see such an option because at least for the memory hotplug
> > > > > > > > it cannot work at all. Memory hotplug fails when the memory is still
> > > > > > > > in use. Even if we do not BUG() here enforcing the hotplug operation
> > > > > > > > will lead to problematic behavior later like crash or a silent memory
> > > > > > > > corruption if the memory gets onlined back and reused by somebody else.
> > > > > > > > 
> > > > > > > > I am wondering what was the motivation for introducing this behavior and
> > > > > > > > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > > > > > > > it completely. What would break in such a case?
> > > > > > > 
> > > > > > > Honestly, I don't remember from the top of my head and I haven't looked at
> > > > > > > that code for several months.
> > > > > > > 
> > > > > > > I need some time to recall that.
> > > > > > 
> > > > > > Did you have any chance to look into this?
> > > > > 
> > > > > Well, yes.
> > > > > 
> > > > > It looks like that was added for some people who depended on the old behavior
> > > > > at that time.
> > > > > 
> > > > > I guess we can try to drop it and see what happens. :-)
> > > > 
> > > > I'd agree with that; at the same time, udev rule should be submitted to 
> > > > systemd folks though. I don't think there is anything existing in this 
> > > > area yet (neither do distros ship their own udev rules for this AFAIK).
> > > 
> > > Another option would be keeping the force_remove knob but making the code
> > > error-handling aware. In other words, rather than ignoring the offline
> > > error, simply propagate it up the chain and do not consider the device
> > > offline. Would that be acceptable?
> > 
> > Then the only difference from normal mode would be that force_remove mode
> > doesn't send out a uevent for a not-yet-offline container.
> 
> Which would be rather confusing.
> 
> The whole point of the thing was the "remove no matter what" behavior and
> there's not much point in keeping it around without that.
>

OK~ Understood.

Then, back to the "remove no matter what" behavior: the point is that the
force_remove knob causes acpi_scan_try_to_offline() to ignore the offline
errors of the parent/children devices in the second pass:

drivers/acpi/scan.c
static int acpi_scan_try_to_offline(struct acpi_device *device)
{  
...
	/* first pass to call bus offline for parent */
	acpi_bus_offline(handle, 0, (void *)false, (void **)&errdev);
	/* if failed, then second pass */
	if (errdev) { 
		errdev = NULL;
		/* children devices, second pass */
		acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
				    NULL, acpi_bus_offline, (void *)true,
				    (void **)&errdev);
		/* ignored children's offline error here */
		if (!errdev || acpi_force_hot_remove)
			/* ignored parent's offline error */
			acpi_bus_offline(handle, 0, (void *)true,
					 (void **)&errdev);

		/* will not set devices back to online */
		if (errdev && !acpi_force_hot_remove) { 
	...
	}
	return 0;
}

Then acpi_scan_try_to_offline() returns 0 and we go on to the _remove_ stage,
where the memory subsystem raises BUG() because the device offline state is
not in sync with the memory block state.


Thanks a lot!
Joey Lee


* Re: memory hotplug and force_remove
  2017-03-28 15:22     ` Rafael J. Wysocki
  2017-03-30  8:47       ` Jiri Kosina
@ 2017-03-31  8:30       ` Michal Hocko
  2017-03-31 10:49         ` joeyli
  1 sibling, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-03-31  8:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Kani Toshimitsu, Jiri Kosina, joeyli, linux-mm, LKML, linux-api

[Fixed up email address of Toshimitsu - the email thread starts
http://lkml.kernel.org/r/20170320192938.GA11363@dhcp22.suse.cz]

On Tue 28-03-17 17:22:58, Rafael J. Wysocki wrote:
> On Tuesday, March 28, 2017 09:58:08 AM Michal Hocko wrote:
> > On Mon 20-03-17 22:24:42, Rafael J. Wysocki wrote:
> > > On Monday, March 20, 2017 03:29:39 PM Michal Hocko wrote:
> > > > Hi Rafael,
> > > 
> > > Hi,
> > > 
> > > > we have been chasing the following BUG() triggering during the memory
> > > > hotremove (remove_memory):
> > > > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > > > 				check_memblock_offlined_cb);
> > > > 	if (ret)
> > > > 		BUG();
> > > > 
> > > > and it took a while to learn that the issue is caused by
> > > > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > > > surprised to see such an option because at least for the memory hotplug
> > > > it cannot work at all. Memory hotplug fails when the memory is still
> > > > in use. Even if we do not BUG() here enforcing the hotplug operation
> > > > will lead to problematic behavior later like crash or a silent memory
> > > > corruption if the memory gets onlined back and reused by somebody else.
> > > > 
> > > > I am wondering what was the motivation for introducing this behavior and
> > > > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > > > it completely. What would break in such a case?
> > > 
> > > Honestly, I don't remember from the top of my head and I haven't looked at
> > > that code for several months.
> > > 
> > > I need some time to recall that.
> > 
> > Did you have any chance to look into this?
> 
> Well, yes.
> 
> It looks like that was added for some people who depended on the old behavior
> at that time.
> 
> I guess we can try to drop it and see what happens. :-)

OK, so what do you think about the following? It is based on the current
linux-next and I have only compile tested it.
---
From 6c5ae594ce938a1ae9b9718958401682bfab3980 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 31 Mar 2017 10:08:41 +0200
Subject: [PATCH] acpi: drop support for force_remove

/sys/firmware/acpi/hotplug/force_remove was presumably added to support
auto offlining in the past. This is, however, inherently dangerous for
some hotpluggable resources like memory. The memory offlining fails when
the memory is still in use and cannot be dropped or migrated. If we
ignore the failure we are basically allowing for subtle memory
corruption or a crash.

We have actually noticed the latter while hitting BUG() during the memory
hotremove (remove_memory):
	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
			check_memblock_offlined_cb);
	if (ret)
		BUG();

It took us a non-trivial amount of time to realize that the customer had
force_remove enabled. Even if the BUG() was removed here and we could
propagate the error up the call chain, it wouldn't help at all because
we would then hit a crash or a memory corruption later, which would be
harder to debug. So force_remove is unfixable for memory hotremove. We
haven't checked whether other hotpluggable resources are prone to similar
problems.

Remove the force_remove functionality because it is not fixable currently.
Keep the sysfs file and report an error if somebody tries to enable it.
Encourage users to report the missing functionality and work with
them on an alternative solution.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/ABI/obsolete/sysfs-firmware-acpi |  8 ++++++++
 Documentation/ABI/testing/sysfs-firmware-acpi  | 10 ----------
 drivers/acpi/internal.h                        |  2 --
 drivers/acpi/scan.c                            | 17 +++--------------
 drivers/acpi/sysfs.c                           |  9 +++++----
 5 files changed, 16 insertions(+), 30 deletions(-)
 create mode 100644 Documentation/ABI/obsolete/sysfs-firmware-acpi

diff --git a/Documentation/ABI/obsolete/sysfs-firmware-acpi b/Documentation/ABI/obsolete/sysfs-firmware-acpi
new file mode 100644
index 000000000000..6715a71bec3d
--- /dev/null
+++ b/Documentation/ABI/obsolete/sysfs-firmware-acpi
@@ -0,0 +1,8 @@
+What:		/sys/firmware/acpi/hotplug/force_remove
+Date:		Mar 2017
+Contact:	Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+Description:
+		Since the force_remove is inherently broken and dangerous to
+		use for some hotplugable resources like memory (because ignoring
+		the offline failure might lead to memory corruption and crashes)
+		enabling this knob is not safe and thus unsupported.
diff --git a/Documentation/ABI/testing/sysfs-firmware-acpi b/Documentation/ABI/testing/sysfs-firmware-acpi
index c7fc72d4495c..613f42a9d5cd 100644
--- a/Documentation/ABI/testing/sysfs-firmware-acpi
+++ b/Documentation/ABI/testing/sysfs-firmware-acpi
@@ -44,16 +44,6 @@ Contact:	Rafael J. Wysocki <rafael.j.wysocki@intel.com>
 		or 0 (unset).  Attempts to write any other values to it will
 		cause -EINVAL to be returned.
 
-What:		/sys/firmware/acpi/hotplug/force_remove
-Date:		May 2013
-Contact:	Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-Description:
-		The number in this file (0 or 1) determines whether (1) or not
-		(0) the ACPI subsystem will allow devices to be hot-removed even
-		if they cannot be put offline gracefully (from the kernel's
-		viewpoint).  That number can be changed by writing a boolean
-		value to this file.
-
 What:		/sys/firmware/acpi/interrupts/
 Date:		February 2008
 Contact:	Len Brown <lenb@kernel.org>
diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
index f15900132912..66229ffa909b 100644
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -65,8 +65,6 @@ static inline void acpi_cmos_rtc_init(void) {}
 #endif
 int acpi_rev_override_setup(char *str);
 
-extern bool acpi_force_hot_remove;
-
 void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
 				    const char *name);
 int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 192691880d55..a8d893fcedca 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -30,12 +30,6 @@ extern struct acpi_device *acpi_root;
 
 #define INVALID_ACPI_HANDLE	((acpi_handle)empty_zero_page)
 
-/*
- * If set, devices will be hot-removed even if they cannot be put offline
- * gracefully (from the kernel's standpoint).
- */
-bool acpi_force_hot_remove;
-
 static const char *dummy_hid = "device";
 
 static LIST_HEAD(acpi_dep_list);
@@ -170,9 +164,6 @@ static acpi_status acpi_bus_offline(acpi_handle handle, u32 lvl, void *data,
 			pn->put_online = false;
 		}
 		ret = device_offline(pn->dev);
-		if (acpi_force_hot_remove)
-			continue;
-
 		if (ret >= 0) {
 			pn->put_online = !ret;
 		} else {
@@ -241,11 +232,10 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
 		acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
 				    NULL, acpi_bus_offline, (void *)true,
 				    (void **)&errdev);
-		if (!errdev || acpi_force_hot_remove)
+		if (!errdev)
 			acpi_bus_offline(handle, 0, (void *)true,
 					 (void **)&errdev);
-
-		if (errdev && !acpi_force_hot_remove) {
+		else {
 			dev_warn(errdev, "Offline failed.\n");
 			acpi_bus_online(handle, 0, NULL, NULL);
 			acpi_walk_namespace(ACPI_TYPE_ANY, handle,
@@ -263,8 +253,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
 	unsigned long long sta;
 	acpi_status status;
 
-	if (device->handler && device->handler->hotplug.demand_offline
-	    && !acpi_force_hot_remove) {
+	if (device->handler && device->handler->hotplug.demand_offline) {
 		if (!acpi_scan_is_offline(device, true))
 			return -EBUSY;
 	} else {
diff --git a/drivers/acpi/sysfs.c b/drivers/acpi/sysfs.c
index cf05ae973381..1b5ee1e0e5a3 100644
--- a/drivers/acpi/sysfs.c
+++ b/drivers/acpi/sysfs.c
@@ -921,7 +921,7 @@ void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
 static ssize_t force_remove_show(struct kobject *kobj,
 				 struct kobj_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%d\n", !!acpi_force_hot_remove);
+	return sprintf(buf, "%d\n", 0);
 }
 
 static ssize_t force_remove_store(struct kobject *kobj,
@@ -935,9 +935,10 @@ static ssize_t force_remove_store(struct kobject *kobj,
 	if (ret < 0)
 		return ret;
 
-	lock_device_hotplug();
-	acpi_force_hot_remove = val;
-	unlock_device_hotplug();
+	if (val) {
+		pr_err("Enabling force_remove is not supported anymore. Please report to linux-acpi@vger.kernel.org if you depend on this functionality\n");
+		return -EINVAL;
+	}
 	return size;
 }
 
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

* Re: memory hotplug and force_remove
  2017-03-31  8:30       ` Michal Hocko
@ 2017-03-31 10:49         ` joeyli
  2017-03-31 10:55           ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: joeyli @ 2017-03-31 10:49 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Rafael J. Wysocki, Kani Toshimitsu, Jiri Kosina, linux-mm, LKML,
	linux-api

Hi Michal,

On Fri, Mar 31, 2017 at 10:30:17AM +0200, Michal Hocko wrote:
> [Fixed up email address of Toshimitsu - the email thread starts
> http://lkml.kernel.org/r/20170320192938.GA11363@dhcp22.suse.cz]
> 
> On Tue 28-03-17 17:22:58, Rafael J. Wysocki wrote:
> > On Tuesday, March 28, 2017 09:58:08 AM Michal Hocko wrote:
> > > On Mon 20-03-17 22:24:42, Rafael J. Wysocki wrote:
> > > > On Monday, March 20, 2017 03:29:39 PM Michal Hocko wrote:
> > > > > Hi Rafael,
> > > > 
> > > > Hi,
> > > > 
> > > > > we have been chasing the following BUG() triggering during the memory
> > > > > hotremove (remove_memory):
> > > > > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > > > > 				check_memblock_offlined_cb);
> > > > > 	if (ret)
> > > > > 		BUG();
> > > > > 
> > > > > and it took a while to learn that the issue is caused by
> > > > > /sys/firmware/acpi/hotplug/force_remove being enabled. I was really
> > > > > surprised to see such an option because at least for the memory hotplug
> > > > > it cannot work at all. Memory hotplug fails when the memory is still
> > > > > in use. Even if we do not BUG() here enforcing the hotplug operation
> > > > > will lead to problematic behavior later like crash or a silent memory
> > > > > corruption if the memory gets onlined back and reused by somebody else.
> > > > > 
> > > > > I am wondering what was the motivation for introducing this behavior and
> > > > > whether there is a way to disallow it for memory hotplug. Or maybe drop
> > > > > it completely. What would break in such a case?
> > > > 
> > > > Honestly, I don't remember off the top of my head and I haven't looked at
> > > > that code for several months.
> > > > 
> > > > I need some time to recall that.
> > > 
> > > Did you have any chance to look into this?
> > 
> > Well, yes.
> > 
> > It looks like that was added for some people who depended on the old behavior
> > at that time.
> > 
> > I guess we can try to drop it and see what happens. :-)
> 
> OK, so what do you think about the following? It is based on the current
> linux-next and I have only compile tested it.
> ---
> From 6c5ae594ce938a1ae9b9718958401682bfab3980 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 31 Mar 2017 10:08:41 +0200
> Subject: [PATCH] acpi: drop support for force_remove
> 
> /sys/firmware/acpi/hotplug/force_remove was presumably added to support
> auto offlining in the past. This is, however, inherently dangerous for
> some hotpluggable resources like memory. The memory offlining fails when
> the memory is still in use and cannot be dropped or migrated. If we
> ignore the failure we are basically allowing for subtle memory
> corruption or a crash.
> 
> We have actually noticed the latter while hitting BUG() during the memory
> hotremove (remove_memory):
> 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> 			check_memblock_offlined_cb);
> 	if (ret)
> 		BUG();
> 
> It took us a non-trivial amount of time to realize that the customer had
> force_remove enabled. Even if the BUG() was removed here and we could
> propagate the error up the call chain it wouldn't help at all because
> then we would hit a crash or a memory corruption later, which would be
> even harder to debug. So force_remove is unfixable for the memory
> hotremove. We haven't checked whether other hotpluggable resources are
> prone to similar problems.
> 
> Remove the force_remove functionality because it is not fixable currently.
> Keep the sysfs file and report an error if somebody tries to enable it.
> Encourage users to report the missing functionality and work with them
> on an alternative solution.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>
> ---
>  Documentation/ABI/obsolete/sysfs-firmware-acpi |  8 ++++++++
>  Documentation/ABI/testing/sysfs-firmware-acpi  | 10 ----------
>  drivers/acpi/internal.h                        |  2 --
>  drivers/acpi/scan.c                            | 17 +++--------------
>  drivers/acpi/sysfs.c                           |  9 +++++----
>  5 files changed, 16 insertions(+), 30 deletions(-)
>  create mode 100644 Documentation/ABI/obsolete/sysfs-firmware-acpi
> 
> diff --git a/Documentation/ABI/obsolete/sysfs-firmware-acpi b/Documentation/ABI/obsolete/sysfs-firmware-acpi
> new file mode 100644
> index 000000000000..6715a71bec3d
> --- /dev/null
> +++ b/Documentation/ABI/obsolete/sysfs-firmware-acpi
> @@ -0,0 +1,8 @@
> +What:		/sys/firmware/acpi/hotplug/force_remove
> +Date:		Mar 2017
> +Contact:	Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> +Description:
> +		Since the force_remove is inherently broken and dangerous to
> +		use for some hotplugable resources like memory (because ignoring
> +		the offline failure might lead to memory corruption and crashes)
> +		enabling this knob is not safe and thus unsupported.
> diff --git a/Documentation/ABI/testing/sysfs-firmware-acpi b/Documentation/ABI/testing/sysfs-firmware-acpi
> index c7fc72d4495c..613f42a9d5cd 100644
> --- a/Documentation/ABI/testing/sysfs-firmware-acpi
> +++ b/Documentation/ABI/testing/sysfs-firmware-acpi
> @@ -44,16 +44,6 @@ Contact:	Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>  		or 0 (unset).  Attempts to write any other values to it will
>  		cause -EINVAL to be returned.
>  
> -What:		/sys/firmware/acpi/hotplug/force_remove
> -Date:		May 2013
> -Contact:	Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> -Description:
> -		The number in this file (0 or 1) determines whether (1) or not
> -		(0) the ACPI subsystem will allow devices to be hot-removed even
> -		if they cannot be put offline gracefully (from the kernel's
> -		viewpoint).  That number can be changed by writing a boolean
> -		value to this file.
> -
>  What:		/sys/firmware/acpi/interrupts/
>  Date:		February 2008
>  Contact:	Len Brown <lenb@kernel.org>
> diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
> index f15900132912..66229ffa909b 100644
> --- a/drivers/acpi/internal.h
> +++ b/drivers/acpi/internal.h
> @@ -65,8 +65,6 @@ static inline void acpi_cmos_rtc_init(void) {}
>  #endif
>  int acpi_rev_override_setup(char *str);
>  
> -extern bool acpi_force_hot_remove;
> -
>  void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
>  				    const char *name);
>  int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 192691880d55..a8d893fcedca 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -30,12 +30,6 @@ extern struct acpi_device *acpi_root;
>  
>  #define INVALID_ACPI_HANDLE	((acpi_handle)empty_zero_page)
>  
> -/*
> - * If set, devices will be hot-removed even if they cannot be put offline
> - * gracefully (from the kernel's standpoint).
> - */
> -bool acpi_force_hot_remove;
> -
>  static const char *dummy_hid = "device";
>  
>  static LIST_HEAD(acpi_dep_list);
> @@ -170,9 +164,6 @@ static acpi_status acpi_bus_offline(acpi_handle handle, u32 lvl, void *data,
>  			pn->put_online = false;
>  		}
>  		ret = device_offline(pn->dev);
> -		if (acpi_force_hot_remove)
> -			continue;
> -
>  		if (ret >= 0) {
>  			pn->put_online = !ret;
>  		} else {
> @@ -241,11 +232,10 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
>  		acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
>  				    NULL, acpi_bus_offline, (void *)true,
>  				    (void **)&errdev);
> -		if (!errdev || acpi_force_hot_remove)
> +		if (!errdev)
>  			acpi_bus_offline(handle, 0, (void *)true,
>  					 (void **)&errdev);
> -
> -		if (errdev && !acpi_force_hot_remove) {
> +		else {
              ^^^^^^^^^^^^^
This should still check the parent's errdev state and then roll back the
parent and children to the online state:

-		if (errdev && !acpi_force_hot_remove) {
+		if (errdev) {

>  			dev_warn(errdev, "Offline failed.\n");
>  			acpi_bus_online(handle, 0, NULL, NULL);
>  			acpi_walk_namespace(ACPI_TYPE_ANY, handle,
[...snip]

Thanks a lot!
Joey Lee

* Re: memory hotplug and force_remove
  2017-03-31 10:49         ` joeyli
@ 2017-03-31 10:55           ` Michal Hocko
  2017-03-31 11:55             ` joeyli
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-03-31 10:55 UTC (permalink / raw)
  To: joeyli
  Cc: Rafael J. Wysocki, Kani Toshimitsu, Jiri Kosina, linux-mm, LKML,
	linux-api

On Fri 31-03-17 18:49:05, Joey Lee wrote:
> Hi Michal,
> 
> On Fri, Mar 31, 2017 at 10:30:17AM +0200, Michal Hocko wrote:
[...]
> > @@ -241,11 +232,10 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
> >  		acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
> >  				    NULL, acpi_bus_offline, (void *)true,
> >  				    (void **)&errdev);
> > -		if (!errdev || acpi_force_hot_remove)
> > +		if (!errdev)
> >  			acpi_bus_offline(handle, 0, (void *)true,
> >  					 (void **)&errdev);
> > -
> > -		if (errdev && !acpi_force_hot_remove) {
> > +		else {
>               ^^^^^^^^^^^^^
> This should still check the parent's errdev state and then roll back the
> parent and children to the online state:
> 
> -		if (errdev && !acpi_force_hot_remove) {
> +		if (errdev) {

You are right, I have missed that acpi_bus_offline modifies errdev.
Thanks for spotting that! Updated patch is below.
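
For anyone following along, the difference between the two versions of
the hunk can be modelled in user space. This is only a minimal sketch,
not kernel code: struct model_dev, model_offline(), model_online(),
try_offline_else() and try_offline_recheck() are hypothetical names made
up for illustration. The only thing it assumes is what is said above,
i.e. that offlining the parent can itself set errdev.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not the kernel's struct device. */
struct model_dev {
	bool offline_fails;	/* pretend device_offline() fails for this one */
	bool online;
};

/* Models acpi_bus_offline(): a failure is reported through *errdev. */
static void model_offline(struct model_dev *dev, struct model_dev **errdev)
{
	if (dev->offline_fails) {
		if (!*errdev)
			*errdev = dev;
		return;
	}
	dev->online = false;
}

/* Models the rollback: bring parent and child back online. */
static void model_online(struct model_dev *parent, struct model_dev *child)
{
	parent->online = true;
	child->online = true;
}

/* First version of the hunk: rollback only in the "else" branch. */
static int try_offline_else(struct model_dev *parent, struct model_dev *child)
{
	struct model_dev *errdev = NULL;

	model_offline(child, &errdev);
	if (!errdev) {
		model_offline(parent, &errdev);	/* may set errdev, never checked */
	} else {
		model_online(parent, child);
		return -EBUSY;
	}
	return 0;
}

/* Updated version: errdev is checked again after the parent attempt. */
static int try_offline_recheck(struct model_dev *parent, struct model_dev *child)
{
	struct model_dev *errdev = NULL;

	model_offline(child, &errdev);
	if (!errdev)
		model_offline(parent, &errdev);

	if (errdev) {
		model_online(parent, child);
		return -EBUSY;
	}
	return 0;
}
```

With a parent whose offlining fails and a child that goes offline fine,
try_offline_else() returns 0 and leaves the child offline, while
try_offline_recheck() returns -EBUSY with both devices back online.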
---
From 8df0abd29988ffb52b6df52407b96d6015861bb7 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Fri, 31 Mar 2017 10:08:41 +0200
Subject: [PATCH] acpi: drop support for force_remove

/sys/firmware/acpi/hotplug/force_remove was presumably added to support
auto offlining in the past. This is, however, inherently dangerous for
some hotpluggable resources like memory. The memory offlining fails when
the memory is still in use and cannot be dropped or migrated. If we
ignore the failure we are basically allowing for subtle memory
corruption or a crash.

We have actually noticed the latter while hitting BUG() during the memory
hotremove (remove_memory):
	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
			check_memblock_offlined_cb);
	if (ret)
		BUG();

It took us a non-trivial amount of time to realize that the customer had
force_remove enabled. Even if the BUG() was removed here and we could
propagate the error up the call chain it wouldn't help at all because
then we would hit a crash or a memory corruption later, which would be
even harder to debug. So force_remove is unfixable for the memory
hotremove. We haven't checked whether other hotpluggable resources are
prone to similar problems.

Remove the force_remove functionality because it is not fixable currently.
Keep the sysfs file and report an error if somebody tries to enable it.
Encourage users to report the missing functionality and work with them
on an alternative solution.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/ABI/obsolete/sysfs-firmware-acpi |  8 ++++++++
 Documentation/ABI/testing/sysfs-firmware-acpi  | 10 ----------
 drivers/acpi/internal.h                        |  2 --
 drivers/acpi/scan.c                            | 16 +++-------------
 drivers/acpi/sysfs.c                           |  9 +++++----
 5 files changed, 16 insertions(+), 29 deletions(-)
 create mode 100644 Documentation/ABI/obsolete/sysfs-firmware-acpi

diff --git a/Documentation/ABI/obsolete/sysfs-firmware-acpi b/Documentation/ABI/obsolete/sysfs-firmware-acpi
new file mode 100644
index 000000000000..6715a71bec3d
--- /dev/null
+++ b/Documentation/ABI/obsolete/sysfs-firmware-acpi
@@ -0,0 +1,8 @@
+What:		/sys/firmware/acpi/hotplug/force_remove
+Date:		Mar 2017
+Contact:	Rafael J. Wysocki <rafael.j.wysocki@intel.com>
+Description:
+		Since the force_remove is inherently broken and dangerous to
+		use for some hotplugable resources like memory (because ignoring
+		the offline failure might lead to memory corruption and crashes)
+		enabling this knob is not safe and thus unsupported.
diff --git a/Documentation/ABI/testing/sysfs-firmware-acpi b/Documentation/ABI/testing/sysfs-firmware-acpi
index c7fc72d4495c..613f42a9d5cd 100644
--- a/Documentation/ABI/testing/sysfs-firmware-acpi
+++ b/Documentation/ABI/testing/sysfs-firmware-acpi
@@ -44,16 +44,6 @@ Contact:	Rafael J. Wysocki <rafael.j.wysocki@intel.com>
 		or 0 (unset).  Attempts to write any other values to it will
 		cause -EINVAL to be returned.
 
-What:		/sys/firmware/acpi/hotplug/force_remove
-Date:		May 2013
-Contact:	Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-Description:
-		The number in this file (0 or 1) determines whether (1) or not
-		(0) the ACPI subsystem will allow devices to be hot-removed even
-		if they cannot be put offline gracefully (from the kernel's
-		viewpoint).  That number can be changed by writing a boolean
-		value to this file.
-
 What:		/sys/firmware/acpi/interrupts/
 Date:		February 2008
 Contact:	Len Brown <lenb@kernel.org>
diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
index f15900132912..66229ffa909b 100644
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -65,8 +65,6 @@ static inline void acpi_cmos_rtc_init(void) {}
 #endif
 int acpi_rev_override_setup(char *str);
 
-extern bool acpi_force_hot_remove;
-
 void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
 				    const char *name);
 int acpi_scan_add_handler_with_hotplug(struct acpi_scan_handler *handler,
diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 192691880d55..e2080b6e54aa 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -30,12 +30,6 @@ extern struct acpi_device *acpi_root;
 
 #define INVALID_ACPI_HANDLE	((acpi_handle)empty_zero_page)
 
-/*
- * If set, devices will be hot-removed even if they cannot be put offline
- * gracefully (from the kernel's standpoint).
- */
-bool acpi_force_hot_remove;
-
 static const char *dummy_hid = "device";
 
 static LIST_HEAD(acpi_dep_list);
@@ -170,9 +164,6 @@ static acpi_status acpi_bus_offline(acpi_handle handle, u32 lvl, void *data,
 			pn->put_online = false;
 		}
 		ret = device_offline(pn->dev);
-		if (acpi_force_hot_remove)
-			continue;
-
 		if (ret >= 0) {
 			pn->put_online = !ret;
 		} else {
@@ -241,11 +232,11 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
 		acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
 				    NULL, acpi_bus_offline, (void *)true,
 				    (void **)&errdev);
-		if (!errdev || acpi_force_hot_remove)
+		if (!errdev)
 			acpi_bus_offline(handle, 0, (void *)true,
 					 (void **)&errdev);
 
-		if (errdev && !acpi_force_hot_remove) {
+		if (errdev) {
 			dev_warn(errdev, "Offline failed.\n");
 			acpi_bus_online(handle, 0, NULL, NULL);
 			acpi_walk_namespace(ACPI_TYPE_ANY, handle,
@@ -263,8 +254,7 @@ static int acpi_scan_hot_remove(struct acpi_device *device)
 	unsigned long long sta;
 	acpi_status status;
 
-	if (device->handler && device->handler->hotplug.demand_offline
-	    && !acpi_force_hot_remove) {
+	if (device->handler && device->handler->hotplug.demand_offline) {
 		if (!acpi_scan_is_offline(device, true))
 			return -EBUSY;
 	} else {
diff --git a/drivers/acpi/sysfs.c b/drivers/acpi/sysfs.c
index cf05ae973381..1b5ee1e0e5a3 100644
--- a/drivers/acpi/sysfs.c
+++ b/drivers/acpi/sysfs.c
@@ -921,7 +921,7 @@ void acpi_sysfs_add_hotplug_profile(struct acpi_hotplug_profile *hotplug,
 static ssize_t force_remove_show(struct kobject *kobj,
 				 struct kobj_attribute *attr, char *buf)
 {
-	return sprintf(buf, "%d\n", !!acpi_force_hot_remove);
+	return sprintf(buf, "%d\n", 0);
 }
 
 static ssize_t force_remove_store(struct kobject *kobj,
@@ -935,9 +935,10 @@ static ssize_t force_remove_store(struct kobject *kobj,
 	if (ret < 0)
 		return ret;
 
-	lock_device_hotplug();
-	acpi_force_hot_remove = val;
-	unlock_device_hotplug();
+	if (val) {
+		pr_err("Enabling force_remove is not supported anymore. Please report to linux-acpi@vger.kernel.org if you depend on this functionality\n");
+		return -EINVAL;
+	}
 	return size;
 }
 
-- 
2.11.0

-- 
Michal Hocko
SUSE Labs

* Re: memory hotplug and force_remove
  2017-03-31 10:55           ` Michal Hocko
@ 2017-03-31 11:55             ` joeyli
  2017-03-31 12:02               ` Michal Hocko
  0 siblings, 1 reply; 16+ messages in thread
From: joeyli @ 2017-03-31 11:55 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Rafael J. Wysocki, Kani Toshimitsu, Jiri Kosina, linux-mm, LKML,
	linux-api

On Fri, Mar 31, 2017 at 12:55:05PM +0200, Michal Hocko wrote:
> On Fri 31-03-17 18:49:05, Joey Lee wrote:
> > Hi Michal,
> > 
> > On Fri, Mar 31, 2017 at 10:30:17AM +0200, Michal Hocko wrote:
> [...]
> > > @@ -241,11 +232,10 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
> > >  		acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
> > >  				    NULL, acpi_bus_offline, (void *)true,
> > >  				    (void **)&errdev);
> > > -		if (!errdev || acpi_force_hot_remove)
> > > +		if (!errdev)
> > >  			acpi_bus_offline(handle, 0, (void *)true,
> > >  					 (void **)&errdev);
> > > -
> > > -		if (errdev && !acpi_force_hot_remove) {
> > > +		else {
> >               ^^^^^^^^^^^^^
> > This should still check the parent's errdev state and then roll back the
> > parent and children to the online state:
> > 
> > -		if (errdev && !acpi_force_hot_remove) {
> > +		if (errdev) {
> 
> You are right, I have missed that acpi_bus_offline modifies errdev.
> Thanks for spotting that! Updated patch is below.
> ---
> From 8df0abd29988ffb52b6df52407b96d6015861bb7 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@suse.com>
> Date: Fri, 31 Mar 2017 10:08:41 +0200
> Subject: [PATCH] acpi: drop support for force_remove
> 
> /sys/firmware/acpi/hotplug/force_remove was presumably added to support
> auto offlining in the past. This is, however, inherently dangerous for
> some hotpluggable resources like memory. The memory offlining fails when
> the memory is still in use and cannot be dropped or migrated. If we
> ignore the failure we are basically allowing for subtle memory
> corruption or a crash.
> 
> We have actually noticed the latter while hitting BUG() during the memory
> hotremove (remove_memory):
> 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> 			check_memblock_offlined_cb);
> 	if (ret)
> 		BUG();
> 
> It took us a non-trivial amount of time to realize that the customer had
> force_remove enabled. Even if the BUG() was removed here and we could
> propagate the error up the call chain it wouldn't help at all because
> then we would hit a crash or a memory corruption later, which would be
> even harder to debug. So force_remove is unfixable for the memory
> hotremove. We haven't checked whether other hotpluggable resources are
> prone to similar problems.
> 
> Remove the force_remove functionality because it is not fixable currently.
> Keep the sysfs file and report an error if somebody tries to enable it.
> Encourage users to report the missing functionality and work with them
> on an alternative solution.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.com>

This patch looks good to me. Please feel free to add:

Reviewed-by: Lee, Chun-Yi <jlee@suse.com>

Regards
Joey Lee

* Re: memory hotplug and force_remove
  2017-03-31 11:55             ` joeyli
@ 2017-03-31 12:02               ` Michal Hocko
  2017-03-31 22:35                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Hocko @ 2017-03-31 12:02 UTC (permalink / raw)
  To: joeyli
  Cc: Rafael J. Wysocki, Kani Toshimitsu, Jiri Kosina, linux-mm, LKML,
	linux-api

On Fri 31-03-17 19:55:30, Joey Lee wrote:
> On Fri, Mar 31, 2017 at 12:55:05PM +0200, Michal Hocko wrote:
> > On Fri 31-03-17 18:49:05, Joey Lee wrote:
> > > Hi Michal,
> > > 
> > > On Fri, Mar 31, 2017 at 10:30:17AM +0200, Michal Hocko wrote:
> > [...]
> > > > @@ -241,11 +232,10 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
> > > >  		acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
> > > >  				    NULL, acpi_bus_offline, (void *)true,
> > > >  				    (void **)&errdev);
> > > > -		if (!errdev || acpi_force_hot_remove)
> > > > +		if (!errdev)
> > > >  			acpi_bus_offline(handle, 0, (void *)true,
> > > >  					 (void **)&errdev);
> > > > -
> > > > -		if (errdev && !acpi_force_hot_remove) {
> > > > +		else {
> > >               ^^^^^^^^^^^^^
> > > This should still check the parent's errdev state and then roll back the
> > > parent and children to the online state:
> > > 
> > > -		if (errdev && !acpi_force_hot_remove) {
> > > +		if (errdev) {
> > 
> > You are right, I have missed that acpi_bus_offline modifies errdev.
> > Thanks for spotting that! Updated patch is below.
> > ---
> > From 8df0abd29988ffb52b6df52407b96d6015861bb7 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Fri, 31 Mar 2017 10:08:41 +0200
> > Subject: [PATCH] acpi: drop support for force_remove
> > 
> > /sys/firmware/acpi/hotplug/force_remove was presumably added to support
> > auto offlining in the past. This is, however, inherently dangerous for
> > some hotpluggable resources like memory. The memory offlining fails when
> > the memory is still in use and cannot be dropped or migrated. If we
> > ignore the failure we are basically allowing for subtle memory
> > corruption or a crash.
> > 
> > We have actually noticed the latter while hitting BUG() during the memory
> > hotremove (remove_memory):
> > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > 			check_memblock_offlined_cb);
> > 	if (ret)
> > 		BUG();
> > 
> > It took us a non-trivial amount of time to realize that the customer had
> > force_remove enabled. Even if the BUG() was removed here and we could
> > propagate the error up the call chain it wouldn't help at all because
> > then we would hit a crash or a memory corruption later, which would be
> > even harder to debug. So force_remove is unfixable for the memory
> > hotremove. We haven't checked whether other hotpluggable resources are
> > prone to similar problems.
> > 
> > Remove the force_remove functionality because it is not fixable currently.
> > Keep the sysfs file and report an error if somebody tries to enable it.
> > Encourage users to report the missing functionality and work with them
> > on an alternative solution.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> This patch looks good to me. Please feel free to add:
> 
> Reviewed-by: Lee, Chun-Yi <jlee@suse.com>

Thanks for the review Joey!
-- 
Michal Hocko
SUSE Labs

* Re: memory hotplug and force_remove
  2017-03-31 12:02               ` Michal Hocko
@ 2017-03-31 22:35                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 16+ messages in thread
From: Rafael J. Wysocki @ 2017-03-31 22:35 UTC (permalink / raw)
  To: Michal Hocko
  Cc: joeyli, Kani Toshimitsu, Jiri Kosina, linux-mm, LKML, linux-api

On Friday, March 31, 2017 02:02:36 PM Michal Hocko wrote:
> On Fri 31-03-17 19:55:30, Joey Lee wrote:
> > On Fri, Mar 31, 2017 at 12:55:05PM +0200, Michal Hocko wrote:
> > > On Fri 31-03-17 18:49:05, Joey Lee wrote:
> > > > Hi Michal,
> > > > 
> > > > On Fri, Mar 31, 2017 at 10:30:17AM +0200, Michal Hocko wrote:
> > > [...]
> > > > > @@ -241,11 +232,10 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
> > > > >  		acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
> > > > >  				    NULL, acpi_bus_offline, (void *)true,
> > > > >  				    (void **)&errdev);
> > > > > -		if (!errdev || acpi_force_hot_remove)
> > > > > +		if (!errdev)
> > > > >  			acpi_bus_offline(handle, 0, (void *)true,
> > > > >  					 (void **)&errdev);
> > > > > -
> > > > > -		if (errdev && !acpi_force_hot_remove) {
> > > > > +		else {
> > > >               ^^^^^^^^^^^^^
> > > > This should still check the parent's errdev state and then roll back the
> > > > parent and children to the online state:
> > > > 
> > > > -		if (errdev && !acpi_force_hot_remove) {
> > > > +		if (errdev) {
> > > 
> > > You are right, I have missed that acpi_bus_offline modifies errdev.
> > > Thanks for spotting that! Updated patch is below.
> > > ---
> > > From 8df0abd29988ffb52b6df52407b96d6015861bb7 Mon Sep 17 00:00:00 2001
> > > From: Michal Hocko <mhocko@suse.com>
> > > Date: Fri, 31 Mar 2017 10:08:41 +0200
> > > Subject: [PATCH] acpi: drop support for force_remove
> > > 
> > > /sys/firmware/acpi/hotplug/force_remove was presumably added to support
> > > auto offlining in the past. This is, however, inherently dangerous for
> > > some hotpluggable resources like memory. The memory offlining fails when
> > > the memory is still in use and cannot be dropped or migrated. If we
> > > ignore the failure we are basically allowing for subtle memory
> > > corruption or a crash.
> > > 
> > > We have actually noticed the latter while hitting BUG() during the memory
> > > hotremove (remove_memory):
> > > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > > 			check_memblock_offlined_cb);
> > > 	if (ret)
> > > 		BUG();
> > > 
> > > It took us a non-trivial amount of time to realize that the customer had
> > > force_remove enabled. Even if the BUG() was removed here and we could
> > > propagate the error up the call chain it wouldn't help at all because
> > > then we would hit a crash or a memory corruption later, which would be
> > > even harder to debug. So force_remove is unfixable for the memory
> > > hotremove. We haven't checked whether other hotpluggable resources are
> > > prone to similar problems.
> > > 
> > > Remove the force_remove functionality because it is not fixable currently.
> > > Keep the sysfs file and report an error if somebody tries to enable it.
> > > Encourage users to report the missing functionality and work with them
> > > on an alternative solution.
> > > 
> > > Signed-off-by: Michal Hocko <mhocko@suse.com>
> > 
> > This patch looks good to me. Please feel free to add:
> > 
> > Reviewed-by: Lee, Chun-Yi <jlee@suse.com>
> 
> Thanks for the review Joey!

Can you please resend it with a CC to linux-acpi to give the people on that list
a chance to speak up if they have any concerns?

Thanks,
Rafael

Thread overview: 16+ messages
2017-03-20 19:29 memory hotplug and force_remove Michal Hocko
2017-03-20 21:24 ` Rafael J. Wysocki
2017-03-21 16:13   ` joeyli
2017-03-28  7:58   ` Michal Hocko
2017-03-28 15:22     ` Rafael J. Wysocki
2017-03-30  8:47       ` Jiri Kosina
2017-03-30 16:20         ` Michal Hocko
2017-03-30 16:57           ` joeyli
2017-03-30 20:15             ` Rafael J. Wysocki
2017-03-31  0:00               ` joeyli
2017-03-31  8:30       ` Michal Hocko
2017-03-31 10:49         ` joeyli
2017-03-31 10:55           ` Michal Hocko
2017-03-31 11:55             ` joeyli
2017-03-31 12:02               ` Michal Hocko
2017-03-31 22:35                 ` Rafael J. Wysocki
