linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
@ 2012-11-23 17:50 Vasilis Liaskovitis
  2012-11-23 17:50 ` [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops Vasilis Liaskovitis
                   ` (3 more replies)
  0 siblings, 4 replies; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-11-23 17:50 UTC (permalink / raw)
  To: linux-acpi, isimatu.yasuaki, wency
  Cc: rjw, lenb, toshi.kani, gregkh, linux-kernel, linux-mm,
	Vasilis Liaskovitis

As discussed in https://patchwork.kernel.org/patch/1581581/
the driver core remove function must always succeed. This means we need
to know that the device can be successfully removed before acpi_bus_trim /
acpi_bus_hot_remove_device are called. Without such a check, panics can occur
when an OSPM-initiated or SCI-initiated eject of a memory device fails, e.g. with:
echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

since the ACPI core goes ahead and ejects the device regardless of whether
the memory is still in use or not.

For this reason a new acpi_device operation called prepare_remove is introduced.
This operation should be registered for acpi devices whose removal (from kernel
perspective) can fail.  Memory devices fall in this category.

acpi_bus_remove() is changed to handle removal in 2 steps:
- preparation for removal, i.e. perform the part of removal that can fail. This
  should succeed for the device and all its children.
- if the above step was successful, proceed to actual device removal
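
The two steps above can be sketched in plain user-space C. This is only an
illustration under stated assumptions: struct demo_dev, demo_prepare_remove(),
demo_remove() and demo_eject() are hypothetical names, not kernel interfaces,
and the sketch runs a full prepare pass before any removal, whereas the patch
calls prepare_remove and the driver release device by device from
acpi_bus_trim.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for an ACPI device; not the kernel's struct acpi_device. */
struct demo_dev {
	int in_use;                /* nonzero: the device cannot be taken away yet */
	int removed;               /* set once the device has been removed */
	struct demo_dev *children; /* array of direct children, or NULL */
	size_t nr_children;
};

/* Step 1: the part of removal that may fail, children first. */
static int demo_prepare_remove(struct demo_dev *dev)
{
	size_t i;
	int ret;

	assert(dev != NULL);
	for (i = 0; i < dev->nr_children; i++) {
		ret = demo_prepare_remove(&dev->children[i]);
		if (ret)
			return ret;
	}
	return dev->in_use ? -EBUSY : 0;
}

/* Step 2: the part of removal that must always succeed. */
static void demo_remove(struct demo_dev *dev)
{
	size_t i;

	for (i = 0; i < dev->nr_children; i++)
		demo_remove(&dev->children[i]);
	dev->removed = 1;
}

/* Remove a subtree only if every device in it agreed in step 1. */
static int demo_eject(struct demo_dev *dev)
{
	int ret = demo_prepare_remove(dev);

	if (ret)
		return ret;
	demo_remove(dev);
	return 0;
}
```

In this model a single busy child makes demo_eject() return -EBUSY and leaves
every device in the subtree untouched. The model's prepare step has no side
effects; a real prepare step that offlines memory would not share that
property.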

With this patchset, only acpi memory devices use the new prepare_remove
device operation. The actual memory removal (VM-related offline and other memory
cleanups) is moved to prepare_remove. The old remove operation just cleans up
the acpi structures. Directly ejecting PNP0C80 memory devices works safely. I
haven't tested yet with an ACPI container which contains memory devices.

Note that unbinding the acpi driver from a memory device with:
echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind

will no longer try to remove the memory. This is in compliance with normal
unbind driver core semantics, see the discussion in v2 of this patchset:
https://lkml.org/lkml/2012/11/16/649

After a successful unbind of the driver:
- OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
return -ENODEV on missing driver.
- SCI ejects of the memory device also cannot proceed, as they will also get
a "driver data is NULL" error.
So the memory can continue to be used safely after unbind.

Patchset based on Rafael's linux-pm/linux-next (commit 78c38651).
Comments welcome.

v2->v3:
- remove driver core changes. Only acpi core changes needed. Unbind semantics
follow driver core rules. Unbind does not remove memory.
- new patch to set enable bit in order to proceed with ejects on driver
re-binding scenario.

v1->v2:
- new patch to introduce bus_type prepare_remove callback. Needed to prepare
removal on driver unbinding from device-driver core.
- v1 patches 1 and 2 simplified and merged in one. acpi_bus_trim does not require
argument changes.

Vasilis Liaskovitis (3):
  acpi: Introduce prepare_remove operation in acpi_device_ops
  acpi_memhotplug: Add prepare_remove operation
  acpi_memhotplug: Allow eject to proceed on rebind scenario

 drivers/acpi/acpi_memhotplug.c |   21 +++++++++++++++++----
 drivers/acpi/scan.c            |    9 ++++++++-
 include/acpi/acpi_bus.h        |    2 ++
 3 files changed, 27 insertions(+), 5 deletions(-)

-- 
1.7.9



* [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops
  2012-11-23 17:50 [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation Vasilis Liaskovitis
@ 2012-11-23 17:50 ` Vasilis Liaskovitis
  2012-11-27  0:10   ` Toshi Kani
  2012-11-23 17:50 ` [RFC PATCH v3 2/3] acpi_memhotplug: Add prepare_remove operation Vasilis Liaskovitis
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-11-23 17:50 UTC (permalink / raw)
  To: linux-acpi, isimatu.yasuaki, wency
  Cc: rjw, lenb, toshi.kani, gregkh, linux-kernel, linux-mm,
	Vasilis Liaskovitis

This function should be registered for devices that need to execute some
non-acpi related action in order to be safely removed. If this function
returns zero, the acpi core can continue with removing the device.

Make acpi_bus_remove call the device-specific prepare_remove callback before
removing the device. If prepare_remove fails, the removal is aborted.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 drivers/acpi/scan.c     |    9 ++++++++-
 include/acpi/acpi_bus.h |    2 ++
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index 8c4ac6d..e1c1d5d 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -1380,10 +1380,16 @@ static int acpi_device_set_context(struct acpi_device *device)
 
 static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
 {
+	int ret = 0;
 	if (!dev)
 		return -EINVAL;
 
 	dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
+
+	if (dev->driver && dev->driver->ops.prepare_remove)
+		ret = dev->driver->ops.prepare_remove(dev);
+	if (ret)
+		return ret;
 	device_release_driver(&dev->dev);
 
 	if (!rmdevice)
@@ -1702,7 +1708,8 @@ int acpi_bus_trim(struct acpi_device *start, int rmdevice)
 				err = acpi_bus_remove(child, rmdevice);
 			else
 				err = acpi_bus_remove(child, 1);
-
+			if (err)
+				return err;
 			continue;
 		}
 
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 7ced5dc..9d94a55 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -94,6 +94,7 @@ typedef int (*acpi_op_start) (struct acpi_device * device);
 typedef int (*acpi_op_bind) (struct acpi_device * device);
 typedef int (*acpi_op_unbind) (struct acpi_device * device);
 typedef void (*acpi_op_notify) (struct acpi_device * device, u32 event);
+typedef int (*acpi_op_prepare_remove) (struct acpi_device *device);
 
 struct acpi_bus_ops {
 	u32 acpi_op_add:1;
@@ -107,6 +108,7 @@ struct acpi_device_ops {
 	acpi_op_bind bind;
 	acpi_op_unbind unbind;
 	acpi_op_notify notify;
+	acpi_op_prepare_remove prepare_remove;
 };
 
 #define ACPI_DRIVER_ALL_NOTIFY_EVENTS	0x1	/* system AND device events */
-- 
1.7.9



* [RFC PATCH v3 2/3] acpi_memhotplug: Add prepare_remove operation
  2012-11-23 17:50 [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation Vasilis Liaskovitis
  2012-11-23 17:50 ` [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops Vasilis Liaskovitis
@ 2012-11-23 17:50 ` Vasilis Liaskovitis
  2012-11-24 16:23   ` Wen Congyang
  2012-11-23 17:50 ` [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario Vasilis Liaskovitis
  2012-11-28 11:05 ` [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation Hanjun Guo
  3 siblings, 1 reply; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-11-23 17:50 UTC (permalink / raw)
  To: linux-acpi, isimatu.yasuaki, wency
  Cc: rjw, lenb, toshi.kani, gregkh, linux-kernel, linux-mm,
	Vasilis Liaskovitis

Offlining and removal of memory is now done in the prepare_remove callback,
not in the remove callback.

The prepare_remove callback will be called when trying to remove a memory device
in the following ways:

1. send eject request by SCI
2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

Note that unbinding the acpi driver from a memory device with:
echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind

will no longer try to remove the memory. This is in compliance with normal
unbind driver core semantics, see the discussion in v2 of this patchset:
https://lkml.org/lkml/2012/11/16/649

After a successful unbind of the driver:
- OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
return -ENODEV on missing driver.
- SCI ejects of the memory device also cannot proceed, as they will also get
a "driver data is NULL" error.
So the memory can continue to be used safely after unbind.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 drivers/acpi/acpi_memhotplug.c |   18 ++++++++++++++++--
 1 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index eb30e5a..d0cfbd9 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -55,6 +55,7 @@ MODULE_LICENSE("GPL");
 
 static int acpi_memory_device_add(struct acpi_device *device);
 static int acpi_memory_device_remove(struct acpi_device *device, int type);
+static int acpi_memory_device_prepare_remove(struct acpi_device *device);
 
 static const struct acpi_device_id memory_device_ids[] = {
 	{ACPI_MEMORY_DEVICE_HID, 0},
@@ -69,6 +70,7 @@ static struct acpi_driver acpi_memory_device_driver = {
 	.ops = {
 		.add = acpi_memory_device_add,
 		.remove = acpi_memory_device_remove,
+		.prepare_remove = acpi_memory_device_prepare_remove,
 		},
 };
 
@@ -448,6 +450,20 @@ static int acpi_memory_device_add(struct acpi_device *device)
 static int acpi_memory_device_remove(struct acpi_device *device, int type)
 {
 	struct acpi_memory_device *mem_device = NULL;
+
+	if (!device || !acpi_driver_data(device))
+		return -EINVAL;
+
+	mem_device = acpi_driver_data(device);
+
+	acpi_memory_device_free(mem_device);
+
+	return 0;
+}
+
+static int acpi_memory_device_prepare_remove(struct acpi_device *device)
+{
+	struct acpi_memory_device *mem_device = NULL;
 	int result;
 
 	if (!device || !acpi_driver_data(device))
@@ -459,8 +475,6 @@ static int acpi_memory_device_remove(struct acpi_device *device, int type)
 	if (result)
 		return result;
 
-	acpi_memory_device_free(mem_device);
-
 	return 0;
 }
 
-- 
1.7.9



* [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-23 17:50 [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation Vasilis Liaskovitis
  2012-11-23 17:50 ` [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops Vasilis Liaskovitis
  2012-11-23 17:50 ` [RFC PATCH v3 2/3] acpi_memhotplug: Add prepare_remove operation Vasilis Liaskovitis
@ 2012-11-23 17:50 ` Vasilis Liaskovitis
  2012-11-24 16:20   ` Wen Congyang
  2012-11-28 11:05 ` [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation Hanjun Guo
  3 siblings, 1 reply; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-11-23 17:50 UTC (permalink / raw)
  To: linux-acpi, isimatu.yasuaki, wency
  Cc: rjw, lenb, toshi.kani, gregkh, linux-kernel, linux-mm,
	Vasilis Liaskovitis

Consider the following sequence of operations for a hotplugged memory device:

1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
2. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/bind
3. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

The driver is successfully re-bound to the device in step 2. However step 3 will
not attempt to remove the memory. This is because the acpi_memory_info enabled
bit for the newly bound driver has not been set to 1. This bit needs to be set
in the case where the memory is already used by the kernel (add_memory returns
-EEXIST).

Setting the enabled bit in this case (in acpi_memory_enable_device) makes the
driver function properly after a rebind of the driver i.e. eject operation
attempts to remove memory after a successful rebind.
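
The effect of the change can be modelled outside the kernel. The sketch below
is an assumption-laden illustration, not the driver code: struct demo_mem_info
and demo_enable_ranges() are hypothetical, add_result stands in for the value
add_memory would return for a range, and the enable_on_eexist flag selects
between the behaviour before this patch (0) and after it (1).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one memory range handled by the driver. */
struct demo_mem_info {
	int add_result; /* what the add step would return for this range */
	int enabled;    /* set when the driver treats the range as its own */
};

/*
 * Walk all ranges and count those the driver is bound to.
 * enable_on_eexist == 0 models the behaviour before the patch,
 * enable_on_eexist == 1 models the behaviour after it.
 */
static int demo_enable_ranges(struct demo_mem_info *info, size_t n,
			      int enable_on_eexist)
{
	int num_enabled = 0;
	size_t i;

	assert(info != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int result = info[i].add_result;

		if (result && result != -EEXIST)
			continue; /* genuine failure: range is skipped */

		if (!result || enable_on_eexist)
			info[i].enabled = 1;
		num_enabled++;
	}
	return num_enabled;
}
```

With enable_on_eexist set to 0, a range that reports -EEXIST is counted but
keeps enabled == 0, which corresponds to the rebind case where a later eject
does not try to remove the memory. With the flag set to 1 the same range ends
up with enabled == 1.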

I am not sure if this breaks some other usage of the enabled bit (see commit
65479472). When is it possible for the memory to be in use by the kernel but
not managed by the acpi driver, apart from a driver unbind scenario?

Perhaps the patch is not needed, depending on expected semantics of re-binding.
Is the newly bound driver supposed to manage the device, if it was earlier
managed by the same driver?

This patch is only specific to this scenario, and can be dropped from the patch
series if needed.

Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
---
 drivers/acpi/acpi_memhotplug.c |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index d0cfbd9..0562cb4 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -271,12 +271,11 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
 			continue;
 		}
 
-		if (!result)
-			info->enabled = 1;
 		/*
 		 * Add num_enable even if add_memory() returns -EEXIST, so the
 		 * device is bound to this driver.
 		 */
+		info->enabled = 1;
 		num_enabled++;
 	}
 	if (!num_enabled) {
-- 
1.7.9



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-23 17:50 ` [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario Vasilis Liaskovitis
@ 2012-11-24 16:20   ` Wen Congyang
  2012-11-26  8:36     ` Vasilis Liaskovitis
  0 siblings, 1 reply; 92+ messages in thread
From: Wen Congyang @ 2012-11-24 16:20 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: linux-acpi, isimatu.yasuaki, wency, rjw, lenb, toshi.kani,
	gregkh, linux-kernel, linux-mm

At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
> Consider the following sequence of operations for a hotplugged memory device:
> 
> 1. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 2. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/bind
> 3. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
> 
> The driver is successfully re-bound to the device in step 2. However step 3 will
> not attempt to remove the memory. This is because the acpi_memory_info enabled
> bit for the newly bound driver has not been set to 1. This bit needs to be set
> in the case where the memory is already used by the kernel (add_memory returns
> -EEXIST)

Hmm, I think the reason is that we don't offline/remove memory when
unbinding it from the driver. I have sent a patch to fix this problem,
and this patch is in the pm tree now. With this patch, we will
offline/remove memory when unbinding it from the driver.

Consider the following sequence of operations for a hotplugged memory device:

1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject

If we don't offline/remove the memory, we have no chance to do it in
step 2. After step 2, the memory is still used by the kernel, but we
have powered it off. This is very dangerous.

So this patch is unnecessary now.

Thanks
Wen Congyang

> 
> Setting the enabled bit in this case (in acpi_memory_enable_device) makes the
> driver function properly after a rebind of the driver i.e. eject operation
> attempts to remove memory after a successful rebind.
> 
> I am not sure if this breaks some other usage of the enabled bit (see commit
> 65479472). When is it possible for the memory to be in use by the kernel but
> not managed by the acpi driver, apart from a driver unbind scenario?
> 
> Perhaps the patch is not needed, depending on expected semantics of re-binding.
> Is the newly bound driver supposed to manage the device, if it was earlier
> managed by the same driver?
> 
> This patch is only specific to this scenario, and can be dropped from the patch
> series if needed.
> 
> Signed-off-by: Vasilis Liaskovitis<vasilis.liaskovitis@profitbricks.com>
> ---
>   drivers/acpi/acpi_memhotplug.c |    3 +--
>   1 files changed, 1 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index d0cfbd9..0562cb4 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -271,12 +271,11 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
>   			continue;
>   		}
> 
> -		if (!result)
> -			info->enabled = 1;
>   		/*
>   		 * Add num_enable even if add_memory() returns -EEXIST, so the
>   		 * device is bound to this driver.
>   		 */
> +		info->enabled = 1;
>   		num_enabled++;
>   	}
>   	if (!num_enabled) {


* Re: [RFC PATCH v3 2/3] acpi_memhotplug: Add prepare_remove operation
  2012-11-23 17:50 ` [RFC PATCH v3 2/3] acpi_memhotplug: Add prepare_remove operation Vasilis Liaskovitis
@ 2012-11-24 16:23   ` Wen Congyang
  0 siblings, 0 replies; 92+ messages in thread
From: Wen Congyang @ 2012-11-24 16:23 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: linux-acpi, isimatu.yasuaki, wency, rjw, lenb, toshi.kani,
	gregkh, linux-kernel, linux-mm

At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
> Offlining and removal of memory is now done in the prepare_remove callback,
> not in the remove callback.
> 
> The prepare_remove callback will be called when trying to remove a memory device
> with the following ways:
> 
> 1. send eject request by SCI
> 2. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
> 
> Note that unbinding the acpi driver from a memory device with:
> echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 
> will no longer try to remove the memory. This is in compliance with normal
> unbind driver core semantics, see the discussion in v2 of this patchset:
> https://lkml.org/lkml/2012/11/16/649

If we don't remove it when unbinding it, it may cause a kernel panic.

I have explained this in another mail.

Thanks
Wen Congyang

> 
> After a successful unbind of the driver:
> - OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
> return -ENODEV on missing driver.
> - SCI ejects of the memory device also cannot proceed, as they will also get
> a "driver data is NULL" error.
> So the memory can continue to be used safely after unbind.
> 
> Signed-off-by: Vasilis Liaskovitis<vasilis.liaskovitis@profitbricks.com>
> ---
>   drivers/acpi/acpi_memhotplug.c |   18 ++++++++++++++++--
>   1 files changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
> index eb30e5a..d0cfbd9 100644
> --- a/drivers/acpi/acpi_memhotplug.c
> +++ b/drivers/acpi/acpi_memhotplug.c
> @@ -55,6 +55,7 @@ MODULE_LICENSE("GPL");
> 
>   static int acpi_memory_device_add(struct acpi_device *device);
>   static int acpi_memory_device_remove(struct acpi_device *device, int type);
> +static int acpi_memory_device_prepare_remove(struct acpi_device *device);
> 
>   static const struct acpi_device_id memory_device_ids[] = {
>   	{ACPI_MEMORY_DEVICE_HID, 0},
> @@ -69,6 +70,7 @@ static struct acpi_driver acpi_memory_device_driver = {
>   	.ops = {
>   		.add = acpi_memory_device_add,
>   		.remove = acpi_memory_device_remove,
> +		.prepare_remove = acpi_memory_device_prepare_remove,
>   		},
>   };
> 
> @@ -448,6 +450,20 @@ static int acpi_memory_device_add(struct acpi_device *device)
>   static int acpi_memory_device_remove(struct acpi_device *device, int type)
>   {
>   	struct acpi_memory_device *mem_device = NULL;
> +
> +	if (!device || !acpi_driver_data(device))
> +		return -EINVAL;
> +
> +	mem_device = acpi_driver_data(device);
> +
> +	acpi_memory_device_free(mem_device);
> +
> +	return 0;
> +}
> +
> +static int acpi_memory_device_prepare_remove(struct acpi_device *device)
> +{
> +	struct acpi_memory_device *mem_device = NULL;
>   	int result;
> 
>   	if (!device || !acpi_driver_data(device))
> @@ -459,8 +475,6 @@ static int acpi_memory_device_remove(struct acpi_device *device, int type)
>   	if (result)
>   		return result;
> 
> -	acpi_memory_device_free(mem_device);
> -
>   	return 0;
>   }
> 


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-24 16:20   ` Wen Congyang
@ 2012-11-26  8:36     ` Vasilis Liaskovitis
  2012-11-26  9:11       ` Wen Congyang
  0 siblings, 1 reply; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-11-26  8:36 UTC (permalink / raw)
  To: Wen Congyang
  Cc: linux-acpi, isimatu.yasuaki, wency, rjw, lenb, toshi.kani,
	gregkh, linux-kernel, linux-mm

On Sun, Nov 25, 2012 at 12:20:47AM +0800, Wen Congyang wrote:
> At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
> > Consider the following sequence of operations for a hotplugged memory device:
> > 
> > 1. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > 2. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/bind
> > 3. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
> > 
> > The driver is successfully re-bound to the device in step 2. However step 3 will
> > not attempt to remove the memory. This is because the acpi_memory_info enabled
> > bit for the newly bound driver has not been set to 1. This bit needs to be set
> > in the case where the memory is already used by the kernel (add_memory returns
> > -EEXIST)
> 
> Hmm, I think the reason is that we don't offline/remove memory when
> unbinding it
> from the driver. I have sent a patch to fix this problem, and this patch
> is in
> pm tree now. With this patch, we will offline/remove memory when
> unbinding it from
> the drriver.

ok. Which patch is this? Does it require driver-core changes?

> 
> Consider the following sequence of operations for a hotplugged memory
> device:
> 
> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> 
> If we don't offline/remove the memory, we have no chance to do it in
> step 2. After
> step2, the memory is used by the kernel, but we have powered off it. It
> is very
> dangerous.

How does power-off happen after unbind? acpi_eject_store checks for existing
driver before taking any action:

#ifndef FORCE_EJECT
	if (acpi_device->driver == NULL) {
		ret = -ENODEV;
		goto err;
	}
#endif

FORCE_EJECT is not defined afaict, so the function returns without scheduling
acpi_bus_hot_remove_device. Is there another code path that calls power-off?

thanks,

- Vasilis


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-26  8:36     ` Vasilis Liaskovitis
@ 2012-11-26  9:11       ` Wen Congyang
  2012-11-27  0:19         ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Wen Congyang @ 2012-11-26  9:11 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: Wen Congyang, linux-acpi, isimatu.yasuaki, rjw, lenb, toshi.kani,
	gregkh, linux-kernel, linux-mm

At 11/26/2012 04:36 PM, Vasilis Liaskovitis Wrote:
> On Sun, Nov 25, 2012 at 12:20:47AM +0800, Wen Congyang wrote:
>> At 2012/11/24 1:50, Vasilis Liaskovitis Wrote:
>>> Consider the following sequence of operations for a hotplugged memory device:
>>>
>>> 1. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/unbind
>>> 2. echo "PNP0C80:XX">  /sys/bus/acpi/drivers/acpi_memhotplug/bind
>>> 3. echo 1>/sys/bus/pci/devices/PNP0C80:XX/eject
>>>
>>> The driver is successfully re-bound to the device in step 2. However step 3 will
>>> not attempt to remove the memory. This is because the acpi_memory_info enabled
>>> bit for the newly bound driver has not been set to 1. This bit needs to be set
>>> in the case where the memory is already used by the kernel (add_memory returns
>>> -EEXIST)
>>
>> Hmm, I think the reason is that we don't offline/remove memory when
>> unbinding it
>> from the driver. I have sent a patch to fix this problem, and this patch
>> is in
>> pm tree now. With this patch, we will offline/remove memory when
>> unbinding it from
>> the drriver.
> 
> ok. Which patch is this? Does it require driver-core changes?

https://lkml.org/lkml/2012/11/15/21

Patches 1-6 are in the pm tree now.

> 
>>
>> Consider the following sequence of operations for a hotplugged memory
>> device:
>>
>> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
>> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>
>> If we don't offline/remove the memory, we have no chance to do it in
>> step 2. After
>> step2, the memory is used by the kernel, but we have powered off it. It
>> is very
>> dangerous.
> 
> How does power-off happen after unbind? acpi_eject_store checks for existing
> driver before taking any action:
> 
> #ifndef FORCE_EJECT
> 	if (acpi_device->driver == NULL) {
> 		ret = -ENODEV;
> 		goto err;
> 	}
> #endif
> 
> FORCE_EJECT is not defined afaict, so the function returns without scheduling
> acpi_bus_hot_remove_device. Is there another code path that calls power-off?

Consider the following case:

We hotremove the memory device by SCI and unbind it from the driver at the same time:

CPUa                                                  CPUb
acpi_memory_device_notify()
                                       unbind it from the driver
    acpi_bus_hot_remove_device()

Thanks
Wen Congyang

> 
> thanks,
> 
> - Vasilis
> 



* Re: [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops
  2012-11-23 17:50 ` [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops Vasilis Liaskovitis
@ 2012-11-27  0:10   ` Toshi Kani
  2012-11-27 18:36     ` Vasilis Liaskovitis
  2012-11-27 23:18     ` Rafael J. Wysocki
  0 siblings, 2 replies; 92+ messages in thread
From: Toshi Kani @ 2012-11-27  0:10 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: linux-acpi, isimatu.yasuaki, wency, rjw, lenb, gregkh,
	linux-kernel, linux-mm

On Fri, 2012-11-23 at 18:50 +0100, Vasilis Liaskovitis wrote:
> This function should be registered for devices that need to execute some
> non-acpi related action in order to be safely removed. If this function
> returns zero, the acpi core can continue with removing the device.
> 
> Make acpi_bus_remove call the device-specific prepare_remove callback before
> removing the device. If prepare_remove fails, the removal is aborted.
> 
> Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> ---
>  drivers/acpi/scan.c     |    9 ++++++++-
>  include/acpi/acpi_bus.h |    2 ++
>  2 files changed, 10 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index 8c4ac6d..e1c1d5d 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -1380,10 +1380,16 @@ static int acpi_device_set_context(struct acpi_device *device)
>  
>  static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
>  {
> +	int ret = 0;
>  	if (!dev)
>  		return -EINVAL;
>  
>  	dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
> +
> +	if (dev->driver && dev->driver->ops.prepare_remove)
> +		ret = dev->driver->ops.prepare_remove(dev);
> +	if (ret)
> +		return ret;

Hi Vasilis,

The above code should look like the code below. Then you do not need to
initialize ret, either.  Please also add a comment explaining that
prepare_remove can fail, but remove cannot.

	if (dev->driver && dev->driver->ops.prepare_remove) {
		ret = dev->driver->ops.prepare_remove(dev);
		if (ret)
			return ret;
	}

>  	device_release_driver(&dev->dev);
>  
>  	if (!rmdevice)
> @@ -1702,7 +1708,8 @@ int acpi_bus_trim(struct acpi_device *start, int rmdevice)
>  				err = acpi_bus_remove(child, rmdevice);
>  			else
>  				err = acpi_bus_remove(child, 1);
> -
> +			if (err)
> +				return err;
>  			continue;
>  		}
>  
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index 7ced5dc..9d94a55 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -94,6 +94,7 @@ typedef int (*acpi_op_start) (struct acpi_device * device);
>  typedef int (*acpi_op_bind) (struct acpi_device * device);
>  typedef int (*acpi_op_unbind) (struct acpi_device * device);
>  typedef void (*acpi_op_notify) (struct acpi_device * device, u32 event);
> +typedef int (*acpi_op_prepare_remove) (struct acpi_device *device);
>  
>  struct acpi_bus_ops {
>  	u32 acpi_op_add:1;
> @@ -107,6 +108,7 @@ struct acpi_device_ops {
>  	acpi_op_bind bind;
>  	acpi_op_unbind unbind;
>  	acpi_op_notify notify;
> +	acpi_op_prepare_remove prepare_remove;

I'd prefer pre_remove, which indicates this interface is called before
remove.  prepare_remove sounds as if it only performs preparation, which
may be misleading.

BTW, Rafael mentioned we should avoid extending the ACPI driver's
interface...  But I do not have another idea, either.


Thanks,
-Toshi



>  };
>  
>  #define ACPI_DRIVER_ALL_NOTIFY_EVENTS	0x1	/* system AND device events */




* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-26  9:11       ` Wen Congyang
@ 2012-11-27  0:19         ` Toshi Kani
  2012-11-27 18:32           ` Vasilis Liaskovitis
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-27  0:19 UTC (permalink / raw)
  To: Wen Congyang
  Cc: Vasilis Liaskovitis, Wen Congyang, linux-acpi, isimatu.yasuaki,
	rjw, lenb, gregkh, linux-kernel, linux-mm

> >> Consider the following sequence of operations for a hotplugged memory
> >> device:
> >>
> >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> >>
> >> If we don't offline/remove the memory, we have no chance to do it in
> >> step 2. After
> >> step2, the memory is used by the kernel, but we have powered off it. It
> >> is very
> >> dangerous.
> > 
> > How does power-off happen after unbind? acpi_eject_store checks for existing
> > driver before taking any action:
> > 
> > #ifndef FORCE_EJECT
> > 	if (acpi_device->driver == NULL) {
> > 		ret = -ENODEV;
> > 		goto err;
> > 	}
> > #endif
> > 
> > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> 
> Consider the following case:
> 
> We hotremove the memory device by SCI and unbind it from the driver at the same time:
> 
> CPUa                                                  CPUb
> acpi_memory_device_notify()
>                                        unbind it from the driver
>     acpi_bus_hot_remove_device()

Can we make acpi_bus_remove() fail if a given acpi_device is not
bound to a driver?  If so, can we make the unbind operation perform
unbind only?

Thanks,
-Toshi




* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-27  0:19         ` Toshi Kani
@ 2012-11-27 18:32           ` Vasilis Liaskovitis
  2012-11-27 22:03             ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-11-27 18:32 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Wen Congyang, Wen Congyang, linux-acpi, isimatu.yasuaki, rjw,
	lenb, gregkh, linux-kernel, linux-mm

On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > >> Consider the following sequence of operations for a hotplugged memory
> > >> device:
> > >>
> > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > >>
> > >> If we don't offline/remove the memory, we have no chance to do it in
> > >> step 2. After
> > >> step2, the memory is used by the kernel, but we have powered off it. It
> > >> is very
> > >> dangerous.
> > > 
> > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > driver before taking any action:
> > > 
> > > #ifndef FORCE_EJECT
> > > 	if (acpi_device->driver == NULL) {
> > > 		ret = -ENODEV;
> > > 		goto err;
> > > 	}
> > > #endif
> > > 
> > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > 
> > Consider the following case:
> > 
> > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > 
> > CPUa                                                  CPUb
> > acpi_memory_device_notify()
> >                                        unbind it from the driver
> >     acpi_bus_hot_remove_device()
> 
> Can we make acpi_bus_remove() to fail if a given acpi_device is not
> bound with a driver?  If so, can we make the unbind operation to perform
> unbind only?

acpi_bus_remove() could check whether a driver is present, and return -ENODEV
if it is not (dev->driver == NULL).

But there can still be a race between an eject and an unbind operation happening
simultaneously. This seems like a general problem to me i.e. not specific to an
acpi memory device. How do we ensure an eject does not race with a driver unbind
for other acpi devices?

Is there a per-device lock in acpi-core or device-core that can prevent this from
happening? Driver core does a device_lock(dev) on all operations, but this is
probably not grabbed on SCI-initiated acpi ejects.

thanks,

- Vasilis


* Re: [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops
  2012-11-27  0:10   ` Toshi Kani
@ 2012-11-27 18:36     ` Vasilis Liaskovitis
  2012-11-27 23:18     ` Rafael J. Wysocki
  1 sibling, 0 replies; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-11-27 18:36 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, isimatu.yasuaki, wency, rjw, lenb, gregkh,
	linux-kernel, linux-mm

Hi Toshi,

On Mon, Nov 26, 2012 at 05:10:21PM -0700, Toshi Kani wrote:
> On Fri, 2012-11-23 at 18:50 +0100, Vasilis Liaskovitis wrote:
> > This function should be registered for devices that need to execute some
> > non-acpi related action in order to be safely removed. If this function
> > returns zero, the acpi core can continue with removing the device.
> > 
> > Make acpi_bus_remove call the device-specific prepare_remove callback before
> > removing the device. If prepare_remove fails, the removal is aborted.
> > 
> > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > ---
> >  drivers/acpi/scan.c     |    9 ++++++++-
> >  include/acpi/acpi_bus.h |    2 ++
> >  2 files changed, 10 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index 8c4ac6d..e1c1d5d 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -1380,10 +1380,16 @@ static int acpi_device_set_context(struct acpi_device *device)
> >  
> >  static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
> >  {
> > +	int ret = 0;
> >  	if (!dev)
> >  		return -EINVAL;
> >  
> >  	dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
> > +
> > +	if (dev->driver && dev->driver->ops.prepare_remove)
> > +		ret = dev->driver->ops.prepare_remove(dev);
> > +	if (ret)
> > +		return ret;
> 
> Hi Vasilis,
> 
> The above code should be like below. Then you do not need to initialize
> ret, either.  Please also add some comments explaining about
> prepare_remove can fail, but remove cannot.
> 
> 	if (dev->driver && dev->driver->ops.prepare_remove) {
> 		ret = dev->driver->ops.prepare_remove(dev);
> 		if (ret)
> 			return ret;
> 	}

right.

> 
> >  	device_release_driver(&dev->dev);
> >  
> >  	if (!rmdevice)
> > @@ -1702,7 +1708,8 @@ int acpi_bus_trim(struct acpi_device *start, int rmdevice)
> >  				err = acpi_bus_remove(child, rmdevice);
> >  			else
> >  				err = acpi_bus_remove(child, 1);
> > -
> > +			if (err)
> > +				return err;
> >  			continue;
> >  		}
> >  
> > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> > index 7ced5dc..9d94a55 100644
> > --- a/include/acpi/acpi_bus.h
> > +++ b/include/acpi/acpi_bus.h
> > @@ -94,6 +94,7 @@ typedef int (*acpi_op_start) (struct acpi_device * device);
> >  typedef int (*acpi_op_bind) (struct acpi_device * device);
> >  typedef int (*acpi_op_unbind) (struct acpi_device * device);
> >  typedef void (*acpi_op_notify) (struct acpi_device * device, u32 event);
> > +typedef int (*acpi_op_prepare_remove) (struct acpi_device *device);
> >  
> >  struct acpi_bus_ops {
> >  	u32 acpi_op_add:1;
> > @@ -107,6 +108,7 @@ struct acpi_device_ops {
> >  	acpi_op_bind bind;
> >  	acpi_op_unbind unbind;
> >  	acpi_op_notify notify;
> > +	acpi_op_prepare_remove prepare_remove;
> 
> I'd prefer pre_remove, which indicates this interface is called before
> remove.  prepare_remove sounds as if it only performs preparation, which
> may be misleading.

ok, I'll use pre_remove from now on.
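
A rough user-space sketch of the two-step removal discussed above, using the
pre_remove name. This is not the kernel code: the struct layouts are simplified
stand-ins, and the memory_in_use / removed fields, the mem_* callbacks and the
acpi_bus_remove_sketch() name are hypothetical, introduced only for illustration.

```c
/* User-space illustration only; all types are simplified stand-ins. */
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct acpi_device;

struct acpi_device_ops {
	int  (*pre_remove)(struct acpi_device *dev); /* may fail */
	void (*remove)(struct acpi_device *dev);     /* must not fail */
};

struct acpi_driver {
	struct acpi_device_ops ops;
};

struct acpi_device {
	struct acpi_driver *driver;
	int memory_in_use; /* hypothetical: 1 while the range cannot be offlined */
	int removed;       /* hypothetical: set once the device is torn down */
};

/* Hypothetical memory-device callbacks. */
static int mem_pre_remove(struct acpi_device *dev)
{
	return dev->memory_in_use ? -EBUSY : 0;
}

static void mem_remove(struct acpi_device *dev)
{
	dev->removed = 1;
}

static int acpi_bus_remove_sketch(struct acpi_device *dev)
{
	int ret;

	if (!dev)
		return -EINVAL;

	/*
	 * pre_remove can fail. Nothing has been torn down yet, so the
	 * caller simply sees the error and the device stays usable.
	 */
	if (dev->driver && dev->driver->ops.pre_remove) {
		ret = dev->driver->ops.pre_remove(dev);
		if (ret)
			return ret;
	}

	/* From here on, removal is not allowed to fail. */
	if (dev->driver && dev->driver->ops.remove)
		dev->driver->ops.remove(dev);

	return 0;
}
```

With a device whose memory_in_use is set, acpi_bus_remove_sketch() returns
-EBUSY and leaves removed at 0; once memory_in_use is cleared, the same call
returns 0 and the remove callback runs.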

> 
> BTW, Rafael mentioned we should avoid extending ACPI driver's
> interface...  But I do not have other idea, either.

If we reach agreement that this is the approach we want, I 'll resend the series.

thanks,

- Vasilis



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-27 18:32           ` Vasilis Liaskovitis
@ 2012-11-27 22:03             ` Toshi Kani
  2012-11-27 23:41               ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-27 22:03 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: Wen Congyang, Wen Congyang, linux-acpi, isimatu.yasuaki, rjw,
	lenb, gregkh, linux-kernel, linux-mm

On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > >> Consider the following sequence of operations for a hotplugged memory
> > > >> device:
> > > >>
> > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > >>
> > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > >> step 2. After
> > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > >> is very
> > > >> dangerous.
> > > > 
> > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > driver before taking any action:
> > > > 
> > > > #ifndef FORCE_EJECT
> > > > 	if (acpi_device->driver == NULL) {
> > > > 		ret = -ENODEV;
> > > > 		goto err;
> > > > 	}
> > > > #endif
> > > > 
> > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > 
> > > Consider the following case:
> > > 
> > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > 
> > > CPUa                                                  CPUb
> > > acpi_memory_device_notify()
> > >                                        unbind it from the driver
> > >     acpi_bus_hot_remove_device()
> > 
> > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > bound with a driver?  If so, can we make the unbind operation to perform
> > unbind only?
> 
> acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> if it's not present (dev->driver == NULL).
> 
> But there can still be a race between an eject and an unbind operation happening
> simultaneously. This seems like a general problem to me i.e. not specific to an
> acpi memory device. How do we ensure an eject does not race with a driver unbind
> for other acpi devices?
> 
> Is there a per-device lock in acpi-core or device-core that can prevent this from
> happening? Driver core does a device_lock(dev) on all operations, but this is
> probably not grabbed on SCI-initiated acpi ejects.

Since driver_unbind() calls device_lock(dev->parent) before calling
device_release_driver(), I am wondering if we can call
device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
(i.e. before calling pre_remove) and fail if dev->driver is NULL.  The
parent lock is otherwise released after device_release_driver() is done.
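
To make the idea concrete, here is a user-space sketch in which a pthread mutex
stands in for the driver-core device lock. All names ending in _sketch, the
simplified struct device and the pre_remove_busy callback are hypothetical. It
only shows the intended ordering (take the parent lock, check for a driver,
call pre_remove, release the driver, drop the lock); it says nothing about
whether this lock ordering is safe in the real driver core.

```c
/* User-space illustration only; a pthread mutex models device_lock(). */
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct device {
	struct device *parent;
	pthread_mutex_t lock; /* stand-in for the driver-core device lock */
	void *driver;         /* NULL once the driver has been unbound */
};

static void device_lock_sketch(struct device *d)
{
	pthread_mutex_lock(&d->lock);
}

static void device_unlock_sketch(struct device *d)
{
	pthread_mutex_unlock(&d->lock);
}

/* Unbind path: hold the parent lock while the driver is released. */
static void driver_unbind_sketch(struct device *dev)
{
	if (dev->parent)
		device_lock_sketch(dev->parent);
	dev->driver = NULL;
	if (dev->parent)
		device_unlock_sketch(dev->parent);
}

/* Hypothetical pre_remove callback that always refuses. */
static int pre_remove_busy(struct device *dev)
{
	(void)dev;
	return -EBUSY;
}

/* Eject path: same lock, and refuse to continue without a driver. */
static int acpi_bus_remove_locked_sketch(struct device *dev,
					 int (*pre_remove)(struct device *))
{
	int ret = 0;

	if (!dev)
		return -EINVAL;

	if (dev->parent)
		device_lock_sketch(dev->parent);

	if (!dev->driver)
		ret = -ENODEV;          /* an unbind got here first */
	else if (pre_remove)
		ret = pre_remove(dev);  /* the step that may fail */

	if (!ret)
		dev->driver = NULL;     /* models device_release_driver() */

	if (dev->parent)
		device_unlock_sketch(dev->parent);

	return ret;
}
```

Because both paths serialize on the parent lock, an eject that runs after an
unbind sees driver == NULL and returns -ENODEV instead of proceeding.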

Thanks,
-Toshi



* Re: [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops
  2012-11-27  0:10   ` Toshi Kani
  2012-11-27 18:36     ` Vasilis Liaskovitis
@ 2012-11-27 23:18     ` Rafael J. Wysocki
  1 sibling, 0 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-27 23:18 UTC (permalink / raw)
  To: linux-acpi
  Cc: Toshi Kani, Vasilis Liaskovitis, isimatu.yasuaki, wency, lenb,
	gregkh, linux-kernel, linux-mm

On Monday, November 26, 2012 05:10:21 PM Toshi Kani wrote:
> On Fri, 2012-11-23 at 18:50 +0100, Vasilis Liaskovitis wrote:
> > This function should be registered for devices that need to execute some
> > non-acpi related action in order to be safely removed. If this function
> > returns zero, the acpi core can continue with removing the device.
> > 
> > Make acpi_bus_remove call the device-specific prepare_remove callback before
> > removing the device. If prepare_remove fails, the removal is aborted.
> > 
> > Signed-off-by: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
> > ---
> >  drivers/acpi/scan.c     |    9 ++++++++-
> >  include/acpi/acpi_bus.h |    2 ++
> >  2 files changed, 10 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> > index 8c4ac6d..e1c1d5d 100644
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -1380,10 +1380,16 @@ static int acpi_device_set_context(struct acpi_device *device)
> >  
> >  static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
> >  {
> > +	int ret = 0;
> >  	if (!dev)
> >  		return -EINVAL;
> >  
> >  	dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
> > +
> > +	if (dev->driver && dev->driver->ops.prepare_remove)
> > +		ret = dev->driver->ops.prepare_remove(dev);
> > +	if (ret)
> > +		return ret;
> 
> Hi Vasilis,
> 
> The above code should be like below. Then you do not need to initialize
> ret, either.  Please also add some comments explaining about
> prepare_remove can fail, but remove cannot.
> 
> 	if (dev->driver && dev->driver->ops.prepare_remove) {
> 		ret = dev->driver->ops.prepare_remove(dev);
> 		if (ret)
> 			return ret;
> 	}
> 
> >  	device_release_driver(&dev->dev);
> >  
> >  	if (!rmdevice)
> > @@ -1702,7 +1708,8 @@ int acpi_bus_trim(struct acpi_device *start, int rmdevice)
> >  				err = acpi_bus_remove(child, rmdevice);
> >  			else
> >  				err = acpi_bus_remove(child, 1);
> > -
> > +			if (err)
> > +				return err;
> >  			continue;
> >  		}
> >  
> > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> > index 7ced5dc..9d94a55 100644
> > --- a/include/acpi/acpi_bus.h
> > +++ b/include/acpi/acpi_bus.h
> > @@ -94,6 +94,7 @@ typedef int (*acpi_op_start) (struct acpi_device * device);
> >  typedef int (*acpi_op_bind) (struct acpi_device * device);
> >  typedef int (*acpi_op_unbind) (struct acpi_device * device);
> >  typedef void (*acpi_op_notify) (struct acpi_device * device, u32 event);
> > +typedef int (*acpi_op_prepare_remove) (struct acpi_device *device);
> >  
> >  struct acpi_bus_ops {
> >  	u32 acpi_op_add:1;
> > @@ -107,6 +108,7 @@ struct acpi_device_ops {
> >  	acpi_op_bind bind;
> >  	acpi_op_unbind unbind;
> >  	acpi_op_notify notify;
> > +	acpi_op_prepare_remove prepare_remove;
> 
> I'd prefer pre_remove, which indicates this interface is called before
> remove.  prepare_remove sounds as if it only performs preparation, which
> may be misleading.
> 
> BTW, Rafael mentioned we should avoid extending ACPI driver's
> interface...  But I do not have other idea, either.

It's fine in this particular case, since it looks like it would be difficult
to do that differently with what we have at the moment.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-27 22:03             ` Toshi Kani
@ 2012-11-27 23:41               ` Rafael J. Wysocki
  2012-11-28 16:01                 ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-27 23:41 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Tuesday, November 27, 2012 03:03:47 PM Toshi Kani wrote:
> On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> > On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > > >> Consider the following sequence of operations for a hotplugged memory
> > > > >> device:
> > > > >>
> > > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > >>
> > > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > > >> step 2. After
> > > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > > >> is very
> > > > >> dangerous.
> > > > > 
> > > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > > driver before taking any action:
> > > > > 
> > > > > #ifndef FORCE_EJECT
> > > > > 	if (acpi_device->driver == NULL) {
> > > > > 		ret = -ENODEV;
> > > > > 		goto err;
> > > > > 	}
> > > > > #endif
> > > > > 
> > > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > > 
> > > > Consider the following case:
> > > > 
> > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > 
> > > > CPUa                                                  CPUb
> > > > acpi_memory_device_notify()
> > > >                                        unbind it from the driver
> > > >     acpi_bus_hot_remove_device()
> > > 
> > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > bound with a driver?  If so, can we make the unbind operation to perform
> > > unbind only?
> > 
> > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > if it's not present (dev->driver == NULL).
> > 
> > But there can still be a race between an eject and an unbind operation happening
> > simultaneously. This seems like a general problem to me i.e. not specific to an
> > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > for other acpi devices?
> > 
> > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > happening? Driver core does a device_lock(dev) on all operations, but this is
> > probably not grabbed on SCI-initiated acpi ejects.
> 
> Since driver_unbind() calls device_lock(dev->parent) before calling
> device_release_driver(), I am wondering if we can call
> device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> parent lock is otherwise released after device_release_driver() is done.

I would be careful.  You may introduce some subtle locking-related issues
this way.

Besides, there may be an alternative approach to all this.  For example,
what if we don't remove struct device objects on eject?  The ACPI handles
associated with them don't go away in that case after all, do they?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-23 17:50 [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation Vasilis Liaskovitis
                   ` (2 preceding siblings ...)
  2012-11-23 17:50 ` [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario Vasilis Liaskovitis
@ 2012-11-28 11:05 ` Hanjun Guo
  2012-11-28 18:41   ` Toshi Kani
  3 siblings, 1 reply; 92+ messages in thread
From: Hanjun Guo @ 2012-11-28 11:05 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: linux-acpi, isimatu.yasuaki, wency, rjw, lenb, toshi.kani,
	gregkh, linux-kernel, linux-mm, Tang Chen

On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> As discussed in https://patchwork.kernel.org/patch/1581581/
> the driver core remove function needs to always succeed. This means we need
> to know that the device can be successfully removed before acpi_bus_trim / 
> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> or SCI-initiated eject of memory devices fail e.g with:
> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> 
> since the ACPI core goes ahead and ejects the device regardless of whether the
> the memory is still in use or not.
> 
> For this reason a new acpi_device operation called prepare_remove is introduced.
> This operation should be registered for acpi devices whose removal (from kernel
> perspective) can fail.  Memory devices fall in this category.
> 
> acpi_bus_remove() is changed to handle removal in 2 steps:
> - preparation for removal i.e. perform part of removal that can fail. Should
>   succeed for device and all its children.
> - if above step was successfull, proceed to actual device removal

Hi Vasilis,
We met the same problem when we were doing computer node hotplug. It is a good idea
to introduce prepare_remove before actual device removal.

I think we could do more in prepare_remove, such as rollback. In most cases we can
now offline most memory sections, except those holding kernel-used pages. Should we
roll back and online the memory sections again when prepare_remove fails?

As you may know, the ACPI-based hotplug framework we are working on already addresses
this problem, and the way we solve it is a bit like yours.

We introduce hp_ops in struct acpi_device_ops:
struct acpi_device_ops {
	acpi_op_add add;
	acpi_op_remove remove;
	acpi_op_start start;
	acpi_op_bind bind;
	acpi_op_unbind unbind;
	acpi_op_notify notify;
#ifdef	CONFIG_ACPI_HOTPLUG
	struct acpihp_dev_ops *hp_ops;
#endif	/* CONFIG_ACPI_HOTPLUG */
};

In hp_ops, we divide prepare_remove into six small steps:
1) pre_release(): optional step to mark device going to be removed/busy
2) release(): reclaim device from running system
3) post_release(): rollback if cancelled by user or error happened
4) pre_unconfigure(): optional step to solve possible dependency issue
5) unconfigure(): remove devices from running system
6) post_unconfigure(): free resources used by devices

In this way, we can easily roll back if an error happens.
What do you think of this solution? Any suggestions? I think we can arrive at
a better approach by sharing ideas. :)
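
As a rough illustration of how the six steps could be chained, here is a
user-space sketch. Only the step names come from the list above; the callback
signatures, the struct acpihp_dev_ops_sketch layout, the hp_dev trace type and
the rule that any failure before unconfigure() triggers post_release() are
assumptions made for this example, not the framework's actual interface.

```c
/* User-space illustration only; signatures and types are assumptions. */
#include <assert.h>
#include <errno.h>
#include <string.h>

struct hp_dev {
	char trace[16];   /* one letter per callback that ran */
	int fail_release; /* make release() fail, e.g. kernel pages in range */
};

struct acpihp_dev_ops_sketch {
	int  (*pre_release)(struct hp_dev *d);     /* optional */
	int  (*release)(struct hp_dev *d);
	void (*post_release)(struct hp_dev *d);    /* rollback */
	int  (*pre_unconfigure)(struct hp_dev *d); /* optional */
	void (*unconfigure)(struct hp_dev *d);
	void (*post_unconfigure)(struct hp_dev *d);
};

static void mark(struct hp_dev *d, char c)
{
	size_t n = strlen(d->trace);

	if (n + 1 < sizeof(d->trace)) {
		d->trace[n] = c;
		d->trace[n + 1] = '\0';
	}
}

static int  ex_pre_release(struct hp_dev *d) { mark(d, 'p'); return 0; }
static int  ex_release(struct hp_dev *d)
{
	mark(d, 'r');
	return d->fail_release ? -EBUSY : 0;
}
static void ex_post_release(struct hp_dev *d)     { mark(d, 'b'); }
static int  ex_pre_unconfigure(struct hp_dev *d)  { mark(d, 'q'); return 0; }
static void ex_unconfigure(struct hp_dev *d)      { mark(d, 'u'); }
static void ex_post_unconfigure(struct hp_dev *d) { mark(d, 'f'); }

static const struct acpihp_dev_ops_sketch hp_mem_ops = {
	.pre_release      = ex_pre_release,
	.release          = ex_release,
	.post_release     = ex_post_release,
	.pre_unconfigure  = ex_pre_unconfigure,
	.unconfigure      = ex_unconfigure,
	.post_unconfigure = ex_post_unconfigure,
};

static int acpihp_remove_sketch(const struct acpihp_dev_ops_sketch *ops,
				struct hp_dev *d)
{
	int ret;

	if (ops->pre_release) {
		ret = ops->pre_release(d);
		if (ret)
			goto rollback;
	}

	ret = ops->release(d);
	if (ret)
		goto rollback;

	if (ops->pre_unconfigure) {
		ret = ops->pre_unconfigure(d);
		if (ret)
			goto rollback;
	}

	/* Point of no return: these steps are assumed not to fail. */
	ops->unconfigure(d);
	ops->post_unconfigure(d);
	return 0;

rollback:
	ops->post_release(d);
	return ret;
}
```

On the success path the trace reads "prquf"; if release() fails, the trace is
"prb" and the error is returned without reaching unconfigure().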

Thanks
Hanjun Guo

> 
> With this patchset, only acpi memory devices use the new prepare_remove
> device operation. The actual memory removal (VM-related offline and other memory
> cleanups) is moved to prepare_remove. The old remove operation just cleans up
> the acpi structures. Directly ejecting PNP0C80 memory devices works safely. I
> haven't tested yet with an ACPI container which contains memory devices.
> 
> Note that unbinding the acpi driver from a memory device with:
> echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 
> will no longer try to remove the memory. This is in compliance with normal
> unbind driver core semantics, see the discussion in v2 of this patchset:
> https://lkml.org/lkml/2012/11/16/649
> 
> After a successful unbind of the driver:
> - OSPM ejects of the memory device cannot proceed, as acpi_eject_store will
> return -ENODEV on missing driver.
> - SCI ejects of the memory device also cannot proceed, as they will also get
> a "driver data is NULL" error.
> So the memory can continue to be used safely after unbind.
> 
> Patchset based on Rafael's linux-pm/linux-next (commit 78c38651).
> Comments welcome.
> 
> v2->v3:
> - remove driver core changes. Only acpi core changes needed. Unbind semantics
> follow driver core rules. Unbind does not remove memory.
> - new patch to set enable bit in order to proceed with ejects on driver
> re-binding scenario.
> 
> v1->v2:
> - new patch to introduce bus_type prepare_remove callback. Needed to prepare
> removal on driver unbinding from device-driver core.
> - v1 patches 1 and 2 simplified and merged in one. acpi_bus_trim does not require
> argument changes.
> 
> Vasilis Liaskovitis (3):
>   acpi: Introduce prepare_remove operation in acpi_device_ops
>   acpi_memhotplug: Add prepare_remove operation
>   acpi_memhotplug: Allow eject to proceed on rebind scenario
> 
>  drivers/acpi/acpi_memhotplug.c |   21 +++++++++++++++++----
>  drivers/acpi/scan.c            |    9 ++++++++-
>  include/acpi/acpi_bus.h        |    2 ++
>  3 files changed, 27 insertions(+), 5 deletions(-)
> 




* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-27 23:41               ` Rafael J. Wysocki
@ 2012-11-28 16:01                 ` Toshi Kani
  2012-11-28 18:40                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-28 16:01 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wed, 2012-11-28 at 00:41 +0100, Rafael J. Wysocki wrote:
> On Tuesday, November 27, 2012 03:03:47 PM Toshi Kani wrote:
> > On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> > > On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > > > >> Consider the following sequence of operations for a hotplugged memory
> > > > > >> device:
> > > > > >>
> > > > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > > >>
> > > > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > > > >> step 2. After
> > > > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > > > >> is very
> > > > > >> dangerous.
> > > > > > 
> > > > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > > > driver before taking any action:
> > > > > > 
> > > > > > #ifndef FORCE_EJECT
> > > > > > 	if (acpi_device->driver == NULL) {
> > > > > > 		ret = -ENODEV;
> > > > > > 		goto err;
> > > > > > 	}
> > > > > > #endif
> > > > > > 
> > > > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > > > 
> > > > > Consider the following case:
> > > > > 
> > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > 
> > > > > CPUa                                                  CPUb
> > > > > acpi_memory_device_notify()
> > > > >                                        unbind it from the driver
> > > > >     acpi_bus_hot_remove_device()
> > > > 
> > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > unbind only?
> > > 
> > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > if it's not present (dev->driver == NULL).
> > > 
> > > But there can still be a race between an eject and an unbind operation happening
> > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > for other acpi devices?
> > > 
> > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > probably not grabbed on SCI-initiated acpi ejects.
> > 
> > Since driver_unbind() calls device_lock(dev->parent) before calling
> > device_release_driver(), I am wondering if we can call
> > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > parent lock is otherwise released after device_release_driver() is done.
> 
> I would be careful.  You may introduce some subtle locking-related issues
> this way.

Right.  This requires careful inspection and testing.  As far as the
locking is concerned, I am not keen on using fine-grained locking for
hot-plug.  It is much simpler and more robust if we serialize such operations.

> Besides, there may be an alternative approach to all this.  For example,
> what if we don't remove struct device objects on eject?  The ACPI handles
> associated with them don't go away in that case after all, do they?

Umm...  Sorry, I am not getting your point.  The issue is that we need
to be able to fail a request when a memory range cannot be off-lined.
Otherwise, we end up ejecting an online memory range.

Thanks,
-Toshi



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 16:01                 ` Toshi Kani
@ 2012-11-28 18:40                   ` Rafael J. Wysocki
  2012-11-28 21:02                     ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 18:40 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wednesday, November 28, 2012 09:01:13 AM Toshi Kani wrote:
> On Wed, 2012-11-28 at 00:41 +0100, Rafael J. Wysocki wrote:
> > On Tuesday, November 27, 2012 03:03:47 PM Toshi Kani wrote:
> > > On Tue, 2012-11-27 at 19:32 +0100, Vasilis Liaskovitis wrote:
> > > > On Mon, Nov 26, 2012 at 05:19:01PM -0700, Toshi Kani wrote:
> > > > > > >> Consider the following sequence of operations for a hotplugged memory
> > > > > > >> device:
> > > > > > >>
> > > > > > >> 1. echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > > >> 2. echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > > > >>
> > > > > > >> If we don't offline/remove the memory, we have no chance to do it in
> > > > > > >> step 2. After
> > > > > > >> step2, the memory is used by the kernel, but we have powered off it. It
> > > > > > >> is very
> > > > > > >> dangerous.
> > > > > > > 
> > > > > > > How does power-off happen after unbind? acpi_eject_store checks for existing
> > > > > > > driver before taking any action:
> > > > > > > 
> > > > > > > #ifndef FORCE_EJECT
> > > > > > > 	if (acpi_device->driver == NULL) {
> > > > > > > 		ret = -ENODEV;
> > > > > > > 		goto err;
> > > > > > > 	}
> > > > > > > #endif
> > > > > > > 
> > > > > > > FORCE_EJECT is not defined afaict, so the function returns without scheduling
> > > > > > > acpi_bus_hot_remove_device. Is there another code path that calls power-off?
> > > > > > 
> > > > > > Consider the following case:
> > > > > > 
> > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > 
> > > > > > CPUa                                                  CPUb
> > > > > > acpi_memory_device_notify()
> > > > > >                                        unbind it from the driver
> > > > > >     acpi_bus_hot_remove_device()
> > > > > 
> > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > unbind only?
> > > > 
> > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > if it's not present (dev->driver == NULL).
> > > > 
> > > > But there can still be a race between an eject and an unbind operation happening
> > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > for other acpi devices?
> > > > 
> > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > probably not grabbed on SCI-initiated acpi ejects.
> > > 
> > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > device_release_driver(), I am wondering if we can call
> > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > parent lock is otherwise released after device_release_driver() is done.
> > 
> > I would be careful.  You may introduce some subtle locking-related issues
> > this way.
> 
> Right.  This requires careful inspection and testing.  As far as the
> locking is concerned, I am not keen on using fine grained locking for
> hot-plug.  It is much simpler and solid if we serialize such operations.
> 
> > Besides, there may be an alternative approach to all this.  For example,
> > what if we don't remove struct device objects on eject?  The ACPI handles
> > associated with them don't go away in that case after all, do they?
> 
> Umm...  Sorry, I am not getting your point.  The issue is that we need
> to be able to fail a request when memory range cannot be off-lined.
> Otherwise, we end up ejecting online memory range.

Yes, this is the major one.  The minor issue, however, is a race condition
between unbinding a driver from a device and removing the device if I
understand it correctly.  Which will go away automatically if the device is
not removed in the first place.  Or so I would think. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-28 11:05 ` [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation Hanjun Guo
@ 2012-11-28 18:41   ` Toshi Kani
  2012-11-29  4:48     ` Hanjun Guo
                       ` (2 more replies)
  0 siblings, 3 replies; 92+ messages in thread
From: Toshi Kani @ 2012-11-28 18:41 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki, wency, rjw,
	lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > As discussed in https://patchwork.kernel.org/patch/1581581/
> > the driver core remove function needs to always succeed. This means we need
> > to know that the device can be successfully removed before acpi_bus_trim / 
> > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > or SCI-initiated eject of memory devices fail e.g with:
> > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > 
> > since the ACPI core goes ahead and ejects the device regardless of whether the
> > the memory is still in use or not.
> > 
> > For this reason a new acpi_device operation called prepare_remove is introduced.
> > This operation should be registered for acpi devices whose removal (from kernel
> > perspective) can fail.  Memory devices fall in this category.
> > 
> > acpi_bus_remove() is changed to handle removal in 2 steps:
> > - preparation for removal i.e. perform part of removal that can fail. Should
> >   succeed for device and all its children.
> > - if above step was successfull, proceed to actual device removal
> 
> Hi Vasilis,
> We met the same problem when we doing computer node hotplug, It is a good idea
> to introduce prepare_remove before actual device removal.
> 
> I think we could do more in prepare_remove, such as rollback. In most cases, we can
> offline most of memory sections except kernel used pages now, should we rollback
> and online the memory sections when prepare_remove failed ?

I think hot-plug operation should have all-or-nothing semantics.  That
is, an operation should either complete successfully, or rollback to the
original state.

> As you may know, the ACPI based hotplug framework we are working on already addressed
> this problem, and the way we slove this problem is a bit like yours.
> 
> We introduce hp_ops in struct acpi_device_ops:
> struct acpi_device_ops {
> 	acpi_op_add add;
> 	acpi_op_remove remove;
> 	acpi_op_start start;
> 	acpi_op_bind bind;
> 	acpi_op_unbind unbind;
> 	acpi_op_notify notify;
> #ifdef	CONFIG_ACPI_HOTPLUG
> 	struct acpihp_dev_ops *hp_ops;
> #endif	/* CONFIG_ACPI_HOTPLUG */
> };
> 
> in hp_ops, we divide the prepare_remove into six small steps, that is:
> 1) pre_release(): optional step to mark device going to be removed/busy
> 2) release(): reclaim device from running system
> 3) post_release(): rollback if cancelled by user or error happened
> 4) pre_unconfigure(): optional step to solve possible dependency issue
> 5) unconfigure(): remove devices from running system
> 6) post_unconfigure(): free resources used by devices
> 
> In this way, we can easily rollback if error happens.
> What do you think of this solution? Any suggestions? I think we can reach
> a better solution by sharing ideas. :)

Yes, sharing ideas is good. :)  I do not know if we need all 6 steps (I
have not looked at all your changes yet..), but in my mind, a hot-plug
operation should be composed of the following 3 phases.

1. Validate phase - Verify that the request is a supported operation.  All
known restrictions are checked in this phase.  For instance, if a
hot-remove request involves kernel memory, the request fails in this phase.
Since this phase makes no changes, no rollback is needed on failure.

2. Execute phase - Perform the part of the hot-add / hot-remove operation
that can be rolled back in case of an error or cancel.

3. Commit phase - Perform the final part of the hot-add / hot-remove
operation that cannot be rolled back.  No error / cancel is allowed in this
phase.  For instance, the eject operation is performed in this phase.
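
To make the three phases above concrete, here is a minimal userspace
sketch in C.  It is only a model under stated assumptions, not kernel
code: struct mem_range, the "pinned" flag and the hp_*() helpers are
hypothetical names made up for this illustration, and every range is
assumed to be online when hp_remove() is called.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one memory range; not a kernel structure. */
struct mem_range {
	bool kernel_memory;	/* known restriction: can never be off-lined */
	bool pinned;		/* assumed runtime condition: off-line attempt fails */
	bool online;
	bool ejected;
};

/* Phase 1: validate.  Read-only checks, so failing needs no rollback. */
static int hp_validate(const struct mem_range *r, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (r[i].kernel_memory)
			return -EBUSY;
	return 0;
}

/* Phase 2: execute.  Off-line each range; on error, undo what was done. */
static int hp_execute(struct mem_range *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (r[i].pinned) {
			/* Roll back: bring ranges 0..i-1 online again. */
			while (i-- > 0)
				r[i].online = true;
			return -EAGAIN;
		}
		r[i].online = false;
	}
	return 0;
}

/* Phase 3: commit.  Must not fail, hence no return value. */
static void hp_commit(struct mem_range *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(!r[i].online);	/* guaranteed by the execute phase */
		r[i].ejected = true;
	}
}

/* All-or-nothing hot-remove built from the three phases. */
int hp_remove(struct mem_range *r, size_t n)
{
	int ret = hp_validate(r, n);

	if (ret)
		return ret;
	ret = hp_execute(r, n);
	if (ret)
		return ret;
	hp_commit(r, n);
	return 0;
}
```

In this model a failure in either of the first two phases leaves every
range online and not ejected, which is the all-or-nothing behaviour
described above; only the commit phase touches "ejected".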


Thanks,
-Toshi






* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 18:40                   ` Rafael J. Wysocki
@ 2012-11-28 21:02                     ` Toshi Kani
  2012-11-28 21:40                       ` Rafael J. Wysocki
  2012-11-28 23:49                       ` Rafael J. Wysocki
  0 siblings, 2 replies; 92+ messages in thread
From: Toshi Kani @ 2012-11-28 21:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

> > > > > > > Consider the following case:
> > > > > > > 
> > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > 
> > > > > > > CPUa                                                  CPUb
> > > > > > > acpi_memory_device_notify()
> > > > > > >                                        unbind it from the driver
> > > > > > >     acpi_bus_hot_remove_device()
> > > > > > 
> > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > unbind only?
> > > > > 
> > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > if it's not present (dev->driver == NULL).
> > > > > 
> > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > for other acpi devices?
> > > > > 
> > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > 
> > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > device_release_driver(), I am wondering if we can call
> > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > parent lock is otherwise released after device_release_driver() is done.
> > > 
> > > I would be careful.  You may introduce some subtle locking-related issues
> > > this way.
> > 
> > Right.  This requires careful inspection and testing.  As far as the
> > locking is concerned, I am not keen on using fine grained locking for
> > hot-plug.  It is much simpler and solid if we serialize such operations.
> > 
> > > Besides, there may be an alternative approach to all this.  For example,
> > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > associated with them don't go away in that case after all, do they?
> > 
> > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > to be able to fail a request when memory range cannot be off-lined.
> > Otherwise, we end up ejecting online memory range.
> 
> Yes, this is the major one.  The minor issue, however, is a race condition
> between unbinding a driver from a device and removing the device if I
> understand it correctly.  Which will go away automatically if the device is
> not removed in the first place.  Or so I would think. :-)

I see.  I do not think whether or not the device is removed on eject
makes any difference here.  The issue is that after driver_unbind() is
done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
driver (hence, it cannot fail in prepare_remove), and goes ahead and
calls _EJ0.  If driver_unbind() did off-line the memory, this would be
OK.  However, it cannot off-line kernel memory ranges.  So, we basically
need to either 1) serialize acpi_bus_hot_remove_device() and
driver_unbind(), or 2) make acpi_bus_hot_remove_device() fail if
driver_unbind() is run during the operation.
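
As a rough illustration of how 1) and 2) could fit together, here is a
small userspace model in C.  Everything in it is an assumption made for
the sketch: struct model_dev and the model_*() functions are
hypothetical, a single pthread mutex stands in for whatever lock would
serialize the two paths, and -ENODEV is used for the "no driver bound"
case as suggested earlier in this thread.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for an ACPI memory device; not a kernel type. */
struct model_dev {
	pthread_mutex_t lock;	/* serializes unbind and hot-remove */
	bool bound;		/* is a driver attached? */
	bool memory_online;
	bool ejected;
};

void model_dev_init(struct model_dev *d)
{
	pthread_mutex_init(&d->lock, NULL);
	d->bound = true;
	d->memory_online = true;
	d->ejected = false;
}

/* Unbind only: the driver goes away, the memory stays online. */
int model_unbind(struct model_dev *d)
{
	pthread_mutex_lock(&d->lock);
	d->bound = false;
	pthread_mutex_unlock(&d->lock);
	return 0;
}

/*
 * Hot-remove: fails if no driver is bound (nobody could run
 * prepare_remove) or if the memory cannot be off-lined.  The eject
 * step is reached only when both checks pass, all under one lock.
 */
int model_hot_remove(struct model_dev *d, bool can_offline)
{
	int ret = 0;

	pthread_mutex_lock(&d->lock);
	if (!d->bound) {
		ret = -ENODEV;
	} else if (!can_offline) {
		ret = -EBUSY;
	} else {
		d->memory_online = false;
		d->ejected = true;
	}
	pthread_mutex_unlock(&d->lock);
	return ret;
}
```

Because both paths take the same lock, an unbind can land before or
after the hot-remove but never in the middle of it, and a hot-remove
that finds the driver gone returns an error instead of ejecting online
memory.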

Thanks,
-Toshi



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 21:40                       ` Rafael J. Wysocki
@ 2012-11-28 21:40                         ` Toshi Kani
  2012-11-28 22:01                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-28 21:40 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > Consider the following case:
> > > > > > > > > 
> > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > 
> > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > acpi_memory_device_notify()
> > > > > > > > >                                        unbind it from the driver
> > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > 
> > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > unbind only?
> > > > > > > 
> > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > 
> > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > for other acpi devices?
> > > > > > > 
> > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > 
> > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > device_release_driver(), I am wondering if we can call
> > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > 
> > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > this way.
> > > > 
> > > > Right.  This requires careful inspection and testing.  As far as the
> > > > locking is concerned, I am not keen on using fine grained locking for
> > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > 
> > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > associated with them don't go away in that case after all, do they?
> > > > 
> > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > to be able to fail a request when memory range cannot be off-lined.
> > > > Otherwise, we end up ejecting online memory range.
> > > 
> > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > between unbinding a driver from a device and removing the device if I
> > > understand it correctly.  Which will go away automatically if the device is
> > > not removed in the first place.  Or so I would think. :-)
> > 
> > I see.  I do not think whether or not the device is removed on eject
> > makes any difference here.  The issue is that after driver_unbind() is
> > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > it cannot off-line kernel memory ranges.  So, we basically need to
> > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > during the operation.
> 
> OK, I see the problem now.
> 
> What exactly is triggering the driver_unbind() in this scenario?

A user can request driver_unbind() from sysfs as follows.  I do not see
much reason why a user would need to do this for memory, though.

echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind


Thanks,
-Toshi





* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 21:02                     ` Toshi Kani
@ 2012-11-28 21:40                       ` Rafael J. Wysocki
  2012-11-28 21:40                         ` Toshi Kani
  2012-11-28 23:49                       ` Rafael J. Wysocki
  1 sibling, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 21:40 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > Consider the following case:
> > > > > > > > 
> > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > 
> > > > > > > > CPUa                                                  CPUb
> > > > > > > > acpi_memory_device_notify()
> > > > > > > >                                        unbind it from the driver
> > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > 
> > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > unbind only?
> > > > > > 
> > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > if it's not present (dev->driver == NULL).
> > > > > > 
> > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > for other acpi devices?
> > > > > > 
> > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > 
> > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > device_release_driver(), I am wondering if we can call
> > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > 
> > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > this way.
> > > 
> > > Right.  This requires careful inspection and testing.  As far as the
> > > locking is concerned, I am not keen on using fine grained locking for
> > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > 
> > > > Besides, there may be an alternative approach to all this.  For example,
> > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > associated with them don't go away in that case after all, do they?
> > > 
> > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > to be able to fail a request when memory range cannot be off-lined.
> > > Otherwise, we end up ejecting online memory range.
> > 
> > Yes, this is the major one.  The minor issue, however, is a race condition
> > between unbinding a driver from a device and removing the device if I
> > understand it correctly.  Which will go away automatically if the device is
> > not removed in the first place.  Or so I would think. :-)
> 
> I see.  I do not think whether or not the device is removed on eject
> makes any difference here.  The issue is that after driver_unbind() is
> done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> it cannot off-line kernel memory ranges.  So, we basically need to
> either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> during the operation.

OK, I see the problem now.

What exactly is triggering the driver_unbind() in this scenario?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 21:40                         ` Toshi Kani
@ 2012-11-28 22:01                           ` Rafael J. Wysocki
  2012-11-28 22:04                             ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 22:01 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wednesday, November 28, 2012 02:40:09 PM Toshi Kani wrote:
> On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > Consider the following case:
> > > > > > > > > > 
> > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > 
> > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > 
> > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > > unbind only?
> > > > > > > > 
> > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > 
> > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > for other acpi devices?
> > > > > > > > 
> > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > 
> > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > 
> > > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > > this way.
> > > > > 
> > > > > Right.  This requires careful inspection and testing.  As far as the
> > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > > 
> > > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > > associated with them don't go away in that case after all, do they?
> > > > > 
> > > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > Otherwise, we end up ejecting online memory range.
> > > > 
> > > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > > between unbinding a driver from a device and removing the device if I
> > > > understand it correctly.  Which will go away automatically if the device is
> > > > not removed in the first place.  Or so I would think. :-)
> > > 
> > > I see.  I do not think whether or not the device is removed on eject
> > > makes any difference here.  The issue is that after driver_unbind() is
> > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > during the operation.
> > 
> > OK, I see the problem now.
> > 
> > What exactly is triggering the driver_unbind() in this scenario?
> 
> User can request driver_unbind() from sysfs as follows.  I do not see
> much reason why user has to do for memory, though.
> 
> echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind

This is wrong.  Even if we want to permit user space to forcibly unbind
drivers from anything like this, we should at least check for some
situations in which it is plain dangerous.  Like in this case.  So I think
the above should fail unless we know that the driver won't be necessary
to handle hot-removal of memory.

Alternatively, this may actually try to carry out the hot-removal and only
call driver_unbind() if that succeeds.  Whichever is preferable, I'd say.
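
The two alternatives above could look roughly like this in a userspace
model.  This is a sketch under assumptions, not a proposal for the real
driver core: struct memdev_model and both functions are hypothetical
names, and "the driver won't be necessary" is modelled simply as "the
memory is already off-line".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a memory device; not a kernel structure. */
struct memdev_model {
	bool bound;		/* driver attached */
	bool online;		/* memory still in use by the system */
	bool has_kernel_pages;	/* off-lining would fail */
};

/* Variant A: refuse to unbind while the driver may still be needed. */
int unbind_checked(struct memdev_model *d)
{
	if (d->online)
		return -EBUSY;
	d->bound = false;
	return 0;
}

/* Variant B: try the off-line first, unbind only if that succeeds. */
int unbind_after_offline(struct memdev_model *d)
{
	if (d->online) {
		if (d->has_kernel_pages)
			return -EBUSY;
		d->online = false;
	}
	d->bound = false;
	return 0;
}
```

In both variants a failed request leaves the device bound, so a later
eject would still go through the driver.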

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 22:01                           ` Rafael J. Wysocki
@ 2012-11-28 22:04                             ` Toshi Kani
  2012-11-28 22:21                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-28 22:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wed, 2012-11-28 at 23:01 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:40:09 PM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > Consider the following case:
> > > > > > > > > > > 
> > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > 
> > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > 
> > > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > > > unbind only?
> > > > > > > > > 
> > > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > > 
> > > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > > for other acpi devices?
> > > > > > > > > 
> > > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > > 
> > > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > > 
> > > > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > > > this way.
> > > > > > 
> > > > > > Right.  This requires careful inspection and testing.  As far as the
> > > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > > > 
> > > > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > > > associated with them don't go away in that case after all, do they?
> > > > > > 
> > > > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > > Otherwise, we end up ejecting online memory range.
> > > > > 
> > > > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > > > between unbinding a driver from a device and removing the device if I
> > > > > understand it correctly.  Which will go away automatically if the device is
> > > > > not removed in the first place.  Or so I would think. :-)
> > > > 
> > > > I see.  I do not think whether or not the device is removed on eject
> > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > during the operation.
> > > 
> > > OK, I see the problem now.
> > > 
> > > What exactly is triggering the driver_unbind() in this scenario?
> > 
> > User can request driver_unbind() from sysfs as follows.  I do not see
> > much reason why user has to do for memory, though.
> > 
> > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> 
> This is wrong.  Even if we want to permit user space to forcibly unbind
> drivers from anything like this, we should at least check for some
> situations in which it is plain dangerous.  Like in this case.  So I think
> the above should fail unless we know that the driver won't be necessary
> to handle hot-removal of memory.

Well, we tried twice already... :)
https://lkml.org/lkml/2012/11/16/649

> Alternatively, this may actually try to carry out the hot-removal and only
> call driver_unbind() if that succeeds.  Whichever is preferable, I'd say.

Greg clarified in the above link that this interface is "unbind", not
remove.


Thanks,
-Toshi



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 22:21                               ` Rafael J. Wysocki
@ 2012-11-28 22:16                                 ` Toshi Kani
  2012-11-28 22:39                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-28 22:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

> > > > > > I see.  I do not think whether or not the device is removed on eject
> > > > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > during the operation.
> > > > > 
> > > > > OK, I see the problem now.
> > > > > 
> > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > 
> > > > User can request driver_unbind() from sysfs as follows.  I do not see
> > > > much reason why user has to do for memory, though.
> > > > 
> > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > 
> > > This is wrong.  Even if we want to permit user space to forcibly unbind
> > > drivers from anything like this, we should at least check for some
> > > situations in which it is plain dangerous.  Like in this case.  So I think
> > > the above should fail unless we know that the driver won't be necessary
> > > to handle hot-removal of memory.
> > 
> > Well, we tried twice already... :)
> > https://lkml.org/lkml/2012/11/16/649
> 
> I didn't mean driver_unbind() should fail.  The code path that executes
> driver_unbind() eventually should fail _before_ executing it.

driver_unbind() is the handler, so it is called directly from this
unbind interface.

Thanks,
-Toshi




* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 22:04                             ` Toshi Kani
@ 2012-11-28 22:21                               ` Rafael J. Wysocki
  2012-11-28 22:16                                 ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 22:21 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wednesday, November 28, 2012 03:04:52 PM Toshi Kani wrote:
> On Wed, 2012-11-28 at 23:01 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:40:09 PM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 22:40 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > 
> > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > 
> > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > 
> > > > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > > > > unbind only?
> > > > > > > > > > 
> > > > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > > > 
> > > > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > > > for other acpi devices?
> > > > > > > > > > 
> > > > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > > > 
> > > > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > > > 
> > > > > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > > > > this way.
> > > > > > > 
> > > > > > > Right.  This requires careful inspection and testing.  As far as the
> > > > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > > > > 
> > > > > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > > > > associated with them don't go away in that case after all, do they?
> > > > > > > 
> > > > > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > > > Otherwise, we end up ejecting online memory range.
> > > > > > 
> > > > > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > > > > between unbinding a driver from a device and removing the device if I
> > > > > > understand it correctly.  Which will go away automatically if the device is
> > > > > > not removed in the first place.  Or so I would think. :-)
> > > > > 
> > > > > I see.  I do not think whether or not the device is removed on eject
> > > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > during the operation.
> > > > 
> > > > OK, I see the problem now.
> > > > 
> > > > What exactly is triggering the driver_unbind() in this scenario?
> > > 
> > > User can request driver_unbind() from sysfs as follows.  I do not see
> > > much reason why user has to do for memory, though.
> > > 
> > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > 
> > This is wrong.  Even if we want to permit user space to forcibly unbind
> > drivers from anything like this, we should at least check for some
> > situations in which it is plain dangerous.  Like in this case.  So I think
> > the above should fail unless we know that the driver won't be necessary
> > to handle hot-removal of memory.
> 
> Well, we tried twice already... :)
> https://lkml.org/lkml/2012/11/16/649

I didn't mean that driver_unbind() itself should fail.  The code path that
eventually executes driver_unbind() should fail _before_ executing it.

> > Alternatively, this may actually try to carry out the hot-removal and only
> > call driver_unbind() if that succeeds.  Whichever is preferable, I'd say.
> 
> Greg clarified in the above link that this interface is "unbind", not
> remove.

OK, so that's clear.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 22:16                                 ` Toshi Kani
@ 2012-11-28 22:39                                   ` Rafael J. Wysocki
  2012-11-28 22:46                                     ` Greg KH
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 22:39 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wednesday, November 28, 2012 03:16:22 PM Toshi Kani wrote:
> > > > > > > I see.  I do not think whether or not the device is removed on eject
> > > > > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > > during the operation.
> > > > > > 
> > > > > > OK, I see the problem now.
> > > > > > 
> > > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > > 
> > > > > User can request driver_unbind() from sysfs as follows.  I do not see
> > > > > much reason why user has to do for memory, though.
> > > > > 
> > > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > 
> > > > This is wrong.  Even if we want to permit user space to forcibly unbind
> > > > drivers from anything like this, we should at least check for some
> > > > situations in which it is plain dangerous.  Like in this case.  So I think
> > > > the above should fail unless we know that the driver won't be necessary
> > > > to handle hot-removal of memory.
> > > 
> > > Well, we tried twice already... :)
> > > https://lkml.org/lkml/2012/11/16/649
> > 
> > I didn't mean driver_unbind() should fail.  The code path that executes
> > driver_unbind() eventually should fail _before_ executing it.
> 
> driver_unbind() is the handler, so it is called directly from this
> unbind interface.

Yes, sorry for the confusion.

So, it looks like the driver core wants us to handle driver unbinding no
matter what.

This pretty much means that it is a bad idea to have a driver that is
exposed as a "device driver" in sysfs for memory hotplugging.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 22:39                                   ` Rafael J. Wysocki
@ 2012-11-28 22:46                                     ` Greg KH
  2012-11-28 23:05                                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Greg KH @ 2012-11-28 22:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, linux-acpi, Vasilis Liaskovitis, Wen Congyang,
	Wen Congyang, isimatu.yasuaki, lenb, linux-kernel, linux-mm

On Wed, Nov 28, 2012 at 11:39:22PM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 03:16:22 PM Toshi Kani wrote:
> > > > > > > > I see.  I do not think whether or not the device is removed on eject
> > > > > > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > > > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > > > during the operation.
> > > > > > > 
> > > > > > > OK, I see the problem now.
> > > > > > > 
> > > > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > > > 
> > > > > > User can request driver_unbind() from sysfs as follows.  I do not see
> > > > > > much reason why user has to do for memory, though.
> > > > > > 
> > > > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > 
> > > > > This is wrong.  Even if we want to permit user space to forcibly unbind
> > > > > drivers from anything like this, we should at least check for some
> > > > > situations in which it is plain dangerous.  Like in this case.  So I think
> > > > > the above should fail unless we know that the driver won't be necessary
> > > > > to handle hot-removal of memory.
> > > > 
> > > > Well, we tried twice already... :)
> > > > https://lkml.org/lkml/2012/11/16/649
> > > 
> > > I didn't mean driver_unbind() should fail.  The code path that executes
> > > driver_unbind() eventually should fail _before_ executing it.
> > 
> > driver_unbind() is the handler, so it is called directly from this
> > unbind interface.
> 
> Yes, sorry for the confusion.
> 
> So, it looks like the driver core wants us to handle driver unbinding no
> matter what.

Yes.  Well, the driver core does the unbinding no matter what, if it was
told, by a user, to do so.  Why is that a problem?  The user then is
responsible for any bad things (i.e. not able to control the device any
more), if they do so.

> This pretty much means that it is a bad idea to have a driver that is
> exposed as a "device driver" in sysfs for memory hotplugging.

Again, why?  All this means is that the driver is now not connected to
the device (memory in this case.)  The memory is still there, still
operates as before, only difference is, the driver can't touch it
anymore.

This is the same for any ACPI driver, and has been for years.

Please don't confuse unbind with any "normal" system operation, it is
not to be used for memory hotplug, or anything else like this.

Also, if you really do not want to do this, turn off the ability to
unbind/bind for these devices, that is under your control in your bus
logic.

thanks,

greg k-h


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 22:46                                     ` Greg KH
@ 2012-11-28 23:05                                       ` Rafael J. Wysocki
  2012-11-28 23:10                                         ` Greg KH
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 23:05 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, linux-acpi, Vasilis Liaskovitis, Wen Congyang,
	Wen Congyang, isimatu.yasuaki, lenb, linux-kernel, linux-mm

On Wednesday, November 28, 2012 02:46:33 PM Greg KH wrote:
> On Wed, Nov 28, 2012 at 11:39:22PM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 03:16:22 PM Toshi Kani wrote:
> > > > > > > > > I see.  I do not think whether or not the device is removed on eject
> > > > > > > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > > > > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > > > > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > > > > > > _EJ0.  If driver_unbind() did off-line the memory, this is OK.  However,
> > > > > > > > > it cannot off-line kernel memory ranges.  So, we basically need to
> > > > > > > > > either 1) serialize acpi_bus_hot_remove_device() and driver_unbind(), or
> > > > > > > > > 2) make acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > > > > > > during the operation.
> > > > > > > > 
> > > > > > > > OK, I see the problem now.
> > > > > > > > 
> > > > > > > > What exactly is triggering the driver_unbind() in this scenario?
> > > > > > > 
> > > > > > > User can request driver_unbind() from sysfs as follows.  I do not see
> > > > > > > much reason why user has to do for memory, though.
> > > > > > > 
> > > > > > > echo "PNP0C80:XX" > /sys/bus/acpi/drivers/acpi_memhotplug/unbind
> > > > > > 
> > > > > > This is wrong.  Even if we want to permit user space to forcibly unbind
> > > > > > drivers from anything like this, we should at least check for some
> > > > > > situations in which it is plain dangerous.  Like in this case.  So I think
> > > > > > the above should fail unless we know that the driver won't be necessary
> > > > > > to handle hot-removal of memory.
> > > > > 
> > > > > Well, we tried twice already... :)
> > > > > https://lkml.org/lkml/2012/11/16/649
> > > > 
> > > > I didn't mean driver_unbind() should fail.  The code path that executes
> > > > driver_unbind() eventually should fail _before_ executing it.
> > > 
> > > driver_unbind() is the handler, so it is called directly from this
> > > unbind interface.
> > 
> > Yes, sorry for the confusion.
> > 
> > So, it looks like the driver core wants us to handle driver unbinding no
> > matter what.
> 
> Yes.  Well, the driver core does the unbinding no matter what, if it was
> told, by a user, to do so.  Why is that a problem?  The user then is
> responsible for any bad things (i.e. not able to control the device any
> more), if they do so.

I don't really agree with that, because the user may simply not know what
the consequences of that will be.  In my not so humble opinion any interface
allowing user space to crash the kernel is a bad one.  And this is an example
of that.

> > This pretty much means that it is a bad idea to have a driver that is
> > exposed as a "device driver" in sysfs for memory hotplugging.
> 
> Again, why?  All this means is that the driver is now not connected to
> the device (memory in this case.)  The memory is still there, still
> operates as before, only difference is, the driver can't touch it
> anymore.
> 
> This is the same for any ACPI driver, and has been for years.

Except that if this driver has been unbound and the removal is triggered by
an SCI, the core will just go on and remove the memory, although it may
be killing the kernel this way.

Arguably, this may be considered the core's fault, but the only way to
fix that would be to move the code from that driver into the core and no
longer register it as a "driver".  Which was my point. :-)

> Please don't confuse unbind with any "normal" system operation, it is
> not to be used for memory hotplug, or anything else like this.
> 
> Also, if you really do not want to do this, turn off the ability to
> unbind/bind for these devices, that is under your control in your bus
> logic.

OK, but how?  I'm looking at driver_unbind() and not seeing any way to do
that actually.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 23:05                                       ` Rafael J. Wysocki
@ 2012-11-28 23:10                                         ` Greg KH
  2012-11-28 23:31                                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Greg KH @ 2012-11-28 23:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, linux-acpi, Vasilis Liaskovitis, Wen Congyang,
	Wen Congyang, isimatu.yasuaki, lenb, linux-kernel, linux-mm

On Thu, Nov 29, 2012 at 12:05:20AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:46:33 PM Greg KH wrote:
> > > So, it looks like the driver core wants us to handle driver unbinding no
> > > matter what.
> > 
> > Yes.  Well, the driver core does the unbinding no matter what, if it was
> > told, by a user, to do so.  Why is that a problem?  The user then is
> > responsible for any bad things (i.e. not able to control the device any
> > more), if they do so.
> 
> I don't really agree with that, because the user may simply not know what
> the consequences of that will be.  In my not so humble opinion any interface
> allowing user space to crash the kernel is a bad one.  And this is an example
> of that.

This has been in place since 2005, over 7 years now, and I have never
heard any problems with it being used to crash the kernel, despite the
easy ability for people to unbind all of their devices from drivers and
instantly cause a system hang.  So I really doubt this is a problem in
real life :)

> > > This pretty much means that it is a bad idea to have a driver that is
> > > exposed as a "device driver" in sysfs for memory hotplugging.
> > 
> > Again, why?  All this means is that the driver is now not connected to
> > the device (memory in this case.)  The memory is still there, still
> > operates as before, only difference is, the driver can't touch it
> > anymore.
> > 
> > This is the same for any ACPI driver, and has been for years.
> 
> Except that if this driver has been unbound and the removal is triggered by
> an SCI, the core will just go on and remove the memory, although it may
> be killing the kernel this way.

Why would memory go away if a driver is unbound from a device?  The
device didn't go away.  It's the same if the driver was a module and it
was unloaded, you should not turn memory off in that situation, right?
Are you also going to prevent module unloading of this driver?

> Arguably, this may be considered as the core's fault, but the only way to
> fix that would be to move the code from that driver into the core and not to
> register it as a "driver" any more.  Which was my point. :-)

No, I think people are totally overreacting to the unbind/bind files,
which are there to aid in development, and in adding new device ids to
drivers, as well as sometimes doing a hacky revoke() call.

> > Please don't confuse unbind with any "normal" system operation, it is
> > not to be used for memory hotplug, or anything else like this.
> > 
> > Also, if you really do not want to do this, turn off the ability to
> > unbind/bind for these devices, that is under your control in your bus
> > logic.
> 
> OK, but how?  I'm looking at driver_unbind() and not seeing any way to do
> that actually.

See the suppress_bind_attrs field in struct device_driver.  It's even
documented in device.h, but sadly, no one reads documentation :)

I recommend you set this field if you don't want the bind/unbind files
to show up for your memory driver, although I would argue that the
driver needs to be fixed up to not do foolish things like removing
memory from a system unless it really does go away...
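
As a rough illustration of the above, here is a minimal userspace model of how
suppress_bind_attrs gates the bind/unbind files.  The struct below is a cut-down
stand-in for the kernel's struct device_driver, and bind_attr_count() and the
two driver instances are hypothetical names made up for this sketch; in the
kernel the check is made when the driver is registered with its bus.  For an
ACPI driver the flag would presumably be set through the struct device_driver
embedded in struct acpi_driver.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace stand-in for the kernel's struct device_driver.  Only the
 * two fields needed for this illustration are modelled.
 */
struct device_driver {
	const char *name;
	bool suppress_bind_attrs;	/* real field name, see device.h */
};

/*
 * Hypothetical helper: how many of the "bind"/"unbind" sysfs files
 * would be created for this driver.  The driver core skips both files
 * when suppress_bind_attrs is set.
 */
static int bind_attr_count(const struct device_driver *drv)
{
	return drv->suppress_bind_attrs ? 0 : 2;
}

/* A driver that opts out of manual bind/unbind from user space. */
static struct device_driver memhotplug_drv = {
	.name			= "acpi_memhotplug",
	.suppress_bind_attrs	= true,
};

/* A driver that keeps the default behaviour (both files present). */
static struct device_driver default_drv = {
	.name = "example",
};
```

With the flag set, there is simply no unbind file for user space to write to,
so the unbind-vs-eject question never comes up through sysfs.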

hope this helps,

greg k-h


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 23:10                                         ` Greg KH
@ 2012-11-28 23:31                                           ` Rafael J. Wysocki
  0 siblings, 0 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 23:31 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, linux-acpi, Vasilis Liaskovitis, Wen Congyang,
	Wen Congyang, isimatu.yasuaki, lenb, linux-kernel, linux-mm

On Wednesday, November 28, 2012 03:10:46 PM Greg KH wrote:
> On Thu, Nov 29, 2012 at 12:05:20AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:46:33 PM Greg KH wrote:
> > > > So, it looks like the driver core wants us to handle driver unbinding no
> > > > matter what.
> > > 
> > > Yes.  Well, the driver core does the unbinding no matter what, if it was
> > > told, by a user, to do so.  Why is that a problem?  The user then is
> > > responsible for any bad things (i.e. not able to control the device any
> > > more), if they do so.
> > 
> > I don't really agree with that, because the user may simply not know what
> > the consequences of that will be.  In my not so humble opinion any interface
> > allowing user space to crash the kernel is a bad one.  And this is an example
> > of that.
> 
> This has been in place since 2005, over 7 years now, and I have never
> heard any problems with it being used to crash the kernel, despite the
> easy ability for people to unbind all of their devices from drivers and
> instantly cause a system hang.  So really doubt this is a problem in
> real life :)
> 
> > > > This pretty much means that it is a bad idea to have a driver that is
> > > > exposed as a "device driver" in sysfs for memory hotplugging.
> > > 
> > > Again, why?  All this means is that the driver is now not connected to
> > > the device (memory in this case.)  The memory is still there, still
> > > operates as before, only difference is, the driver can't touch it
> > > anymore.
> > > 
> > > This is the same for any ACPI driver, and has been for years.
> > 
> > Except that if this driver has been unbound and the removal is triggered by
> > an SCI, the core will just go on and remove the memory, although it may
> > be killing the kernel this way.
> 
> Why would memory go away if a driver is unbound from a device?  The
> device didn't go away.  It's the same if the driver was a module and it
> was unloaded, you should not turn memory off in that situation, right?

Right.  It looks like there's some confusion about the role of .remove()
in the ACPI subsystem, but I need to investigate it a bit more.

> Are you also going to prevent module unloading of this driver?

I'm not sure what I'm going to do with that driver at the moment to be honest. :-)

> > Arguably, this may be considered as the core's fault, but the only way to
> > fix that would be to move the code from that driver into the core and not to
> > register it as a "driver" any more.  Which was my point. :-)
> 
> No, I think people are totally overreacting to the unbind/bind files,
> which are there to aid in development, and in adding new device ids to
> drivers, as well as sometimes doing a hacky revoke() call.
> 
> > > Please don't confuse unbind with any "normal" system operation, it is
> > > not to be used for memory hotplug, or anything else like this.
> > > 
> > > Also, if you really do not want to do this, turn off the ability to
> > > unbind/bind for these devices, that is under your control in your bus
> > > logic.
> > 
> > OK, but how?  I'm looking at driver_unbind() and not seeing any way to do
> > that actually.
> 
> See the suppress_bind_attrs field in struct device_driver.  It's even
> documented in device.h, but sadly, no one reads documentation :)

That's good to know, thanks. :-)

And if I had known I could find that information in device.h, I would have
looked there.

> I recommend you set this field if you don't want the bind/unbind files
> to show up for your memory driver, although I would argue that the
> driver needs to be fixed up to not do foolish things like removing
> memory from a system unless it really does go away...

Quite frankly, I need to look at that driver and how things are supposed to
work more thoroughly, because I don't seem to see a reason to do various
things the way they are done.  Well, maybe it's just me.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 21:02                     ` Toshi Kani
  2012-11-28 21:40                       ` Rafael J. Wysocki
@ 2012-11-28 23:49                       ` Rafael J. Wysocki
  2012-11-29  1:02                         ` Toshi Kani
  1 sibling, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-28 23:49 UTC (permalink / raw)
  To: linux-acpi
  Cc: Toshi Kani, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > Consider the following case:
> > > > > > > > 
> > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > 
> > > > > > > > CPUa                                                  CPUb
> > > > > > > > acpi_memory_device_notify()
> > > > > > > >                                        unbind it from the driver
> > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > 
> > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > unbind only?
> > > > > > 
> > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > if it's not present (dev->driver == NULL).
> > > > > > 
> > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > for other acpi devices?
> > > > > > 
> > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > 
> > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > device_release_driver(), I am wondering if we can call
> > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > 
> > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > this way.
> > > 
> > > Right.  This requires careful inspection and testing.  As far as the
> > > locking is concerned, I am not keen on using fine grained locking for
> > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > 
> > > > Besides, there may be an alternative approach to all this.  For example,
> > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > associated with them don't go away in that case after all, do they?
> > > 
> > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > to be able to fail a request when memory range cannot be off-lined.
> > > Otherwise, we end up ejecting online memory range.
> > 
> > Yes, this is the major one.  The minor issue, however, is a race condition
> > between unbinding a driver from a device and removing the device if I
> > understand it correctly.  Which will go away automatically if the device is
> > not removed in the first place.  Or so I would think. :-)
> 
> I see.  I do not think whether or not the device is removed on eject
> makes any difference here.  The issue is that after driver_unbind() is
> done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> _EJ0.

I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
acpi_eject_store() which is exposed through sysfs.  If we disabled exposing
acpi_eject_store() for memory devices, then the only way would be from the
notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
notify handler for memory (so that memory eject events are simply dropped on
the floor after unbinding the driver)?
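
To make the idea concrete, here is a small userspace sketch, not kernel code.
struct mem_device, model_bind(), model_unbind() and deliver_event() are all
hypothetical names; the only thing taken from ACPI is the value 0x03 used for
the eject request notification.  The model assumes that unbinding uninstalls
the notify handler, so a later eject event has nowhere to go.

```c
#include <stddef.h>

/* Same value as ACPI_NOTIFY_EJECT_REQUEST. */
#define NOTIFY_EJECT_REQUEST	0x03

/* Hypothetical model of a memory device with an optional notify handler. */
struct mem_device {
	int ejected;
	void (*notify)(struct mem_device *dev, unsigned int event);
};

/* Stands in for the memhotplug notify handler starting a hot-remove. */
static void memory_notify(struct mem_device *dev, unsigned int event)
{
	if (event == NOTIFY_EJECT_REQUEST)
		dev->ejected = 1;
}

/* Binding the driver installs the handler ... */
static void model_bind(struct mem_device *dev)
{
	dev->notify = memory_notify;
}

/* ... and unbinding uninstalls it again. */
static void model_unbind(struct mem_device *dev)
{
	dev->notify = NULL;
}

/* Returns 1 if the event reached a handler, 0 if it was dropped. */
static int deliver_event(struct mem_device *dev, unsigned int event)
{
	if (!dev->notify)
		return 0;
	dev->notify(dev, event);
	return 1;
}
```

Note that this only covers the case where the unbind completes before the
event arrives; it says nothing about an unbind that comes in while the handler
is already running.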

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-28 23:49                       ` Rafael J. Wysocki
@ 2012-11-29  1:02                         ` Toshi Kani
  2012-11-29  1:15                           ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-29  1:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > Consider the following case:
> > > > > > > > > 
> > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > 
> > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > acpi_memory_device_notify()
> > > > > > > > >                                        unbind it from the driver
> > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > 
> > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > unbind only?
> > > > > > > 
> > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > 
> > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > for other acpi devices?
> > > > > > > 
> > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > 
> > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > device_release_driver(), I am wondering if we can call
> > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > 
> > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > this way.
> > > > 
> > > > Right.  This requires careful inspection and testing.  As far as the
> > > > locking is concerned, I am not keen on using fine grained locking for
> > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > 
> > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > associated with them don't go away in that case after all, do they?
> > > > 
> > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > to be able to fail a request when memory range cannot be off-lined.
> > > > Otherwise, we end up ejecting online memory range.
> > > 
> > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > between unbinding a driver from a device and removing the device if I
> > > understand it correctly.  Which will go away automatically if the device is
> > > not removed in the first place.  Or so I would think. :-)
> > 
> > I see.  I do not think whether or not the device is removed on eject
> > makes any difference here.  The issue is that after driver_unbind() is
> > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > _EJ0.
> 
> I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> acpi_eject_store() which is exposed through sysfs.  

Yes, that is correct.

> If we disabled exposing
> acpi_eject_store() for memory devices, then the only way would be from the
> notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> notify handler for memory (so that memory eject events are simply dropped on
> the floor after unbinding the driver)?

If driver_unbind() happens before an eject request, we do not have a
problem.  acpi_eject_store() fails if a driver is not bound to the
device.  acpi_memory_device_notify() fails as well.

The race condition Wen pointed out (see the top of this email) is that
driver_unbind() may come in while an eject operation is in progress.  This
is why I mentioned the following in my previous email.

> So, we basically need to either 1) serialize
> acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> during the operation.
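
Combining the two options, a userspace model could look like the sketch below.
It is only an illustration under simplifying assumptions: struct model_device,
model_unbind() and model_eject() are hypothetical names, a pthread mutex stands
in for whatever lock the kernel would actually use, and "memory_online" stands
in for an off-line attempt that fails.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of a memory device; not the kernel's acpi_device. */
struct model_device {
	pthread_mutex_t lock;	/* one lock serializing eject and unbind */
	const char *driver;	/* NULL once the driver has been unbound */
	int memory_online;	/* non-zero: the range cannot be off-lined */
	int ejected;
};

/* Unbind only detaches the driver; it never touches the memory. */
static void model_unbind(struct model_device *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->driver = NULL;
	pthread_mutex_unlock(&dev->lock);
}

/*
 * The eject path takes the same lock, so it sees the device either fully
 * bound or fully unbound, and it refuses to continue in the unbound case.
 */
static int model_eject(struct model_device *dev)
{
	int ret = 0;

	pthread_mutex_lock(&dev->lock);
	if (!dev->driver)
		ret = -ENODEV;		/* nobody left to off-line the memory */
	else if (dev->memory_online)
		ret = -EBUSY;		/* the prepare step failed */
	else
		dev->ejected = 1;	/* only now is it safe to eject */
	pthread_mutex_unlock(&dev->lock);

	return ret;
}
```

Because the driver check and the eject decision happen under the same lock, an
unbind can no longer slip in between them in this model.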


Thanks,
-Toshi



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29  1:02                         ` Toshi Kani
@ 2012-11-29  1:15                           ` Toshi Kani
  2012-11-29 10:03                             ` Rafael J. Wysocki
  2012-11-29 11:04                             ` Vasilis Liaskovitis
  0 siblings, 2 replies; 92+ messages in thread
From: Toshi Kani @ 2012-11-29  1:15 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > Consider the following case:
> > > > > > > > > > 
> > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > 
> > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > 
> > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > > unbind only?
> > > > > > > > 
> > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > 
> > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > for other acpi devices?
> > > > > > > > 
> > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > 
> > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > 
> > > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > > this way.
> > > > > 
> > > > > Right.  This requires careful inspection and testing.  As far as the
> > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > > 
> > > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > > associated with them don't go away in that case after all, do they?
> > > > > 
> > > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > Otherwise, we end up ejecting online memory range.
> > > > 
> > > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > > between unbinding a driver from a device and removing the device if I
> > > > understand it correctly.  Which will go away automatically if the device is
> > > > not removed in the first place.  Or so I would think. :-)
> > > 
> > > I see.  I do not think whether or not the device is removed on eject
> > > makes any difference here.  The issue is that after driver_unbind() is
> > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > _EJ0.
> > 
> > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > acpi_eject_store() which is exposed through sysfs.  
> 
> Yes, that is correct.
> 
> > If we disabled exposing
> > acpi_eject_store() for memory devices, then the only way would be from the
> > notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> > notify handler for memory (so that memory eject events are simply dropped on
> > the floor after unbinding the driver)?
> 
> If driver_unbind() happens before an eject request, we do not have a
> problem.  acpi_eject_store() fails if a driver is not bound to the
> device.  acpi_memory_device_notify() fails as well.
> 
> The race condition Wen pointed out (see the top of this email) is that
> driver_unbind() may come in while eject operation is in-progress.  This
> is why I mentioned the following in previous email.
> 
> > So, we basically need to either 1) serialize
> > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > during the operation.

I forgot to mention: the third option is what Greg suggested, namely
using the suppress_bind_attrs field.  I think this is a good option to
address this race condition for now.  For a long-term solution, we should
have better infrastructure in place to address such issues in general.
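
As a rough illustration of why this closes the window, here is a userspace
model (not kernel code).  Only the field name suppress_bind_attrs is real;
struct model_driver and model_sysfs_unbind() are hypothetical stand-ins, and
-ENOENT merely represents "the unbind file does not exist".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver core's struct device_driver. */
struct model_driver {
	const char *name;
	bool suppress_bind_attrs;	/* same name as the real field */
};

/*
 * Model of userspace writing to the driver's sysfs "unbind" file.
 * With suppress_bind_attrs set, the file is never created, so the
 * write cannot even start; -ENOENT stands in for that here.
 */
static int model_sysfs_unbind(const struct model_driver *drv)
{
	assert(drv != NULL);
	if (drv->suppress_bind_attrs)
		return -ENOENT;
	return 0;	/* unbind would proceed and may race with an eject */
}
```

With the flag set for the memory hotplug driver, the sysfs-initiated unbind
path is simply not available, so it cannot interleave with an eject.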

Thanks,
-Toshi



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-28 18:41   ` Toshi Kani
@ 2012-11-29  4:48     ` Hanjun Guo
  2012-11-29 22:27       ` Toshi Kani
  2012-11-29 10:15     ` Rafael J. Wysocki
  2012-12-06 16:00     ` Jiang Liu
  2 siblings, 1 reply; 92+ messages in thread
From: Hanjun Guo @ 2012-11-29  4:48 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki, wency, rjw,
	lenb, gregkh, linux-kernel, linux-mm, Tang Chen, Liujiang,
	Huxinwei

On 2012/11/29 2:41, Toshi Kani wrote:
> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>> the driver core remove function needs to always succeed. This means we need
>>> to know that the device can be successfully removed before acpi_bus_trim / 
>>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
>>> or SCI-initiated eject of memory devices fail e.g with:
>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>
>>> since the ACPI core goes ahead and ejects the device regardless of whether the
>>> the memory is still in use or not.
>>>
>>> For this reason a new acpi_device operation called prepare_remove is introduced.
>>> This operation should be registered for acpi devices whose removal (from kernel
>>> perspective) can fail.  Memory devices fall in this category.
>>>
>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>>   succeed for device and all its children.
>>> - if above step was successful, proceed to actual device removal
>>
>> Hi Vasilis,
>> We met the same problem when we doing computer node hotplug, It is a good idea
>> to introduce prepare_remove before actual device removal.
>>
>> I think we could do more in prepare_remove, such as rollback. In most cases, we can
>> offline most of memory sections except kernel used pages now, should we rollback
>> and online the memory sections when prepare_remove failed ?
> 
> I think hot-plug operation should have all-or-nothing semantics.  That
> is, an operation should either complete successfully, or rollback to the
> original state.

Yes, we share your point of view. We handle this in the ACPI-based
hot-plug framework as follows:
1) hot add / hot remove completes successfully if no error happens;
2) automatic rollback to the original state if an error occurs;
3) rollback to the original state if the hot-plug operation is cancelled by the user.

> 
>> As you may know, the ACPI based hotplug framework we are working on already addressed
>> this problem, and the way we solve this problem is a bit like yours.
>>
>> We introduce hp_ops in struct acpi_device_ops:
>> struct acpi_device_ops {
>> 	acpi_op_add add;
>> 	acpi_op_remove remove;
>> 	acpi_op_start start;
>> 	acpi_op_bind bind;
>> 	acpi_op_unbind unbind;
>> 	acpi_op_notify notify;
>> #ifdef	CONFIG_ACPI_HOTPLUG
>> 	struct acpihp_dev_ops *hp_ops;
>> #endif	/* CONFIG_ACPI_HOTPLUG */
>> };
>>
>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>> 1) pre_release(): optional step to mark device going to be removed/busy
>> 2) release(): reclaim device from running system
>> 3) post_release(): rollback if cancelled by user or error happened
>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>> 5) unconfigure(): remove devices from running system
>> 6) post_unconfigure(): free resources used by devices
>>
>> In this way, we can easily rollback if error happens.
>> How do you think of this solution, any suggestion ? I think we can achieve
>> a better way for sharing ideas. :)
> 
> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> have not looked at all your changes yet..), but in my mind, a hot-plug
> operation should be composed with the following 3 phases.

Good idea! We also implement a hot-plug operation in 3 phases:
1) acpihp_drv_pre_execute
2) acpihp_drv_execute
3) acpihp_drv_post_execute
You may refer to:
https://lkml.org/lkml/2012/11/4/79

> 
> 1. Validate phase - Verify if the request is a supported operation.  All
> known restrictions are verified at this phase.  For instance, if a
> hot-remove request involves kernel memory, it is failed in this phase.
> Since this phase makes no change, no rollback is necessary to fail. 

Yes, we do this in acpihp_drv_pre_execute, which checks the following:

1) Hot-pluggable or not. The kernel memory case you mentioned is also checked
   when a memory device is removed.

2) Dependency check. For instance, when hot-adding a memory device, the
   processor should be added first; otherwise the operation is not valid.

3) Race condition check. If the device or its dependent device is already in
   a hot-plug process, another request will be denied.

No rollback is needed for the above checks.
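
A minimal sketch of such read-only checks, as a userspace model.  All names
here (struct hp_dev, hp_validate_add(), hp_validate_remove()) and the chosen
error codes are hypothetical and are not taken from the acpihp patches; the
point is only that the functions take const pointers and change nothing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; none of these names come from the patches. */
struct hp_dev {
	bool hotpluggable;	/* device supports hot-plug at all */
	bool kernel_memory;	/* memory device holding kernel pages */
	bool present;		/* device is currently added */
	bool busy;		/* another hot-plug request is in flight */
	const struct hp_dev *depends_on;	/* e.g. processor a memory device needs */
};

/* Checks shared by add and remove: capability (1) and race check (3). */
static int hp_validate_common(const struct hp_dev *dev)
{
	assert(dev != NULL);
	if (!dev->hotpluggable)
		return -EPERM;
	if (dev->busy || (dev->depends_on && dev->depends_on->busy))
		return -EBUSY;
	return 0;
}

/* Hot-add: the dependency (2) must already be present. */
static int hp_validate_add(const struct hp_dev *dev)
{
	int ret = hp_validate_common(dev);

	if (ret)
		return ret;
	if (dev->depends_on && !dev->depends_on->present)
		return -ENODEV;
	return 0;
}

/* Hot-remove: refuse devices that hold kernel memory. */
static int hp_validate_remove(const struct hp_dev *dev)
{
	int ret = hp_validate_common(dev);

	if (ret)
		return ret;
	if (dev->kernel_memory)
		return -EPERM;
	return 0;
}
```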

> 
> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> rolled-back in case of error or cancel.

In this phase, we introduce a state machine for the hot-pluggable device;
please refer to:
https://lkml.org/lkml/2012/11/4/79

I think we have the same idea for the overall framework, but the ACPI-based
hot-plug framework implements it differently in the details, right?

Thanks
Hanjun

> 
> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> instance, eject operation is performed at this phase.  
> 
> 
> Thanks,
> -Toshi
> 





* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29  1:15                           ` Toshi Kani
@ 2012-11-29 10:03                             ` Rafael J. Wysocki
  2012-11-29 11:30                               ` Vasilis Liaskovitis
  2012-11-29 16:43                               ` Toshi Kani
  2012-11-29 11:04                             ` Vasilis Liaskovitis
  1 sibling, 2 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 10:03 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > Consider the following case:
> > > > > > > > > > > 
> > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > 
> > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > 
> > > > > > > > > > Can we make acpi_bus_remove() to fail if a given acpi_device is not
> > > > > > > > > > bound with a driver?  If so, can we make the unbind operation to perform
> > > > > > > > > > unbind only?
> > > > > > > > > 
> > > > > > > > > acpi_bus_remove_device could check if the driver is present, and return -ENODEV
> > > > > > > > > if it's not present (dev->driver == NULL).
> > > > > > > > > 
> > > > > > > > > But there can still be a race between an eject and an unbind operation happening
> > > > > > > > > simultaneously. This seems like a general problem to me i.e. not specific to an
> > > > > > > > > acpi memory device. How do we ensure an eject does not race with a driver unbind
> > > > > > > > > for other acpi devices?
> > > > > > > > > 
> > > > > > > > > Is there a per-device lock in acpi-core or device-core that can prevent this from
> > > > > > > > > happening? Driver core does a device_lock(dev) on all operations, but this is
> > > > > > > > > probably not grabbed on SCI-initiated acpi ejects.
> > > > > > > > 
> > > > > > > > Since driver_unbind() calls device_lock(dev->parent) before calling
> > > > > > > > device_release_driver(), I am wondering if we can call
> > > > > > > > device_lock(dev->dev->parent) at the beginning of acpi_bus_remove()
> > > > > > > > (i.e. before calling pre_remove) and fails if dev->driver is NULL.  The
> > > > > > > > parent lock is otherwise released after device_release_driver() is done.
> > > > > > > 
> > > > > > > I would be careful.  You may introduce some subtle locking-related issues
> > > > > > > this way.
> > > > > > 
> > > > > > Right.  This requires careful inspection and testing.  As far as the
> > > > > > locking is concerned, I am not keen on using fine grained locking for
> > > > > > hot-plug.  It is much simpler and solid if we serialize such operations.
> > > > > > 
> > > > > > > Besides, there may be an alternative approach to all this.  For example,
> > > > > > > what if we don't remove struct device objects on eject?  The ACPI handles
> > > > > > > associated with them don't go away in that case after all, do they?
> > > > > > 
> > > > > > Umm...  Sorry, I am not getting your point.  The issue is that we need
> > > > > > to be able to fail a request when memory range cannot be off-lined.
> > > > > > Otherwise, we end up ejecting online memory range.
> > > > > 
> > > > > Yes, this is the major one.  The minor issue, however, is a race condition
> > > > > between unbinding a driver from a device and removing the device if I
> > > > > understand it correctly.  Which will go away automatically if the device is
> > > > > not removed in the first place.  Or so I would think. :-)
> > > > 
> > > > I see.  I do not think whether or not the device is removed on eject
> > > > makes any difference here.  The issue is that after driver_unbind() is
> > > > done, acpi_bus_hot_remove_device() no longer calls the ACPI memory
> > > > driver (hence, it cannot fail in prepare_remove), and goes ahead to call
> > > > _EJ0.
> > > 
> > > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > > acpi_eject_store() which is exposed through sysfs.  
> > 
> > Yes, that is correct.
> > 
> > > If we disabled exposing
> > > acpi_eject_store() for memory devices, then the only way would be from the
> > > notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> > > notify handler for memory (so that memory eject events are simply dropped on
> > > the floor after unbinding the driver)?
> > 
> > If driver_unbind() happens before an eject request, we do not have a
> > problem.  acpi_eject_store() fails if a driver is not bound to the
> > device.  acpi_memory_device_notify() fails as well.
> > 
> > The race condition Wen pointed out (see the top of this email) is that
> > driver_unbind() may come in while eject operation is in-progress.  This
> > is why I mentioned the following in previous email.
> > 
> > > So, we basically need to either 1) serialize
> > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > during the operation.
> 
> Forgot to mention.  The 3rd option is what Greg said -- use the
> suppress_bind_attrs field.  I think this is a good option to address
> this race condition for now.  For a long term solution, we should have a
> better infrastructure in place to address such issue in general.

Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
friends and I think there's a way to address all of these problems
without a big redesign (for now).

First, why don't we introduce an ACPI device flag (in the flags field of
struct acpi_device) called eject_forbidden or something like this such that:

(1) It will be clear by default.
(2) It may only be set by a driver's .add() routine if necessary.
(3) Once set, it may only be cleared by the driver's .remove() routine if
    it's safe to physically remove the device after the .remove().

Then, if the flag is still set after .remove() (which must be successful) has
returned, that will tell acpi_bus_remove() to return a specific error code
(such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
earlier, because if it left the flag set, there's no way to clear it afterward
and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
should be unregistered anyway if that error code is to be returned.

[By the way, do you know where we free the memory allocated for struct
 acpi_device objects?]

Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
store it, but continue the trimming normally and finally it should return that
error code to acpi_bus_hot_remove_device().

Now, if acpi_bus_hot_remove_device() gets that error code, it should just
reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
we attempted to eject) and notify the firmware about the failure.

If we have that, then the memory hotplug driver would only need to set
flags.eject_forbidden in its .add() routine and make its .remove() routine
only clear that flag if it is safe to actually remove the memory.
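
To make the flow concrete, here is a userspace model of the idea (a sketch
only, not a patch).  eject_forbidden is the proposed flag; everything else
(struct model_dev, mem_add(), mem_remove(), the model_* functions, the use
of -EBUSY, and keeping the first error seen) is a hypothetical stand-in.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct model_dev {
	bool registered;	/* struct device still registered */
	bool has_driver;	/* a driver is bound */
	bool memory_in_use;	/* offlining would fail */
	bool eject_forbidden;	/* the proposed flag, clear by default */
};

/* Driver .add(): sets the flag. */
static void mem_add(struct model_dev *d)
{
	d->has_driver = true;
	d->eject_forbidden = true;
}

/* Driver .remove(): always succeeds, clears the flag only if it is safe. */
static void mem_remove(struct model_dev *d)
{
	if (!d->memory_in_use)
		d->eject_forbidden = false;
	d->has_driver = false;
}

/* Unregister the device anyway, then report whether eject is still forbidden. */
static int model_bus_remove(struct model_dev *d)
{
	if (d->has_driver)
		mem_remove(d);
	d->registered = false;
	return d->eject_forbidden ? -EBUSY : 0;
}

/* Remember an error, but keep trimming the remaining devices. */
static int model_bus_trim(struct model_dev *devs, size_t n)
{
	int err = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		int ret = model_bus_remove(&devs[i]);

		if (ret && !err)
			err = ret;
	}
	return err;
}

/* On error, reverse the trimming (models a rescan) instead of ejecting. */
static int model_hot_remove(struct model_dev *devs, size_t n)
{
	int err = model_bus_trim(devs, n);
	size_t i;

	if (err) {
		for (i = 0; i < n; i++) {
			devs[i].registered = true;
			mem_add(&devs[i]);
		}
		return err;	/* the caller would notify the firmware here */
	}
	return 0;		/* safe to evaluate _EJ0 */
}
```

Note that a device which was unbound earlier while its memory was in use
still has the flag set, so model_bus_remove() reports the error for it too.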

Does this make sense to you?

[BTW, using _PS3 in acpi_bus_hot_remove_device() directly to power off the
 device is nonsense, because this method is not guaranteed to turn the power
 off in the first place (it may just put the device into D3hot).  If anything,
 acpi_device_set_power() should be used for that, but even that is not
 guaranteed to actually remove the power (power resources may be shared with
 other devices), so in fact that operation should be done by acpi_bus_trim()
 for each of the trimmed devices.]

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-28 18:41   ` Toshi Kani
  2012-11-29  4:48     ` Hanjun Guo
@ 2012-11-29 10:15     ` Rafael J. Wysocki
  2012-11-29 11:36       ` Vasilis Liaskovitis
                         ` (2 more replies)
  2012-12-06 16:00     ` Jiang Liu
  2 siblings, 3 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 10:15 UTC (permalink / raw)
  To: linux-acpi
  Cc: Toshi Kani, Hanjun Guo, Vasilis Liaskovitis, isimatu.yasuaki,
	wency, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > the driver core remove function needs to always succeed. This means we need
> > > to know that the device can be successfully removed before acpi_bus_trim / 
> > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > or SCI-initiated eject of memory devices fail e.g with:
> > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > 
> > > since the ACPI core goes ahead and ejects the device regardless of whether the
> > > the memory is still in use or not.
> > > 
> > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > This operation should be registered for acpi devices whose removal (from kernel
> > > perspective) can fail.  Memory devices fall in this category.
> > > 
> > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > - preparation for removal i.e. perform part of removal that can fail. Should
> > >   succeed for device and all its children.
> > > - if above step was successful, proceed to actual device removal
> > 
> > Hi Vasilis,
> > We met the same problem when we doing computer node hotplug, It is a good idea
> > to introduce prepare_remove before actual device removal.
> > 
> > I think we could do more in prepare_remove, such as rollback. In most cases, we can
> > offline most of memory sections except kernel used pages now, should we rollback
> > and online the memory sections when prepare_remove failed ?
> 
> I think hot-plug operation should have all-or-nothing semantics.  That
> is, an operation should either complete successfully, or rollback to the
> original state.

That's correct.

> > As you may know, the ACPI based hotplug framework we are working on already addressed
> > this problem, and the way we solve this problem is a bit like yours.
> > 
> > We introduce hp_ops in struct acpi_device_ops:
> > struct acpi_device_ops {
> > 	acpi_op_add add;
> > 	acpi_op_remove remove;
> > 	acpi_op_start start;
> > 	acpi_op_bind bind;
> > 	acpi_op_unbind unbind;
> > 	acpi_op_notify notify;
> > #ifdef	CONFIG_ACPI_HOTPLUG
> > 	struct acpihp_dev_ops *hp_ops;
> > #endif	/* CONFIG_ACPI_HOTPLUG */
> > };
> > 
> > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > 1) pre_release(): optional step to mark device going to be removed/busy
> > 2) release(): reclaim device from running system
> > 3) post_release(): rollback if cancelled by user or error happened
> > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > 5) unconfigure(): remove devices from running system
> > 6) post_unconfigure(): free resources used by devices
> > 
> > In this way, we can easily rollback if error happens.
> > How do you think of this solution, any suggestion ? I think we can achieve
> > a better way for sharing ideas. :)
> 
> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> have not looked at all your changes yet..), but in my mind, a hot-plug
> operation should be composed with the following 3 phases.
> 
> 1. Validate phase - Verify if the request is a supported operation.  All
> known restrictions are verified at this phase.  For instance, if a
> hot-remove request involves kernel memory, it is failed in this phase.
> Since this phase makes no change, no rollback is necessary to fail.  

Actually, we can't do it this way, because the conditions may change between
the check and the execution.  So the first phase needs to involve execution
to some extent, although only as far as it remains reversible.

> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> rolled-back in case of error or cancel.

I would just merge 1 and 2.

> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> instance, eject operation is performed at this phase.  

Yup.
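
In other words, something along these lines, shown as a userspace model with
hypothetical names (struct mem_section, prepare_eject(), eject_device()).
It assumes every section starts out online, so rollback simply re-onlines
the ones already taken offline.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_section {
	bool online;
	bool kernel_used;	/* cannot be offlined while true */
};

/* The "check" is the attempt itself, so it cannot go stale. */
static int offline_section(struct mem_section *s)
{
	if (s->kernel_used)
		return -EBUSY;
	s->online = false;
	return 0;
}

/* Merged validate + execute: reversible, rolls back on the first failure. */
static int prepare_eject(struct mem_section *secs, size_t n)
{
	size_t i;

	assert(secs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret = offline_section(&secs[i]);

		if (ret) {
			/* Undo the sections offlined so far. */
			while (i-- > 0)
				secs[i].online = true;
			return ret;
		}
	}
	return 0;
}

/* Commit: only reached when nothing can fail any more. */
static int eject_device(struct mem_section *secs, size_t n, bool *ejected)
{
	int ret = prepare_eject(secs, n);

	if (ret)
		return ret;
	*ejected = true;	/* stands in for the irreversible eject */
	return 0;
}
```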

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29  1:15                           ` Toshi Kani
  2012-11-29 10:03                             ` Rafael J. Wysocki
@ 2012-11-29 11:04                             ` Vasilis Liaskovitis
  2012-11-29 17:44                               ` Toshi Kani
  1 sibling, 1 reply; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-11-29 11:04 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Rafael J. Wysocki, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

Hi,

On Wed, Nov 28, 2012 at 06:15:42PM -0700, Toshi Kani wrote:
> On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > Consider the following case:
> > > > > > > > > > > 
> > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > 
> > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > > acpi_eject_store() which is exposed through sysfs.  
> > 
> > Yes, that is correct.
> > 
> > > If we disabled exposing
> > > acpi_eject_store() for memory devices, then the only way would be from the
> > > notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> > > notify handler for memory (so that memory eject events are simply dropped on
> > > the floor after unbinding the driver)?
> > 
> > If driver_unbind() happens before an eject request, we do not have a
> > problem.  acpi_eject_store() fails if a driver is not bound to the
> > device.  acpi_memory_device_notify() fails as well.
> > 
> > The race condition Wen pointed out (see the top of this email) is that
> > driver_unbind() may come in while eject operation is in-progress.  This
> > is why I mentioned the following in previous email.
> > 
> > > So, we basically need to either 1) serialize
> > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > during the operation.
> 
> Forgot to mention.  The 3rd option is what Greg said -- use the
> suppress_bind_attrs field.  I think this is a good option to address
> this race condition for now.  For a long term solution, we should have a
> better infrastructure in place to address such issue in general.

I like the suppress_bind_attrs idea; I'll take a look.

As I said for option 2), acpi_bus_remove could check for driver presence.
But it's more of a quick hack to abort the eject (the race with unbind can still
happen, but acpi_bus_remove can now detect it later in the eject path).
Something like:

 static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
 {
+	int ret;
 	if (!dev)
 		return -EINVAL;
 
 	dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
+
+	if (dev->driver && dev->driver->ops.prepare_remove) {
+		ret = dev->driver->ops.prepare_remove(dev);
+		if (ret)
+			return ret;
+	}
+	else if (!dev->driver)
+		return -ENODEV;
 	device_release_driver(&dev->dev);

thanks,

- Vasilis


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 10:03                             ` Rafael J. Wysocki
@ 2012-11-29 11:30                               ` Vasilis Liaskovitis
  2012-11-29 16:57                                 ` Rafael J. Wysocki
  2012-11-29 17:56                                 ` Toshi Kani
  2012-11-29 16:43                               ` Toshi Kani
  1 sibling, 2 replies; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-11-29 11:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > 
> > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > 
> > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > 
[...]
> Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> friends and I think there's a way to address all of these problems
> without big redesign (for now).
> 
> First, why don't we introduce an ACPI device flag (in the flags field of
> struct acpi_device) called eject_forbidden or something like this such that:
> 
> (1) It will be clear by default.
> (2) It may only be set by a driver's .add() routine if necessary.
> (3) Once set, it may only be cleared by the driver's .remove() routine if
>     it's safe to physically remove the device after the .remove().
> 
> Then, after the .remove() (which must be successful) has returned, and the
> flag is set, it will tell acpi_bus_remove() to return a specific error code
> (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> earlier, because if it left the flag set, there's no way to clear it afterward
> and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> should be unregistered anyway if that error code is to be returned.
> 
> [By the way, do you know where we free the memory allocated for struct
>  acpi_device objects?]
> 
> Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> store it, but continue the trimming normally and finally it should return that
> error code to acpi_bus_hot_remove_device().

Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
fails). Trimming is not continued. 

Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
error. Is this the desired behaviour that we want to keep for bus_trim? (This is
more a general question, not specific to the eject_forbidden suggestion)
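
To spell out the two behaviours, here is a small userspace model.  The
function names are hypothetical, results[i] stands for what removing device
i would return, and "last error" is read here as the last non-zero result.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* pre_remove patches: stop at the first failing device. */
static int trim_stop_on_first_error(const int *results, size_t n, size_t *visited)
{
	size_t i;

	assert(results != NULL && visited != NULL);
	*visited = 0;
	for (i = 0; i < n; i++) {
		(*visited)++;
		if (results[i])
			return results[i];
	}
	return 0;
}

/* Current behaviour as described above: visit everything, keep the last error. */
static int trim_continue_on_error(const int *results, size_t n, size_t *visited)
{
	int err = 0;
	size_t i;

	assert(results != NULL && visited != NULL);
	*visited = 0;
	for (i = 0; i < n; i++) {
		(*visited)++;
		if (results[i])
			err = results[i];
	}
	return err;
}
```

With results of { 0, -EBUSY, 0, -ENODEV }, the first variant visits two
devices and returns -EBUSY, while the second visits all four and returns
-ENODEV.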

> 
> Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> we attempted to eject) and notify the firmware about the failure.

Sounds like this rollback needs to be implemented in any solution we choose,
correct?

> 
> If we have that, then the memory hotplug driver would only need to set
> flags.eject_forbidden in its .add() routine and make its .remove() routine
> only clear that flag if it is safe to actually remove the memory.
> 

But when the .remove op is called, we are already in the irreversible/error-free
removal (final removal step).
Maybe we need to reset eject_forbidden in a prepare_remove operation which
handles the removal part that can fail?

thanks,

- Vasilis


* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 10:15     ` Rafael J. Wysocki
@ 2012-11-29 11:36       ` Vasilis Liaskovitis
  2012-12-06 16:59         ` Jiang Liu
  2012-11-29 17:03       ` Toshi Kani
  2012-12-06 16:56       ` Jiang Liu
  2 siblings, 1 reply; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-11-29 11:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Toshi Kani, Hanjun Guo, isimatu.yasuaki, wency, lenb,
	gregkh, linux-kernel, linux-mm, Tang Chen

On Thu, Nov 29, 2012 at 11:15:31AM +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > We met the same problem when we were doing computer node hotplug. It is a good idea
> > > to introduce prepare_remove before actual device removal.
> > > 
> > > I think we could do more in prepare_remove, such as rollback. In most cases, we can
> > > offline most of memory sections except kernel used pages now, should we rollback
> > > and online the memory sections when prepare_remove failed ?
> > 
> > I think hot-plug operation should have all-or-nothing semantics.  That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
> 
> That's correct.
> 
> > > As you may know, the ACPI based hotplug framework we are working on already addressed
> > > this problem, and the way we solve this problem is a bit like yours.
> > > 
> > > We introduce hp_ops in struct acpi_device_ops:
> > > struct acpi_device_ops {
> > > 	acpi_op_add add;
> > > 	acpi_op_remove remove;
> > > 	acpi_op_start start;
> > > 	acpi_op_bind bind;
> > > 	acpi_op_unbind unbind;
> > > 	acpi_op_notify notify;
> > > #ifdef	CONFIG_ACPI_HOTPLUG
> > > 	struct acpihp_dev_ops *hp_ops;
> > > #endif	/* CONFIG_ACPI_HOTPLUG */
> > > };
> > > 
> > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > 2) release(): reclaim device from running system
> > > 3) post_release(): rollback if cancelled by user or error happened
> > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > 5) unconfigure(): remove devices from running system
> > > 6) post_unconfigure(): free resources used by devices
> > > 
> > > In this way, we can easily rollback if error happens.
> > > What do you think of this solution? Any suggestions? I think we can achieve
> > > a better way for sharing ideas. :)
> > 
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> > 
> > 1. Validate phase - Verify if the request is a supported operation.  All
> > known restrictions are verified at this phase.  For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail.  
> 
> Actually, we can't do it this way, because the conditions may change between
> the check and the execution.  So the first phase needs to involve execution
> to some extent, although only as far as it remains reversible.
> 
> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of error or cancel.
> 
> I would just merge 1 and 2.

I agree steps 1 and 2 can be merged, at least for the current ACPI framework.
E.g. for memory hotplug, the mm function we call for memory removal
(remove_memory) handles both these steps.

The new ACPI framework could perhaps expand the operations as Hanjun described,
if it makes sense.
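
To make the merged approach concrete, here is a rough user-space sketch (not
kernel code) of an all-or-nothing removal: a reversible execute step per
device, undone in reverse order on the first failure, followed by a commit
step that is not allowed to fail. All names (struct hp_dev, hp_execute(),
hp_rollback(), hp_commit(), hp_remove_all()) are made up for illustration and
do not exist in the ACPI core.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device state; this is not a kernel structure. */
struct hp_dev {
	int online;	/* 1 while the device is in use by the system */
	int busy;	/* 1 if the device cannot be taken offline */
	int ejected;	/* 1 once the irreversible commit step has run */
};

/* Merged validate + execute step: may fail, and can be undone. */
static int hp_execute(struct hp_dev *d)
{
	if (d->busy)
		return -EBUSY;
	d->online = 0;
	return 0;
}

/* Undo hp_execute() for one device. */
static void hp_rollback(struct hp_dev *d)
{
	d->online = 1;
}

/* Commit step: irreversible, so it must not fail. */
static void hp_commit(struct hp_dev *d)
{
	assert(!d->online);
	d->ejected = 1;
}

/* Either every device is ejected, or all of them are back online. */
static int hp_remove_all(struct hp_dev *devs, size_t n)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = hp_execute(&devs[i]);
		if (ret) {
			/* Roll back the devices already processed, newest first. */
			while (i-- > 0)
				hp_rollback(&devs[i]);
			return ret;
		}
	}

	for (i = 0; i < n; i++)
		hp_commit(&devs[i]);
	return 0;
}
```

With three devices where only the last one is busy, hp_remove_all() returns
-EBUSY and the first two are online again, which is the behaviour Toshi
described as all-or-nothing.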

thanks,

- Vasilis

* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 10:03                             ` Rafael J. Wysocki
  2012-11-29 11:30                               ` Vasilis Liaskovitis
@ 2012-11-29 16:43                               ` Toshi Kani
  1 sibling, 0 replies; 92+ messages in thread
From: Toshi Kani @ 2012-11-29 16:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thu, 2012-11-29 at 11:03 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > If we disabled exposing
> > > > acpi_eject_store() for memory devices, then the only way would be from the
> > > > notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> > > > notify handler for memory (so that memory eject events are simply dropped on
> > > > the floor after unbinding the driver)?
> > > 
> > > If driver_unbind() happens before an eject request, we do not have a
> > > problem.  acpi_eject_store() fails if a driver is not bound to the
> > > device.  acpi_memory_device_notify() fails as well.
> > > 
> > > The race condition Wen pointed out (see the top of this email) is that
> > > driver_unbind() may come in while eject operation is in-progress.  This
> > > is why I mentioned the following in previous email.
> > > 
> > > > So, we basically need to either 1) serialize
> > > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > during the operation.
> > 
> > Forgot to mention.  The 3rd option is what Greg said -- use the
> > suppress_bind_attrs field.  I think this is a good option to address
> > this race condition for now.  For a long term solution, we should have a
> > better infrastructure in place to address such issue in general.
> 
> Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> friends and I think there's a way to address all of these problems
> without big redesign (for now).
> 
> First, why don't we introduce an ACPI device flag (in the flags field of
> struct acpi_device) called eject_forbidden or something like this such that:
> 
> (1) It will be clear by default.
> (2) It may only be set by a driver's .add() routine if necessary.
> (3) Once set, it may only be cleared by the driver's .remove() routine if
>     it's safe to physically remove the device after the .remove().
> 
> Then, after the .remove() (which must be successful) has returned, and the
> flag is set, it will tell acpi_bus_remove() to return a specific error code
> (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> earlier, because if it left the flag set, there's no way to clear it afterward
> and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> should be unregistered anyway if that error code is to be returned.

I like the idea!  It's a good intermediate solution if we need to keep
the bind/unbind interface.  That said, I still prefer to go with option
3) for now.  I do not see much reason to keep the bind/unbind interface
for ACPI hotplug drivers, and it seems that the semantics of .remove()
is .remove_driver(), not .remove_device() for driver_unbind().  So, I
think we should disable the bind/unbind interface until we settle this
issue.

> [By the way, do you know where we free the memory allocated for struct
>  acpi_device objects?]

device_release() -> acpi_device_release().

> Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> store it, but continue the trimming normally and finally it should return that
> error code to acpi_bus_hot_remove_device().
> 
> Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> we attempted to eject) and notify the firmware about the failure.
> 
> If we have that, then the memory hotplug driver would only need to set
> flags.eject_forbidden in its .add() routine and make its .remove() routine
> only clear that flag if it is safe to actually remove the memory.
> 
> Does this make sense to you?

At a high level, yes.  Rollback strategy, such as whether we should continue
the trimming after an error, is something we need to think about along with
the framework design.  I think we need a good framework before
implementing rollback.
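
For reference, the flag semantics proposed above can be modeled in a few lines
of user-space C.  This is only a sketch of my reading of the proposal:
struct model_acpi_device, the memory_in_use field and the model_*() helpers
are hypothetical stand-ins, and -EBUSY is just one of the error codes that
were suggested.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-in for struct acpi_device; only the flag is modeled. */
struct model_acpi_device {
	bool eject_forbidden;	/* proposed flag, clear by default */
	bool memory_in_use;	/* hypothetical: offlining would fail */
};

/* Driver .add(): a memory hotplug driver sets the flag. */
static void model_add(struct model_acpi_device *dev)
{
	dev->eject_forbidden = true;
}

/* Driver .remove(): always succeeds, clears the flag only when safe. */
static void model_remove(struct model_acpi_device *dev)
{
	if (!dev->memory_in_use)
		dev->eject_forbidden = false;
}

/* Stand-in for acpi_bus_remove(): reports whether eject may proceed. */
static int model_bus_remove(struct model_acpi_device *dev)
{
	assert(dev);

	model_remove(dev);
	return dev->eject_forbidden ? -EBUSY : 0;
}
```

A device whose memory is still in use leaves the flag set, so
model_bus_remove() returns -EBUSY and the caller knows not to evaluate _EJ0.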

> [BTW, using _PS3 in acpi_bus_hot_remove_device() directly to power off the
>  device is nonsense, because this method is not guaranteed to turn the power
>  off in the first place (it may just put the device into D3hot).  If anything,
>  acpi_device_set_power() should be used for that, but even that is not
>  guaranteed to actually remove the power (power resources may be shared with
>  other devices), so in fact that operation should be done by acpi_bus_trim()
>  for each of the trimmed devices.]

I agree.  I cannot speak for other vendors' implementations, but I expect
that _EJ0 takes care of the power state after the device is ejected.

Thanks,
-Toshi



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 11:30                               ` Vasilis Liaskovitis
@ 2012-11-29 16:57                                 ` Rafael J. Wysocki
  2012-11-29 17:56                                 ` Toshi Kani
  1 sibling, 0 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 16:57 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: Toshi Kani, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thursday, November 29, 2012 12:30:30 PM Vasilis Liaskovitis wrote:
> On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > 
> [...]
> > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > friends and I think there's a way to address all of these problems
> > without big redesign (for now).
> > 
> > First, why don't we introduce an ACPI device flag (in the flags field of
> > struct acpi_device) called eject_forbidden or something like this such that:
> > 
> > (1) It will be clear by default.
> > (2) It may only be set by a driver's .add() routine if necessary.
> > (3) Once set, it may only be cleared by the driver's .remove() routine if
> >     it's safe to physically remove the device after the .remove().
> > 
> > Then, after the .remove() (which must be successful) has returned, and the
> > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > earlier, because if it left the flag set, there's no way to clear it afterward
> > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > should be unregistered anyway if that error code is to be returned.
> > 
> > [By the way, do you know where we free the memory allocated for struct
> >  acpi_device objects?]
> > 
> > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > store it, but continue the trimming normally and finally it should return that
> > error code to acpi_bus_hot_remove_device().
> 
> Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> fails). Trimming is not continued. 
> 
> Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> more a general question, not specific to the eject_forbidden suggestion)
> 
> > 
> > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > we attempted to eject) and notify the firmware about the failure.
> 
> sounds like this rollback needs to be implemented in any solution we choose
> to implement, correct?
> 
> > 
> > If we have that, then the memory hotplug driver would only need to set
> > flags.eject_forbidden in its .add() routine and make its .remove() routine
> > only clear that flag if it is safe to actually remove the memory.
> > 
> 
> But when .remove op is called, we are already in the irreversible/error-free
> removal (final removal step).

Why so?  What prevents us from doing a bus scan again and binding the driver
again to the device?  Is .remove() doing something to the firmware?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 10:15     ` Rafael J. Wysocki
  2012-11-29 11:36       ` Vasilis Liaskovitis
@ 2012-11-29 17:03       ` Toshi Kani
  2012-11-29 20:30         ` Rafael J. Wysocki
  2012-12-06 17:01         ` Jiang Liu
  2012-12-06 16:56       ` Jiang Liu
  2 siblings, 2 replies; 92+ messages in thread
From: Toshi Kani @ 2012-11-29 17:03 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Hanjun Guo, Vasilis Liaskovitis, isimatu.yasuaki,
	wency, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > > the driver core remove function needs to always succeed. This means we need
> > > > to know that the device can be successfully removed before acpi_bus_trim / 
> > > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > > or SCI-initiated eject of memory devices fail e.g with:
> > > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > 
> > > > since the ACPI core goes ahead and ejects the device regardless of whether the
> > > > > memory is still in use or not.
> > > > 
> > > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > > This operation should be registered for acpi devices whose removal (from kernel
> > > > perspective) can fail.  Memory devices fall in this category.
> > > > 
> > > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > > - preparation for removal i.e. perform part of removal that can fail. Should
> > > >   succeed for device and all its children.
> > > > > - if above step was successful, proceed to actual device removal
> > > 
> > > Hi Vasilis,
> > > > We met the same problem when we were doing computer node hotplug. It is a good idea
> > > to introduce prepare_remove before actual device removal.
> > > 
> > > I think we could do more in prepare_remove, such as rollback. In most cases, we can
> > > offline most of memory sections except kernel used pages now, should we rollback
> > > and online the memory sections when prepare_remove failed ?
> > 
> > I think hot-plug operation should have all-or-nothing semantics.  That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
> 
> That's correct.
> 
> > > As you may know, the ACPI based hotplug framework we are working on already addressed
> > > > this problem, and the way we solve this problem is a bit like yours.
> > > 
> > > We introduce hp_ops in struct acpi_device_ops:
> > > struct acpi_device_ops {
> > > 	acpi_op_add add;
> > > 	acpi_op_remove remove;
> > > 	acpi_op_start start;
> > > 	acpi_op_bind bind;
> > > 	acpi_op_unbind unbind;
> > > 	acpi_op_notify notify;
> > > #ifdef	CONFIG_ACPI_HOTPLUG
> > > 	struct acpihp_dev_ops *hp_ops;
> > > #endif	/* CONFIG_ACPI_HOTPLUG */
> > > };
> > > 
> > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > 2) release(): reclaim device from running system
> > > 3) post_release(): rollback if cancelled by user or error happened
> > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > 5) unconfigure(): remove devices from running system
> > > 6) post_unconfigure(): free resources used by devices
> > > 
> > > In this way, we can easily rollback if error happens.
> > > > What do you think of this solution? Any suggestions? I think we can achieve
> > > a better way for sharing ideas. :)
> > 
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> > 
> > 1. Validate phase - Verify if the request is a supported operation.  All
> > known restrictions are verified at this phase.  For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail.  
> 
> Actually, we can't do it this way, because the conditions may change between
> the check and the execution.  So the first phase needs to involve execution
> to some extent, although only as far as it remains reversible.

For memory hot-remove, we can check if the target memory ranges are
within ZONE_MOVABLE.  We should not allow the user to change this setup
during a hot-remove operation.  Other checks could be whether a target
node contains cpu0 (until that is supported), the console UART (assuming
we cannot delete it), etc.  We should avoid doing rollback as much as we
can.
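
As a rough illustration of such a validate step, the checks could look like
the user-space sketch below.  struct hp_request, its fields and hp_validate()
are invented for this example, and the error codes are placeholders; a real
implementation would have to derive these facts from the mm and ACPI state.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical description of a hot-remove request; not a kernel type. */
struct hp_request {
	bool mem_all_movable;	/* every target range lies in ZONE_MOVABLE */
	bool has_cpu0;		/* target node contains cpu0 */
	bool has_console_uart;	/* target node contains the console UART */
};

/*
 * Validate phase: read-only checks, so a failure needs no rollback.
 * The error codes are placeholders chosen for this sketch.
 */
static int hp_validate(const struct hp_request *req)
{
	assert(req);

	if (!req->mem_all_movable)
		return -EBUSY;	/* kernel memory may be involved */
	if (req->has_cpu0 || req->has_console_uart)
		return -EPERM;	/* resource we cannot delete */
	return 0;
}
```

Since hp_validate() changes nothing, rejecting a request here costs no
rollback, which is the point of doing these checks up front.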

Thanks,
-Toshi


> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of error or cancel.
> 
> I would just merge 1 and 2.
> 
> > 3. Commit phase - Perform the final hot-add / hot-remove operation that
> > cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> > instance, eject operation is performed at this phase.  
> 
> Yup.
> 
> Thanks,
> Rafael
> 
> 



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 11:04                             ` Vasilis Liaskovitis
@ 2012-11-29 17:44                               ` Toshi Kani
  2012-12-06  9:30                                 ` Vasilis Liaskovitis
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-29 17:44 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: Rafael J. Wysocki, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thu, 2012-11-29 at 12:04 +0100, Vasilis Liaskovitis wrote:
> Hi,
> 
> On Wed, Nov 28, 2012 at 06:15:42PM -0700, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > 
> > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > 
> > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > I see two reasons for calling acpi_bus_hot_remove_device() for memory (correct
> > > > me if I'm wrong): (1) from the memhotplug driver's notify handler and (2) from
> > > > acpi_eject_store() which is exposed through sysfs.  
> > > 
> > > Yes, that is correct.
> > > 
> > > > If we disabled exposing
> > > > acpi_eject_store() for memory devices, then the only way would be from the
> > > > notify handler.  So I wonder if driver_unbind() shouldn't just uninstall the
> > > > notify handler for memory (so that memory eject events are simply dropped on
> > > > the floor after unbinding the driver)?
> > > 
> > > If driver_unbind() happens before an eject request, we do not have a
> > > problem.  acpi_eject_store() fails if a driver is not bound to the
> > > device.  acpi_memory_device_notify() fails as well.
> > > 
> > > The race condition Wen pointed out (see the top of this email) is that
> > > driver_unbind() may come in while eject operation is in-progress.  This
> > > is why I mentioned the following in previous email.
> > > 
> > > > So, we basically need to either 1) serialize
> > > > acpi_bus_hot_remove_device() and driver_unbind(), or 2) make
> > > > acpi_bus_hot_remove_device() to fail if driver_unbind() is run
> > > > during the operation.
> > 
> > Forgot to mention.  The 3rd option is what Greg said -- use the
> > suppress_bind_attrs field.  I think this is a good option to address
> > this race condition for now.  For a long term solution, we should have a
> > better infrastructure in place to address such issue in general.
> 
> I like the suppress_bind_attrs idea, I'll take a look.

Great!

> As I said for option 2), acpi_bus_remove could check for driver presence.
> But it's more of a quick hack to abort the eject (the race with unbind can still
> happen, but acpi_bus_remove can now detect it later in the eject path).
> Something like:
> 
>  static int acpi_bus_remove(struct acpi_device *dev, int rmdevice)
>  {
> +	int ret;
>  	if (!dev)
>  		return -EINVAL;
>  
>  	dev->removal_type = ACPI_BUS_REMOVAL_EJECT;
> +
> +	if (dev->driver && dev->driver->ops.prepare_remove) {
> +		ret = dev->driver->ops.prepare_remove(dev);
> +		if (ret)
> +			return ret;
> +	}
> +	else if (!dev->driver)
> +		return -ENODEV;
>  	device_release_driver(&dev->dev);

Yes, that's what I had in mind along with device_lock().  I think the
lock is necessary to close the window.
http://www.spinics.net/lists/linux-mm/msg46973.html

But as I mentioned in the other email, I prefer option 3 with
suppress_bind_attrs.  So, yes, please take a look to see how it works
out.
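
To show why the lock matters for option 2), here is a small user-space model
in which a pthread mutex plays the role of device_lock().  The driver check
and the eject happen under the same lock that unbind takes, so an unbind can
no longer slip in between them.  struct model_device, model_unbind() and
model_eject() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical device model; the mutex stands in for device_lock(). */
struct model_device {
	pthread_mutex_t lock;
	const char *driver;	/* NULL when no driver is bound */
	int ejected;
};

/* Models driver_unbind(): drops the driver under the device lock. */
static void model_unbind(struct model_device *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->driver = NULL;
	pthread_mutex_unlock(&dev->lock);
}

/* Models the eject path: check and eject inside one critical section. */
static int model_eject(struct model_device *dev)
{
	int ret = 0;

	assert(dev);

	pthread_mutex_lock(&dev->lock);
	if (!dev->driver)
		ret = -ENODEV;	/* unbind won the race: abort the eject */
	else
		dev->ejected = 1;
	pthread_mutex_unlock(&dev->lock);

	return ret;
}
```

Without the lock, the driver pointer could be cleared after the check but
before the eject, which is the window mentioned above.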

Thanks,
-Toshi



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 11:30                               ` Vasilis Liaskovitis
  2012-11-29 16:57                                 ` Rafael J. Wysocki
@ 2012-11-29 17:56                                 ` Toshi Kani
  2012-11-29 20:25                                   ` Rafael J. Wysocki
  1 sibling, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-29 17:56 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: Rafael J. Wysocki, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > 
> > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > 
> [...]
> > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > friends and I think there's a way to address all of these problems
> > without big redesign (for now).
> > 
> > First, why don't we introduce an ACPI device flag (in the flags field of
> > struct acpi_device) called eject_forbidden or something like this such that:
> > 
> > (1) It will be clear by default.
> > (2) It may only be set by a driver's .add() routine if necessary.
> > (3) Once set, it may only be cleared by the driver's .remove() routine if
> >     it's safe to physically remove the device after the .remove().
> > 
> > Then, after the .remove() (which must be successful) has returned, and the
> > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > earlier, because if it left the flag set, there's no way to clear it afterward
> > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > should be unregistered anyway if that error code is to be returned.
> > 
> > [By the way, do you know where we free the memory allocated for struct
> >  acpi_device objects?]
> > 
> > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > store it, but continue the trimming normally and finally it should return that
> > error code to acpi_bus_hot_remove_device().
> 
> Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> fails). Trimming is not continued. 
> 
> Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> more a general question, not specific to the eject_forbidden suggestion)

Your change makes sense to me.  At least until we have rollback code in
place, we need to fail as soon as we hit an error.
 
> > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > we attempted to eject) and notify the firmware about the failure.
> 
> sounds like this rollback needs to be implemented in any solution we choose
> to implement, correct?

Yes, rollback is necessary.  I do not think we need to include it
in your patch, though.

Thanks,
-Toshi
 
> > If we have that, then the memory hotplug driver would only need to set
> > flags.eject_forbidden in its .add() routine and make its .remove() routine
> > only clear that flag if it is safe to actually remove the memory.
> > 
> 
> But when .remove op is called, we are already in the irreversible/error-free
> removal (final removal step).
> Maybe we need to reset eject_forbidden in a prepare_remove operation which
> handles the removal part that can fail ?
> 
> thanks,
> 
> - Vasilis



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 17:56                                 ` Toshi Kani
@ 2012-11-29 20:25                                   ` Rafael J. Wysocki
  2012-11-29 20:38                                     ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 20:25 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Vasilis Liaskovitis, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > > 
> > > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > > 
> > [...]
> > > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > > friends and I think there's a way to address all of these problems
> > > without big redesign (for now).
> > > 
> > > First, why don't we introduce an ACPI device flag (in the flags field of
> > > struct acpi_device) called eject_forbidden or something like this such that:
> > > 
> > > (1) It will be clear by default.
> > > (2) It may only be set by a driver's .add() routine if necessary.
> > > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > >     it's safe to physically remove the device after the .remove().
> > > 
> > > Then, after the .remove() (which must be successful) has returned, and the
> > > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > > earlier, because if it left the flag set, there's no way to clear it afterward
> > > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > > should be unregistered anyway if that error code is to be returned.
> > > 
> > > [By the way, do you know where we free the memory allocated for struct
> > >  acpi_device objects?]
> > > 
> > > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > > store it, but continue the trimming normally and finally it should return that
> > > error code to acpi_bus_hot_remove_device().
> > 
> > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > fails). Trimming is not continued. 
> > 
> > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > more a general question, not specific to the eject_forbidden suggestion)
> 
> Your change makes sense to me.  At least until we have rollback code in
> place, we need to fail as soon as we hit an error.

Are you sure this makes sense?  What happens to the devices that we have
trimmed already and then there's an error?  Looks like they are just unusable
going forward, aren't they?

> > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > we attempted to eject) and notify the firmware about the failure.
> > 
> > sounds like this rollback needs to be implemented in any solution we choose
> > to implement, correct?
> 
> Yes, rollback is necessary.  But I do not think we need to include it
> into your patch, though.

As the first step, we should just trim everything and then return an error
code in my opinion.
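
Roughly, and only as a user-space sketch with made-up names (struct trim_node
and model_bus_trim() are not the real acpi_bus_trim() code), the behaviour I
have in mind is the following.  Keeping the first error rather than the last
one is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical node in the trimmed subtree. */
struct trim_node {
	int remove_result;	/* what removing this node would return */
	int trimmed;		/* set once the node has been processed */
};

/* Trim every node, remember the first error and report it at the end. */
static int model_bus_trim(struct trim_node *nodes, size_t n)
{
	int err = 0;
	size_t i;

	assert(nodes || n == 0);

	for (i = 0; i < n; i++) {
		int ret = nodes[i].remove_result;

		nodes[i].trimmed = 1;
		if (ret && !err)
			err = ret;	/* store it, but keep trimming */
	}

	return err;
}
```

This way no device is left half-trimmed, and the caller still learns that the
eject must not go ahead.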

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 17:03       ` Toshi Kani
@ 2012-11-29 20:30         ` Rafael J. Wysocki
  2012-11-29 20:39           ` Toshi Kani
  2012-12-06 17:07           ` Jiang Liu
  2012-12-06 17:01         ` Jiang Liu
  1 sibling, 2 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 20:30 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, Hanjun Guo, Vasilis Liaskovitis, isimatu.yasuaki,
	wency, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > > > the driver core remove function needs to always succeed. This means we need
> > > > > to know that the device can be successfully removed before acpi_bus_trim / 
> > > > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > > > or SCI-initiated eject of memory devices fail e.g with:
> > > > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > > 
> > > > > > since the ACPI core goes ahead and ejects the device regardless of whether the
> > > > > > memory is still in use or not.
> > > > > 
> > > > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > > > This operation should be registered for acpi devices whose removal (from kernel
> > > > > perspective) can fail.  Memory devices fall in this category.
> > > > > 
> > > > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > > > - preparation for removal i.e. perform part of removal that can fail. Should
> > > > >   succeed for device and all its children.
> > > > > > - if above step was successful, proceed to actual device removal
> > > > 
> > > > Hi Vasilis,
> > > > We met the same problem when we doing computer node hotplug, It is a good idea
> > > > to introduce prepare_remove before actual device removal.
> > > > 
> > > > I think we could do more in prepare_remove, such as rollback. In most cases, we can
> > > > offline most of memory sections except kernel used pages now, should we rollback
> > > > and online the memory sections when prepare_remove failed ?
> > > 
> > > I think hot-plug operation should have all-or-nothing semantics.  That
> > > is, an operation should either complete successfully, or rollback to the
> > > original state.
> > 
> > That's correct.
> > 
> > > > As you may know, the ACPI based hotplug framework we are working on already addressed
> > > > > this problem, and the way we solve this problem is a bit like yours.
> > > > 
> > > > We introduce hp_ops in struct acpi_device_ops:
> > > > struct acpi_device_ops {
> > > > 	acpi_op_add add;
> > > > 	acpi_op_remove remove;
> > > > 	acpi_op_start start;
> > > > 	acpi_op_bind bind;
> > > > 	acpi_op_unbind unbind;
> > > > 	acpi_op_notify notify;
> > > > #ifdef	CONFIG_ACPI_HOTPLUG
> > > > 	struct acpihp_dev_ops *hp_ops;
> > > > #endif	/* CONFIG_ACPI_HOTPLUG */
> > > > };
> > > > 
> > > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > > 2) release(): reclaim device from running system
> > > > 3) post_release(): rollback if cancelled by user or error happened
> > > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > > 5) unconfigure(): remove devices from running system
> > > > 6) post_unconfigure(): free resources used by devices
> > > > 
> > > > In this way, we can easily rollback if error happens.
> > > > How do you think of this solution, any suggestion ? I think we can achieve
> > > > a better way for sharing ideas. :)
> > > 
> > > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > > have not looked at all your changes yet..), but in my mind, a hot-plug
> > > operation should be composed with the following 3 phases.
> > > 
> > > 1. Validate phase - Verify if the request is a supported operation.  All
> > > known restrictions are verified at this phase.  For instance, if a
> > > hot-remove request involves kernel memory, it is failed in this phase.
> > > Since this phase makes no change, no rollback is necessary to fail.  
> > 
> > Actually, we can't do it this way, because the conditions may change between
> > the check and the execution.  So the first phase needs to involve execution
> > to some extent, although only as far as it remains reversible.
> 
> For memory hot-remove, we can check if the target memory ranges are
> within ZONE_MOVABLE.  We should not allow user to change this setup
> during hot-remove operation.  Other things may be to check if a target
> node contains cpu0 (until it is supported), the console UART (assuming
> we cannot delete it), etc.  We should avoid doing rollback as much as we
> can.

Yes, we can make some checks upfront as an optimization and fail early if
the conditions are not met, but for correctness we need to repeat those
checks later anyway.  Once we've decided to go for the eject, the conditions
must hold whatever happens.
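
A rough user-space sketch of that point, under the assumption that a single
"locked" flag is enough to block changes once the eject has started.  All
names (toy_range, toy_eject(), toy_set_movable(), the racer callback) are
made up for the example; "movable" merely stands in for the ZONE_MOVABLE
check mentioned above:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_range {
	bool movable;	/* stand-in for "range lies within ZONE_MOVABLE" */
	bool locked;	/* set while an eject is in flight */
	bool ejected;
};

/* Hypothetical reconfiguration path; refused while an eject is in flight. */
int toy_set_movable(struct toy_range *r, bool movable)
{
	if (r->locked)
		return -EBUSY;
	r->movable = movable;
	return 0;
}

/*
 * 'racer' models whatever may run between the early check and the point
 * where changes are blocked; pass NULL for no interference.
 */
int toy_eject(struct toy_range *r, void (*racer)(struct toy_range *))
{
	assert(r != NULL);

	/* Early check: lets us fail cheaply, but proves nothing later. */
	if (!r->movable)
		return -EBUSY;

	if (racer)
		racer(r);

	/* Block further changes, then repeat the check for correctness. */
	r->locked = true;
	if (!r->movable) {
		r->locked = false;
		return -EBUSY;
	}

	r->ejected = true;
	return 0;
}

/* Example racer: the setup changes right after the early check. */
void toy_make_unmovable(struct toy_range *r)
{
	(void)toy_set_movable(r, false);
}
```

With toy_make_unmovable() as the racer the early check passes but the
repeated check fails, so nothing is ejected.  Without a racer the eject goes
through and later calls to toy_set_movable() are refused.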

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 20:25                                   ` Rafael J. Wysocki
@ 2012-11-29 20:38                                     ` Toshi Kani
  2012-11-29 21:23                                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-29 20:38 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Vasilis Liaskovitis, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > > > 
> > > [...]
> > > > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > > > friends and I think there's a way to address all of these problems
> > > > without big redesign (for now).
> > > > 
> > > > First, why don't we introduce an ACPI device flag (in the flags field of
> > > > struct acpi_device) called eject_forbidden or something like this such that:
> > > > 
> > > > (1) It will be clear by default.
> > > > (2) It may only be set by a driver's .add() routine if necessary.
> > > > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > > >     it's safe to physically remove the device after the .remove().
> > > > 
> > > > Then, after the .remove() (which must be successful) has returned, and the
> > > > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > > > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > > > earlier, because if it left the flag set, there's no way to clear it afterward
> > > > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > > > should be unregistered anyway if that error code is to be returned.
> > > > 
> > > > [By the way, do you know where we free the memory allocated for struct
> > > >  acpi_device objects?]
> > > > 
> > > > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > > > store it, but continue the trimming normally and finally it should return that
> > > > error code to acpi_bus_hot_remove_device().
> > > 
> > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > fails). Trimming is not continued. 
> > > 
> > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > more a general question, not specific to the eject_forbidden suggestion)
> > 
> > Your change makes sense to me.  At least until we have rollback code in
> > place, we need to fail as soon as we hit an error.
> 
> Are you sure this makes sense?  What happens to the devices that we have
> trimmed already and then there's an error?  Looks like they are just unusable
> going forward, aren't they?

Yes, the devices trimmed already are released from the kernel, and their
memory ranges become unusable.  This is bad.  But I do not think we
should trim further to make more devices unusable after an error. 


> > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > we attempted to eject) and notify the firmware about the failure.
> > > 
> > > sounds like this rollback needs to be implemented in any solution we choose
> > > to implement, correct?
> > 
> > Yes, rollback is necessary.  But I do not think we need to include it
> > into your patch, though.
> 
> As the first step, we should just trim everything and then return an error
> code in my opinion.

But we cannot trim devices with kernel memory.

Thanks,
-Toshi



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 20:30         ` Rafael J. Wysocki
@ 2012-11-29 20:39           ` Toshi Kani
  2012-11-29 20:56             ` Toshi Kani
  2012-12-06 17:07           ` Jiang Liu
  1 sibling, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-29 20:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Hanjun Guo, Vasilis Liaskovitis, isimatu.yasuaki,
	wency, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> > > > > On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> > > > > > As discussed in https://patchwork.kernel.org/patch/1581581/
> > > > > > the driver core remove function needs to always succeed. This means we need
> > > > > > to know that the device can be successfully removed before acpi_bus_trim / 
> > > > > > acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> > > > > > or SCI-initiated eject of memory devices fail e.g with:
> > > > > > echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> > > > > > 
> > > > > > > since the ACPI core goes ahead and ejects the device regardless of whether the
> > > > > > > memory is still in use or not.
> > > > > > 
> > > > > > For this reason a new acpi_device operation called prepare_remove is introduced.
> > > > > > This operation should be registered for acpi devices whose removal (from kernel
> > > > > > perspective) can fail.  Memory devices fall in this category.
> > > > > > 
> > > > > > acpi_bus_remove() is changed to handle removal in 2 steps:
> > > > > > - preparation for removal i.e. perform part of removal that can fail. Should
> > > > > >   succeed for device and all its children.
> > > > > > > - if above step was successful, proceed to actual device removal
> > > > > 
> > > > > Hi Vasilis,
> > > > > We met the same problem when we doing computer node hotplug, It is a good idea
> > > > > to introduce prepare_remove before actual device removal.
> > > > > 
> > > > > I think we could do more in prepare_remove, such as rollback. In most cases, we can
> > > > > offline most of memory sections except kernel used pages now, should we rollback
> > > > > and online the memory sections when prepare_remove failed ?
> > > > 
> > > > I think hot-plug operation should have all-or-nothing semantics.  That
> > > > is, an operation should either complete successfully, or rollback to the
> > > > original state.
> > > 
> > > That's correct.
> > > 
> > > > > As you may know, the ACPI based hotplug framework we are working on already addressed
> > > > > > this problem, and the way we solve this problem is a bit like yours.
> > > > > 
> > > > > We introduce hp_ops in struct acpi_device_ops:
> > > > > struct acpi_device_ops {
> > > > > 	acpi_op_add add;
> > > > > 	acpi_op_remove remove;
> > > > > 	acpi_op_start start;
> > > > > 	acpi_op_bind bind;
> > > > > 	acpi_op_unbind unbind;
> > > > > 	acpi_op_notify notify;
> > > > > #ifdef	CONFIG_ACPI_HOTPLUG
> > > > > 	struct acpihp_dev_ops *hp_ops;
> > > > > #endif	/* CONFIG_ACPI_HOTPLUG */
> > > > > };
> > > > > 
> > > > > in hp_ops, we divide the prepare_remove into six small steps, that is:
> > > > > 1) pre_release(): optional step to mark device going to be removed/busy
> > > > > 2) release(): reclaim device from running system
> > > > > 3) post_release(): rollback if cancelled by user or error happened
> > > > > 4) pre_unconfigure(): optional step to solve possible dependency issue
> > > > > 5) unconfigure(): remove devices from running system
> > > > > 6) post_unconfigure(): free resources used by devices
> > > > > 
> > > > > In this way, we can easily rollback if error happens.
> > > > > How do you think of this solution, any suggestion ? I think we can achieve
> > > > > a better way for sharing ideas. :)
> > > > 
> > > > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > > > have not looked at all your changes yet..), but in my mind, a hot-plug
> > > > operation should be composed with the following 3 phases.
> > > > 
> > > > 1. Validate phase - Verify if the request is a supported operation.  All
> > > > known restrictions are verified at this phase.  For instance, if a
> > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > Since this phase makes no change, no rollback is necessary to fail.  
> > > 
> > > Actually, we can't do it this way, because the conditions may change between
> > > the check and the execution.  So the first phase needs to involve execution
> > > to some extent, although only as far as it remains reversible.
> > 
> > For memory hot-remove, we can check if the target memory ranges are
> > within ZONE_MOVABLE.  We should not allow user to change this setup
> > during hot-remove operation.  Other things may be to check if a target
> > node contains cpu0 (until it is supported), the console UART (assuming
> > we cannot delete it), etc.  We should avoid doing rollback as much as we
> > can.
> 
> Yes, we can make some checks upfront as an optimization and fail early if
> the conditions are not met, but for correctness we need to repeat those
> checks later anyway.  Once we've decided to go for the eject, the conditions
> must hold whatever happens.

Agreed.

Thanks,
-Toshi



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 20:39           ` Toshi Kani
@ 2012-11-29 20:56             ` Toshi Kani
  2012-11-29 21:25               ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-29 20:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Hanjun Guo, Vasilis Liaskovitis, isimatu.yasuaki,
	wency, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
> On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > > 1. Validate phase - Verify if the request is a supported operation.  All
> > > > > known restrictions are verified at this phase.  For instance, if a
> > > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > > Since this phase makes no change, no rollback is necessary to fail.  
> > > > 
> > > > Actually, we can't do it this way, because the conditions may change between
> > > > the check and the execution.  So the first phase needs to involve execution
> > > > to some extent, although only as far as it remains reversible.
> > > 
> > > For memory hot-remove, we can check if the target memory ranges are
> > > within ZONE_MOVABLE.  We should not allow user to change this setup
> > > during hot-remove operation.  Other things may be to check if a target
> > > node contains cpu0 (until it is supported), the console UART (assuming
> > > we cannot delete it), etc.  We should avoid doing rollback as much as we
> > > can.
> > 
> > Yes, we can make some checks upfront as an optimization and fail early if
> > the conditions are not met, but for correctness we need to repeat those
> > checks later anyway.  Once we've decided to go for the eject, the conditions
> > must hold whatever happens.
> 
> Agreed.

BTW, it is not an optimization I am after for this phase.  There are
many error cases during hot-plug operations.  It is difficult to assure
that rollback is successful for every error condition in terms of
testing and maintaining the code.  So, it is easier to fail beforehand
when possible.

Thanks,
-Toshi




* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 20:38                                     ` Toshi Kani
@ 2012-11-29 21:23                                       ` Rafael J. Wysocki
  2012-11-29 21:46                                         ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 21:23 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Vasilis Liaskovitis, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > > On Thu, Nov 29, 2012 at 11:03:05AM +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 06:15:42 PM Toshi Kani wrote:
> > > > > > On Wed, 2012-11-28 at 18:02 -0700, Toshi Kani wrote:
> > > > > > > On Thu, 2012-11-29 at 00:49 +0100, Rafael J. Wysocki wrote:
> > > > > > > > On Wednesday, November 28, 2012 02:02:48 PM Toshi Kani wrote:
> > > > > > > > > > > > > > > > Consider the following case:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > We hotremove the memory device by SCI and unbind it from the driver at the same time:
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > > CPUa                                                  CPUb
> > > > > > > > > > > > > > > > acpi_memory_device_notify()
> > > > > > > > > > > > > > > >                                        unbind it from the driver
> > > > > > > > > > > > > > > >     acpi_bus_hot_remove_device()
> > > > > > > > > > > > > > > 
> > > > [...]
> > > > > Well, in the meantime I've had a look at acpi_bus_hot_remove_device() and
> > > > > friends and I think there's a way to address all of these problems
> > > > > without big redesign (for now).
> > > > > 
> > > > > First, why don't we introduce an ACPI device flag (in the flags field of
> > > > > struct acpi_device) called eject_forbidden or something like this such that:
> > > > > 
> > > > > (1) It will be clear by default.
> > > > > (2) It may only be set by a driver's .add() routine if necessary.
> > > > > (3) Once set, it may only be cleared by the driver's .remove() routine if
> > > > >     it's safe to physically remove the device after the .remove().
> > > > > 
> > > > > Then, after the .remove() (which must be successful) has returned, and the
> > > > > flag is set, it will tell acpi_bus_remove() to return a specific error code
> > > > > (such as -EBUSY or -EAGAIN).  It doesn't matter if .remove() was called
> > > > > earlier, because if it left the flag set, there's no way to clear it afterward
> > > > > and acpi_bus_remove() will see it set anyway.  I think the struct acpi_device
> > > > > should be unregistered anyway if that error code is to be returned.
> > > > > 
> > > > > [By the way, do you know where we free the memory allocated for struct
> > > > >  acpi_device objects?]
> > > > > 
> > > > > Now if acpi_bus_trim() gets that error code from acpi_bus_remove(), it should
> > > > > store it, but continue the trimming normally and finally it should return that
> > > > > error code to acpi_bus_hot_remove_device().
> > > > 
> > > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > > fails). Trimming is not continued. 
> > > > 
> > > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > > more a general question, not specific to the eject_forbidden suggestion)
> > > 
> > > Your change makes sense to me.  At least until we have rollback code in
> > > place, we need to fail as soon as we hit an error.
> > 
> > Are you sure this makes sense?  What happens to the devices that we have
> > trimmed already and then there's an error?  Looks like they are just unusable
> > going forward, aren't they?
> 
> Yes, the devices trimmed already are released from the kernel, and their
> memory ranges become unusable.  This is bad.  But I do not think we
> should trim further to make more devices unusable after an error. 
> 
> 
> > > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > > we attempted to eject) and notify the firmware about the failure.
> > > > 
> > > > sounds like this rollback needs to be implemented in any solution we choose
> > > > to implement, correct?
> > > 
> > > Yes, rollback is necessary.  But I do not think we need to include it
> > > into your patch, though.
> > 
> > As the first step, we should just trim everything and then return an error
> > code in my opinion.
> 
> But we cannot trim devices with kernel memory.

Well, let's put it this way: If we started a trim, we should just do it
completely, in which case we know we can go for the eject, or we should
roll it back completely.  Now, if you just break the trim on first error,
the complete rollback is kind of problematic.  It should be doable, but
it won't be easy.  On the other hand, if you go for the full trim,
doing a rollback is trivial, it's as though you have reinserted the whole
stuff.

Now, that need not harm functionality, and that's why I proposed the
eject_forbidden flag, so that .remove() can say "I'm not done, please
rollback", in which case the device can happily function going forward,
even if we don't rebind the driver to it.
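
To make the proposal concrete, here is a small user-space model of it.  It
is only a sketch: toy_dev and the toy_*() functions are hypothetical, the
"in_use" field is an invented stand-in for "memory could not be offlined",
and the rollback is reduced to re-registering the device objects:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_dev {
	bool eject_forbidden;	/* the proposed flag; clear by default */
	bool in_use;		/* driver-private: memory cannot be offlined */
	bool bound;		/* a driver is attached */
	bool registered;	/* device object is known to the bus */
};

/* .add(): a driver whose device may refuse ejection sets the flag. */
void toy_add(struct toy_dev *dev)
{
	dev->bound = true;
	dev->eject_forbidden = true;
}

/* .remove(): cannot fail; clears the flag only if ejection is safe. */
void toy_remove(struct toy_dev *dev)
{
	dev->bound = false;
	if (!dev->in_use)
		dev->eject_forbidden = false;
}

/* Always unregisters; reports -EBUSY if the flag was left set. */
int toy_bus_remove(struct toy_dev *dev)
{
	if (dev->bound)
		toy_remove(dev);
	dev->registered = false;
	return dev->eject_forbidden ? -EBUSY : 0;
}

/* Full trim; on error, re-register everything instead of ejecting. */
int toy_hot_remove(struct toy_dev *devs, size_t n)
{
	int err = 0;
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		int ret = toy_bus_remove(&devs[i]);

		if (ret && !err)
			err = ret;
	}
	if (err) {
		for (i = 0; i < n; i++)
			devs[i].registered = true;	/* rollback, no rebind */
	}
	return err;
}
```

Note that a device whose driver was unbound before the eject still carries
the flag, so toy_bus_remove() reports the error for it even though
.remove() is not called again.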

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 20:56             ` Toshi Kani
@ 2012-11-29 21:25               ` Rafael J. Wysocki
  2012-12-06 17:10                 ` Jiang Liu
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 21:25 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, Hanjun Guo, Vasilis Liaskovitis, isimatu.yasuaki,
	wency, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Thursday, November 29, 2012 01:56:17 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
> > > > > On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
> > > > > > 1. Validate phase - Verify if the request is a supported operation.  All
> > > > > > known restrictions are verified at this phase.  For instance, if a
> > > > > > hot-remove request involves kernel memory, it is failed in this phase.
> > > > > > Since this phase makes no change, no rollback is necessary to fail.  
> > > > > 
> > > > > Actually, we can't do it this way, because the conditions may change between
> > > > > the check and the execution.  So the first phase needs to involve execution
> > > > > to some extent, although only as far as it remains reversible.
> > > > 
> > > > For memory hot-remove, we can check if the target memory ranges are
> > > > within ZONE_MOVABLE.  We should not allow user to change this setup
> > > > during hot-remove operation.  Other things may be to check if a target
> > > > node contains cpu0 (until it is supported), the console UART (assuming
> > > > we cannot delete it), etc.  We should avoid doing rollback as much as we
> > > > can.
> > > 
> > > Yes, we can make some checks upfront as an optimization and fail early if
> > > the conditions are not met, but for correctness we need to repeat those
> > > checks later anyway.  Once we've decided to go for the eject, the conditions
> > > must hold whatever happens.
> > 
> > Agreed.
> 
> BTW, it is not an optimization I am after for this phase.  There are
> many error cases during hot-plug operations.  It is difficult to assure
> that rollback is successful for every error condition in terms of
> testing and maintaining the code.  So, it is easier to fail beforehand
> when possible.

OK, but as I said it is necessary to ensure that the conditions will be met
in the next phases as well if we don't fail.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 21:23                                       ` Rafael J. Wysocki
@ 2012-11-29 21:46                                         ` Toshi Kani
  2012-11-29 22:11                                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-29 21:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Vasilis Liaskovitis, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > > > fails). Trimming is not continued. 
> > > > > 
> > > > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > > > more a general question, not specific to the eject_forbidden suggestion)
> > > > 
> > > > Your change makes sense to me.  At least until we have rollback code in
> > > > place, we need to fail as soon as we hit an error.
> > > 
> > > Are you sure this makes sense?  What happens to the devices that we have
> > > trimmed already and then there's an error?  Looks like they are just unusable
> > > going forward, aren't they?
> > 
> > Yes, the devices trimmed already are released from the kernel, and their
> > memory ranges become unusable.  This is bad.  But I do not think we
> > should trim further to make more devices unusable after an error. 
> > 
> > 
> > > > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > > > we attempted to eject) and notify the firmware about the failure.
> > > > > 
> > > > > sounds like this rollback needs to be implemented in any solution we choose
> > > > > to implement, correct?
> > > > 
> > > > Yes, rollback is necessary.  But I do not think we need to include it
> > > > into your patch, though.
> > > 
> > > As the first step, we should just trim everything and then return an error
> > > code in my opinion.
> > 
> > But we cannot trim devices with kernel memory.
> 
> Well, let's put it this way: If we started a trim, we should just do it
> completely, in which case we know we can go for the eject, or we should
> roll it back completely.  Now, if you just break the trim on first error,
> the complete rollback is kind of problematic.  It should be doable, but
> it won't be easy.  On the other hand, if you go for the full trim,
> doing a rollback is trivial, it's as though you have reinserted the whole
> stuff.

acpi_bus_check_add() skips initialization when an ACPI device already
has its associated acpi_device.  So, I think it works either way.
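
As a toy model of that behaviour (this is not the real acpi_bus_check_add();
toy_node and toy_scan() are invented for the example), a rescan that skips
nodes which still have a device object gives the same end state after a
partial trim as after a full one:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical namespace node; has_dev models "acpi_device exists". */
struct toy_node {
	bool has_dev;
	int init_count;	/* how many times the device was initialized */
};

/* Rescan: initialize only nodes that lost their device object. */
int toy_scan(struct toy_node *nodes, size_t n)
{
	int created = 0;
	size_t i;

	assert(nodes != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (nodes[i].has_dev)
			continue;	/* already present: skip initialization */
		nodes[i].has_dev = true;
		nodes[i].init_count++;
		created++;
	}
	return created;
}
```

Running toy_scan() a second time creates nothing, so the rescan is safe to
trigger regardless of how far the trim got.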


> Now, that need not harm functionality, and that's why I proposed the
> eject_forbidden flag, so that .remove() can say "I'm not done, please
> rollback", in which case the device can happily function going forward,
> even if we don't rebind the driver to it.

A partially trimmed acpi_device is hard to roll back.  An acpi_device should
be either trimmed completely or left intact.  When a function fails to trim
an acpi_device, it needs to roll back its own operation on that device before
returning an error.  This is because only the failed function has enough
context to roll back when an error occurs in the middle of its
procedure.
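
A minimal user-space sketch of what I mean, assuming a memory device made of
a few sections that are offlined one by one.  toy_mem_dev, the "pinned"
field and toy_trim_dev() are hypothetical names for the example only:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SECTIONS 8

struct toy_mem_dev {
	size_t nr;				/* sections in use */
	bool online[TOY_MAX_SECTIONS];
	bool pinned[TOY_MAX_SECTIONS];		/* cannot be offlined */
};

/*
 * Offline all sections of one device.  On failure, undo exactly what this
 * call changed, so the caller sees the device either trimmed or intact.
 */
int toy_trim_dev(struct toy_mem_dev *dev)
{
	bool changed[TOY_MAX_SECTIONS] = { false };
	size_t i, j;

	assert(dev != NULL && dev->nr <= TOY_MAX_SECTIONS);
	for (i = 0; i < dev->nr; i++) {
		if (!dev->online[i])
			continue;	/* was offline before we started */
		if (dev->pinned[i]) {
			/* Only this function knows which sections it touched. */
			for (j = 0; j < i; j++)
				if (changed[j])
					dev->online[j] = true;
			return -EBUSY;
		}
		dev->online[i] = false;
		changed[i] = true;
	}
	return 0;
}
```

The changed[] array is the "context" in question: once toy_trim_dev() has
returned, a caller can no longer tell which sections were offlined by this
attempt and which were already offline.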

Thanks,
-Toshi

* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 21:46                                         ` Toshi Kani
@ 2012-11-29 22:11                                           ` Rafael J. Wysocki
  2012-11-29 23:17                                             ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-29 22:11 UTC (permalink / raw)
  To: linux-acpi
  Cc: Toshi Kani, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 21:25 +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, November 29, 2012 10:56:30 AM Toshi Kani wrote:
> > > > > On Thu, 2012-11-29 at 12:30 +0100, Vasilis Liaskovitis wrote:
> > > > > > Side-note: In the pre_remove patches, acpi_bus_trim actually returns on the
> > > > > > first error from acpi_bus_remove (e.g. when memory offlining in pre_remove
> > > > > > fails). Trimming is not continued. 
> > > > > > 
> > > > > > Normally, acpi_bus_trim keeps trimming as you say, and always returns the last
> > > > > > error. Is this the desired behaviour that we want to keep for bus_trim? (This is
> > > > > > more a general question, not specific to the eject_forbidden suggestion)
> > > > > 
> > > > > Your change makes sense to me.  At least until we have rollback code in
> > > > > place, we need to fail as soon as we hit an error.
> > > > 
> > > > Are you sure this makes sense?  What happens to the devices that we have
> > > > trimmed already and then there's an error?  Looks like they are just unusable
> > > > going forward, aren't they?
> > > 
> > > Yes, the devices trimmed already are released from the kernel, and their
> > > memory ranges become unusable.  This is bad.  But I do not think we
> > > should trim further to make more devices unusable after an error. 
> > > 
> > > 
> > > > > > > Now, if acpi_bus_hot_remove_device() gets that error code, it should just
> > > > > > > reverse the whole trimming (i.e. trigger acpi_bus_scan() from the device
> > > > > > > we attempted to eject) and notify the firmware about the failure.
> > > > > > 
> > > > > > sounds like this rollback needs to be implemented in any solution we choose
> > > > > > to implement, correct?
> > > > > 
> > > > > Yes, rollback is necessary.  But I do not think we need to include it
> > > > > into your patch, though.
> > > > 
> > > > As the first step, we should just trim everything and then return an error
> > > > code in my opinion.
> > > 
> > > But we cannot trim devices with kernel memory.
> > 
> > Well, let's put it this way: If we started a trim, we should just do it
> > completely, in which case we know we can go for the eject, or we should
> > roll it back completely.  Now, if you just break the trim on first error,
> > the complete rollback is kind of problematic.  It should be doable, but
> > it won't be easy.  On the other hand, if you go for the full trim,
> > doing a rollback is trivial, it's as though you have reinserted the whole
> > stuff.
> 
> acpi_bus_check_add() skips initialization when an ACPI device already
> has its associated acpi_device.  So, I think it works either way.

OK

> > Now, that need not harm functionality, and that's why I proposed the
> > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > rollback", in which case the device can happily function going forward,
> > even if we don't rebind the driver to it.
> 
> A partially trimmed acpi_device is hard to rollback.  acpi_device should
> be either trimmed completely or intact.

I may or may not agree, depending on what you mean by "trimmed". :-)

> When a function failed to trim
> an acpi_device, it needs to rollback its operation for the device before
> returning an error.

Unless it is .remove(), because .remove() is supposed to always succeed
(ie. unbind the driver from the device).  However, it may signal the caller
that something's fishy, by setting a flag in the device object, for example.

> This is because only the failed function has enough
> context to rollback when an error occurred in the middle of its
> procedure.

Not really.  If it actually removes the struct acpi_device then the caller
may run acpi_bus_scan() on that device if necessary.  There may be a problem
if the device has an associated physical node (or more of them), but that
requires special care anyway.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29  4:48     ` Hanjun Guo
@ 2012-11-29 22:27       ` Toshi Kani
  2012-12-03  4:25         ` Hanjun Guo
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-29 22:27 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki, wency, rjw,
	lenb, gregkh, linux-kernel, linux-mm, Tang Chen, Liujiang,
	Huxinwei

On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
> On 2012/11/29 2:41, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >>> As discussed in https://patchwork.kernel.org/patch/1581581/
> >>> the driver core remove function needs to always succeed. This means we need
> >>> to know that the device can be successfully removed before acpi_bus_trim / 
> >>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> >>> or SCI-initiated eject of memory devices fail e.g with:
> >>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> >>>
> >>> since the ACPI core goes ahead and ejects the device regardless of whether the
> >>> the memory is still in use or not.
> >>>
> >>> For this reason a new acpi_device operation called prepare_remove is introduced.
> >>> This operation should be registered for acpi devices whose removal (from kernel
> >>> perspective) can fail.  Memory devices fall in this category.
> >>>
> >>> acpi_bus_remove() is changed to handle removal in 2 steps:
> >>> - preparation for removal i.e. perform part of removal that can fail. Should
> >>>   succeed for device and all its children.
> >>> - if above step was successfull, proceed to actual device removal
> >>
> >> Hi Vasilis,
> >> We met the same problem when we doing computer node hotplug, It is a good idea
> >> to introduce prepare_remove before actual device removal.
> >>
> >> I think we could do more in prepare_remove, such as rollback. In most cases, we can
> >> offline most of memory sections except kernel used pages now, should we rollback
> >> and online the memory sections when prepare_remove failed ?
> > 
> > I think hot-plug operation should have all-or-nothing semantics.  That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
> 
> Yes, we have the same point of view with you. We handle this problem in the ACPI
> based hot-plug framework as following:
> 1) hot add / hot remove complete successfully if no error happens;
> 2) automatic rollback to the original state if meets some error ;
> 3) rollback to the original if hot-plug operation cancelled by user ;

Cool!
 
> >> As you may know, the ACPI based hotplug framework we are working on already addressed
> >> this problem, and the way we slove this problem is a bit like yours.
> >>
> >> We introduce hp_ops in struct acpi_device_ops:
> >> struct acpi_device_ops {
> >> 	acpi_op_add add;
> >> 	acpi_op_remove remove;
> >> 	acpi_op_start start;
> >> 	acpi_op_bind bind;
> >> 	acpi_op_unbind unbind;
> >> 	acpi_op_notify notify;
> >> #ifdef	CONFIG_ACPI_HOTPLUG
> >> 	struct acpihp_dev_ops *hp_ops;
> >> #endif	/* CONFIG_ACPI_HOTPLUG */
> >> };
> >>
> >> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >> 1) pre_release(): optional step to mark device going to be removed/busy
> >> 2) release(): reclaim device from running system
> >> 3) post_release(): rollback if cancelled by user or error happened
> >> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >> 5) unconfigure(): remove devices from running system
> >> 6) post_unconfigure(): free resources used by devices
> >>
> >> In this way, we can easily rollback if error happens.
> >> How do you think of this solution, any suggestion ? I think we can achieve
> >> a better way for sharing ideas. :)
> > 
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> 
> Good idea ! we also implement a hot-plug operation in 3 phases:
> 1) acpihp_drv_pre_execute
> 2) acpihp_drv_execute
> 3) acpihp_drv_post_execute
> you may refer to :
> https://lkml.org/lkml/2012/11/4/79

Great.  Yes, I will take a look.
 
> > 1. Validate phase - Verify if the request is a supported operation.  All
> > known restrictions are verified at this phase.  For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail. 
> 
> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
> 
> 1) Hot-plugble or not. the instance kernel memory you mentioned is also checked
>    when memory device remove;

Agreed.

> 2) Dependency check involved. For instance, if hot-add a memory device,
>    processor should be added first, otherwise it's not valid to this operation.

I think FW should be the one that assures such dependency.  That is,
when a memory device object is marked as present/enabled/functioning, it
should be ready for the OS to use.

> 3) Race condition check. if the device and its dependent device is in hot-plug
>    process, another request will be denied.

I agree that hot-plug operation should be serialized.  I think another
request should be either queued or denied based on the caller's intent
(i.e. wait-ok or no-wait). 
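
As a rough illustration of the "queued or denied" policy above, here is a
minimal single-threaded C model. It is only a sketch under stated
assumptions: struct hpq_slot, hpq_submit() and hpq_complete() are
hypothetical names, not kernel or framework APIs, and real code would need
proper locking around the slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical caller intent: fail fast, or accept being queued. */
enum hpq_intent { HPQ_NO_WAIT, HPQ_WAIT_OK };
enum hpq_result { HPQ_STARTED, HPQ_QUEUED, HPQ_DENIED };

/* Hypothetical per-device (or per-dependency-group) hot-plug slot. */
struct hpq_slot {
	bool in_progress;	/* an operation currently owns the slot */
	int queued;		/* number of requests waiting for the slot */
};

/* Submit a request: start it, queue it, or deny it, per the intent. */
enum hpq_result hpq_submit(struct hpq_slot *slot, enum hpq_intent intent)
{
	assert(slot != NULL);
	if (!slot->in_progress) {
		slot->in_progress = true;
		return HPQ_STARTED;
	}
	if (intent == HPQ_WAIT_OK) {
		slot->queued++;
		return HPQ_QUEUED;
	}
	return HPQ_DENIED;
}

/*
 * Finish the current operation. Returns true if a queued request takes
 * over the slot, false if the slot becomes idle.
 */
bool hpq_complete(struct hpq_slot *slot)
{
	assert(slot != NULL && slot->in_progress);
	if (slot->queued > 0) {
		slot->queued--;
		return true;	/* slot stays busy for the next request */
	}
	slot->in_progress = false;
	return false;
}
```

Either way only one operation runs at a time; the intent only decides what
happens to the second caller.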

> No rollback is needed for the above checks.

Great.

> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of error or cancel.
> 
> In this phase, we introduce a state machine for the hot-plugble device,
> please refer to:
> https://lkml.org/lkml/2012/11/4/79
> 
> I think we have the same idea for the major framework, but the ACPI based
> hot-plug framework implement it differently in detail, right ?

Yes, I am surprised with the similarity.  What I described is something
we had implemented for other OS.  I am still studying how best we can
improve the Linux hotplug code. :)

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 22:11                                           ` Rafael J. Wysocki
@ 2012-11-29 23:17                                             ` Toshi Kani
  2012-11-30  0:13                                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-11-29 23:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thu, 2012-11-29 at 23:11 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > 
> > > Well, let's put it this way: If we started a trim, we should just do it
> > > completely, in which case we know we can go for the eject, or we should
> > > roll it back completely.  Now, if you just break the trim on first error,
> > > the complete rollback is kind of problematic.  It should be doable, but
> > > it won't be easy.  On the other hand, if you go for the full trim,
> > > doing a rollback is trivial, it's as though you have reinserted the whole
> > > stuff.
> > 
> > acpi_bus_check_add() skips initialization when an ACPI device already
> > has its associated acpi_device.  So, I think it works either way.
> 
> OK
> 
> > > Now, that need not harm functionality, and that's why I proposed the
> > > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > > rollback", in which case the device can happily function going forward,
> > > even if we don't rebind the driver to it.
> > 
> > A partially trimmed acpi_device is hard to rollback.  acpi_device should
> > be either trimmed completely or intact.
> 
> I may or may not agree, depending on what you mean by "trimmed". :-)
> 
> > When a function failed to trim
> > an acpi_device, it needs to rollback its operation for the device before
> > returning an error.
> 
> Unless it is .remove(), because .remove() is supposed to always succeed
> (ie. unbind the driver from the device).  However, it may signal the caller
> that something's fishy, by setting a flag in the device object, for example.

Right, .remove() cannot fail.  We still need to check if we should
continue to use .remove(), though.

As for the flag, are you thinking that we call acpi_bus_trim() with
rmdevice false first, so that it won't remove acpi_device?

> > This is because only the failed function has enough
> > context to rollback when an error occurred in the middle of its
> > procedure.
> 
> Not really.  If it actually removes the struct acpi_device then the caller
> may run acpi_bus_scan() on that device if necessary.  There may be a problem
> if the device has an associated physical node (or more of them), but that
> requires special care anyway.

Well, hot-remove of a device fails when there is a reason to fail.  IOW,
such a reason prevented the device from being removed safely.  So, I think
we need to put it back to the original state in this case.  Removing it by
ignoring the cause of failure sounds unsafe to me.  Some status/data may
be left un-deleted as a result.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 23:17                                             ` Toshi Kani
@ 2012-11-30  0:13                                               ` Rafael J. Wysocki
  2012-11-30  1:09                                                 ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-11-30  0:13 UTC (permalink / raw)
  To: Toshi Kani
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thursday, November 29, 2012 04:17:19 PM Toshi Kani wrote:
> On Thu, 2012-11-29 at 23:11 +0100, Rafael J. Wysocki wrote:
> > On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > > 
> > > > Well, let's put it this way: If we started a trim, we should just do it
> > > > completely, in which case we know we can go for the eject, or we should
> > > > roll it back completely.  Now, if you just break the trim on first error,
> > > > the complete rollback is kind of problematic.  It should be doable, but
> > > > it won't be easy.  On the other hand, if you go for the full trim,
> > > > doing a rollback is trivial, it's as though you have reinserted the whole
> > > > stuff.
> > > 
> > > acpi_bus_check_add() skips initialization when an ACPI device already
> > > has its associated acpi_device.  So, I think it works either way.
> > 
> > OK
> > 
> > > > Now, that need not harm functionality, and that's why I proposed the
> > > > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > > > rollback", in which case the device can happily function going forward,
> > > > even if we don't rebind the driver to it.
> > > 
> > > A partially trimmed acpi_device is hard to rollback.  acpi_device should
> > > be either trimmed completely or intact.
> > 
> > I may or may not agree, depending on what you mean by "trimmed". :-)
> > 
> > > When a function failed to trim
> > > an acpi_device, it needs to rollback its operation for the device before
> > > returning an error.
> > 
> > Unless it is .remove(), because .remove() is supposed to always succeed
> > (ie. unbind the driver from the device).  However, it may signal the caller
> > that something's fishy, by setting a flag in the device object, for example.
> 
> Right, .remove() cannot fail.  We still need to check if we should
> continue to use .remove(), though.
> 
> As for the flag, are you thinking that we call acpi_bus_trim() with
> rmdevice false first, so that it won't remove acpi_device?

I'm not sure if that's going to help.

Definitely, .remove() should just unbind the driver from the device.
That's what it's supposed to do.  Still, it may leave some information for
the caller in the device structure itself.  For example, "I have unbound
from the device, but it is not safe to remove it physically".

I'm now thinking that we may need to rework the trimming so that
.remove() is called for all drivers first and the struct acpi_device
objects are not removed at this stage.  Then, if .remove() from one
driver signals the situation like above, the routine will have to
rebind the drivers that have been unbound and we're done.

After that stage, when all drivers have been unbound, we should be
able to go for full eject.  First, we can drop all struct acpi_device
objects in the relevant subtree and then we can run _EJ0.
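
As a rough illustration of the two-stage sequence described above, here is
a minimal user-space C model. It is only a sketch: struct trim_node and
the trim_* functions are hypothetical names, not kernel APIs, the subtree
is flattened into an array, all drivers are assumed to be bound on entry,
and the last step merely stands in for evaluating _EJ0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device in the subtree; not struct acpi_device. */
struct trim_node {
	bool bound;		/* a driver is bound to the device */
	bool present;		/* the device object still exists */
	bool unsafe_to_eject;	/* what the driver would signal on unbind */
};

/*
 * Stage 1: unbind every driver but keep the device objects. Unbinding
 * itself never fails; a driver can only signal that eject is unsafe, in
 * which case everything unbound so far is rebound.
 */
bool trim_unbind_all(struct trim_node *nodes, size_t n)
{
	size_t i, j;

	for (i = 0; i < n; i++) {
		nodes[i].bound = false;
		if (nodes[i].unsafe_to_eject) {
			for (j = 0; j <= i; j++)
				nodes[j].bound = true;
			return false;
		}
	}
	return true;
}

/* Stage 2: reached only if stage 1 succeeded. Drop objects, then eject. */
bool trim_eject(struct trim_node *nodes, size_t n, bool *ejected)
{
	size_t i;

	assert(ejected != NULL);
	*ejected = false;
	if (!trim_unbind_all(nodes, n))
		return false;	/* rolled back, nothing was removed */
	for (i = 0; i < n; i++)
		nodes[i].present = false;
	*ejected = true;	/* stands in for running _EJ0 */
	return true;
}
```

The point of the split is that the only step able to veto the eject runs
before any device object is dropped, so rollback is just rebinding.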

> > > This is because only the failed function has enough
> > > context to rollback when an error occurred in the middle of its
> > > procedure.
> > 
> > Not really.  If it actually removes the struct acpi_device then the caller
> > may run acpi_bus_scan() on that device if necessary.  There may be a problem
> > if the device has an associated physical node (or more of them), but that
> > requires special care anyway.
> 
> Well, hot-remove to a device fails when there is a reason to fail.  IOW,
> such reason prevented the device to be removed safely.  So, I think we
> need to put it back to the original state in this case.  Removing it by
> ignoring the cause of failure sounds unsafe to me.  Some status/data may
> be left un-deleted as a result.

Again, I may or may not agree with that, depending on whether you're talking
about physical devices or about struct acpi_device objects.

Anyway, I agree that removing struct acpi_device objects may not be worth the
effort if we're going to re-create them in a while, because that may be costly.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-30  0:13                                               ` Rafael J. Wysocki
@ 2012-11-30  1:09                                                 ` Toshi Kani
  0 siblings, 0 replies; 92+ messages in thread
From: Toshi Kani @ 2012-11-30  1:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Vasilis Liaskovitis, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Fri, 2012-11-30 at 01:13 +0100, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 04:17:19 PM Toshi Kani wrote:
> > On Thu, 2012-11-29 at 23:11 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, November 29, 2012 02:46:44 PM Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > > > On Thursday, November 29, 2012 01:38:39 PM Toshi Kani wrote:
> > > > > Now, that need not harm functionality, and that's why I proposed the
> > > > > eject_forbidden flag, so that .remove() can say "I'm not done, please
> > > > > rollback", in which case the device can happily function going forward,
> > > > > even if we don't rebind the driver to it.
> > > > 
> > > > A partially trimmed acpi_device is hard to rollback.  acpi_device should
> > > > be either trimmed completely or intact.
> > > 
> > > I may or may not agree, depending on what you mean by "trimmed". :-)
> > > 
> > > > When a function failed to trim
> > > > an acpi_device, it needs to rollback its operation for the device before
> > > > returning an error.
> > > 
> > > Unless it is .remove(), because .remove() is supposed to always succeed
> > > (ie. unbind the driver from the device).  However, it may signal the caller
> > > that something's fishy, by setting a flag in the device object, for example.
> > 
> > Right, .remove() cannot fail.  We still need to check if we should
> > continue to use .remove(), though.
> > 
> > As for the flag, are you thinking that we call acpi_bus_trim() with
> > rmdevice false first, so that it won't remove acpi_device?
> 
> I'm not sure if that's going to help.
> 
> Definitely, .remove() should just unbind the driver from the device.
> That's what it's supposed to do.  Still, it may leave some information for
> the caller in the device structure itself.  For example, "I have unbound
> from the device, but it is not safe to remove it physically".

Right.

> I'm now thinking that we may need to rework the trimming so that
> .remove() is called for all drivers first and the struct acpi_device
> objects are not removed at this stage.  Then, if .remove() from one
> driver signals the situation like above, the routine will have to
> rebind the drivers that have been unbound and we're done.
> 
> After that stage, when all drivers have been unbound, we should be
> able to go for full eject.  First, we can drop all struct acpi_device
> objects in the relevant subtree and then we can run _EJ0.

I agree that such approach is worth pursuing.

> > > > This is because only the failed function has enough
> > > > context to rollback when an error occurred in the middle of its
> > > > procedure.
> > > 
> > > Not really.  If it actually removes the struct acpi_device then the caller
> > > may run acpi_bus_scan() on that device if necessary.  There may be a problem
> > > if the device has an associated physical node (or more of them), but that
> > > requires special care anyway.
> > 
> > Well, hot-remove to a device fails when there is a reason to fail.  IOW,
> > such reason prevented the device to be removed safely.  So, I think we
> > need to put it back to the original state in this case.  Removing it by
> > ignoring the cause of failure sounds unsafe to me.  Some status/data may
> > be left un-deleted as a result.
> 
> Again, I may or may not agree with that, depending on whether you're talking
> about physical devices or about struct acpi_device objects.

Sorry, by "hot-remove to a device", I was referring to removing struct
acpi_device and off-lining its resources.  By "left un-deleted", I was
referring to its resources, such as memory ranges, being left un-deleted.

> Anyway, I agree that removing struct acpi_device objects may not be worth the
> effort if we're going to re-create them in a while, because that may be costly.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 22:27       ` Toshi Kani
@ 2012-12-03  4:25         ` Hanjun Guo
  2012-12-04  0:10           ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Hanjun Guo @ 2012-12-03  4:25 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki, wency, rjw,
	lenb, gregkh, linux-kernel, linux-mm, Tang Chen, Liujiang,
	Huxinwei

On 2012/11/30 6:27, Toshi Kani wrote:
> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>> On 2012/11/29 2:41, Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>>>> the driver core remove function needs to always succeed. This means we need
>>>>> to know that the device can be successfully removed before acpi_bus_trim / 
>>>>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
>>>>> or SCI-initiated eject of memory devices fail e.g with:
>>>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>>>
>>>>> since the ACPI core goes ahead and ejects the device regardless of whether the
>>>>> the memory is still in use or not.
>>>>>
>>>>> For this reason a new acpi_device operation called prepare_remove is introduced.
>>>>> This operation should be registered for acpi devices whose removal (from kernel
>>>>> perspective) can fail.  Memory devices fall in this category.
>>>>>
>>>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>>>>   succeed for device and all its children.
>>>>> - if above step was successfull, proceed to actual device removal
>>>>
>>>> Hi Vasilis,
>>>> We met the same problem when we doing computer node hotplug, It is a good idea
>>>> to introduce prepare_remove before actual device removal.
>>>>
>>>> I think we could do more in prepare_remove, such as rollback. In most cases, we can
>>>> offline most of memory sections except kernel used pages now, should we rollback
>>>> and online the memory sections when prepare_remove failed ?
>>>
>>> I think hot-plug operation should have all-or-nothing semantics.  That
>>> is, an operation should either complete successfully, or rollback to the
>>> original state.
>>
>> Yes, we have the same point of view with you. We handle this problem in the ACPI
>> based hot-plug framework as following:
>> 1) hot add / hot remove complete successfully if no error happens;
>> 2) automatic rollback to the original state if meets some error ;
>> 3) rollback to the original if hot-plug operation cancelled by user ;
> 
> Cool!
>  
>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>> this problem, and the way we slove this problem is a bit like yours.
>>>>
>>>> We introduce hp_ops in struct acpi_device_ops:
>>>> struct acpi_device_ops {
>>>> 	acpi_op_add add;
>>>> 	acpi_op_remove remove;
>>>> 	acpi_op_start start;
>>>> 	acpi_op_bind bind;
>>>> 	acpi_op_unbind unbind;
>>>> 	acpi_op_notify notify;
>>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>>> 	struct acpihp_dev_ops *hp_ops;
>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>>> };
>>>>
>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>> 2) release(): reclaim device from running system
>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>> 5) unconfigure(): remove devices from running system
>>>> 6) post_unconfigure(): free resources used by devices
>>>>
>>>> In this way, we can easily rollback if error happens.
>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>> a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>
>> Good idea ! we also implement a hot-plug operation in 3 phases:
>> 1) acpihp_drv_pre_execute
>> 2) acpihp_drv_execute
>> 3) acpihp_drv_post_execute
>> you may refer to :
>> https://lkml.org/lkml/2012/11/4/79
> 
> Great.  Yes, I will take a look.

Thanks, any comments are welcomed :)

>  
>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>> known restrictions are verified at this phase.  For instance, if a
>>> hot-remove request involves kernel memory, it is failed in this phase.
>>> Since this phase makes no change, no rollback is necessary to fail. 
>>
>> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
>>
>> 1) Hot-plugble or not. the instance kernel memory you mentioned is also checked
>>    when memory device remove;
> 
> Agreed.
> 
>> 2) Dependency check involved. For instance, if hot-add a memory device,
>>    processor should be added first, otherwise it's not valid to this operation.
> 
> I think FW should be the one that assures such dependency.  That is,
> when a memory device object is marked as present/enabled/functioning, it
> should be ready for the OS to use.

Yes, the BIOS should handle the dependency, because the BIOS knows the
actual hardware topology. The ACPI specification provides the _EDL method to
tell the OS the eject device list, but it still has no method to tell the OS
the add device list.

In some cases, the OS should analyze the dependency in the validate phase. For
example, when hot-removing a node (container device), the OS should analyze the
dependency to get the removal order as follows:
1) Host bridge;
2) Memory devices;
3) Processor devices;
4) Container device itself;

In this way, we can check whether all the devices under the container device
are hot-pluggable before the execute phase, and furthermore, we can remove the
devices in order, to avoid some crash problems.
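
As a rough illustration of this validate step, here is a minimal C sketch.
It is not code from the hot-plug framework: struct order_dev, the
ORDER_* values and order_validate_remove() are hypothetical names, and the
enum order is simply assumed to encode the removal order listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device classes; the enum order is the removal order. */
enum order_type {
	ORDER_HOST_BRIDGE,
	ORDER_MEMORY,
	ORDER_PROCESSOR,
	ORDER_CONTAINER
};

/* Hypothetical per-device record for everything under one container. */
struct order_dev {
	enum order_type type;
	bool hotpluggable;	/* e.g. false for memory holding kernel pages */
};

/* qsort() comparator: earlier removal classes sort first. */
static int order_cmp(const void *a, const void *b)
{
	const struct order_dev *x = a;
	const struct order_dev *y = b;

	return (int)x->type - (int)y->type;
}

/*
 * Validate phase: reject the request if any device cannot be removed,
 * otherwise arrange the list in removal order. No device state is
 * changed, so a failure here needs no rollback.
 */
bool order_validate_remove(struct order_dev *devs, size_t n)
{
	size_t i;

	assert(devs != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (!devs[i].hotpluggable)
			return false;
	if (n > 1)
		qsort(devs, n, sizeof(*devs), order_cmp);
	return true;
}
```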

> 
>> 3) Race condition check. if the device and its dependent device is in hot-plug
>>    process, another request will be denied.
> 
> I agree that hot-plug operation should be serialized.  I think another
> request should be either queued or denied based on the caller's intent
> (i.e. wait-ok or no-wait). 
> 
>> No rollback is needed for the above checks.
> 
> Great.
> 
>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>> rolled-back in case of error or cancel.
>>
>> In this phase, we introduce a state machine for the hot-plugble device,
>> please refer to:
>> https://lkml.org/lkml/2012/11/4/79
>>
>> I think we have the same idea for the major framework, but the ACPI based
>> hot-plug framework implement it differently in detail, right ?
> 
> Yes, I am surprised with the similarity.  What I described is something
> we had implemented for other OS.  I am still studying how best we can
> improve the Linux hotplug code. :)

Great! Your experience is much appreciated. I think we can share ideas
to achieve a better solution for the Linux hotplug code. :)

Thanks
 Hanjun

> 
> Thanks,
> -Toshi
> 
> 
> .
> 



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-03  4:25         ` Hanjun Guo
@ 2012-12-04  0:10           ` Toshi Kani
  2012-12-04  9:16             ` Hanjun Guo
  2012-12-06 16:40             ` Jiang Liu
  0 siblings, 2 replies; 92+ messages in thread
From: Toshi Kani @ 2012-12-04  0:10 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki, wency, rjw,
	lenb, gregkh, linux-kernel, linux-mm, Tang Chen, Liujiang,
	Huxinwei

On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> On 2012/11/30 6:27, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
> >> On 2012/11/29 2:41, Toshi Kani wrote:
> >>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >>>> As you may know, the ACPI based hotplug framework we are working on already addressed
> >>>> this problem, and the way we slove this problem is a bit like yours.
> >>>>
> >>>> We introduce hp_ops in struct acpi_device_ops:
> >>>> struct acpi_device_ops {
> >>>> 	acpi_op_add add;
> >>>> 	acpi_op_remove remove;
> >>>> 	acpi_op_start start;
> >>>> 	acpi_op_bind bind;
> >>>> 	acpi_op_unbind unbind;
> >>>> 	acpi_op_notify notify;
> >>>> #ifdef	CONFIG_ACPI_HOTPLUG
> >>>> 	struct acpihp_dev_ops *hp_ops;
> >>>> #endif	/* CONFIG_ACPI_HOTPLUG */
> >>>> };
> >>>>
> >>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >>>> 1) pre_release(): optional step to mark device going to be removed/busy
> >>>> 2) release(): reclaim device from running system
> >>>> 3) post_release(): rollback if cancelled by user or error happened
> >>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >>>> 5) unconfigure(): remove devices from running system
> >>>> 6) post_unconfigure(): free resources used by devices
> >>>>
> >>>> In this way, we can easily rollback if error happens.
> >>>> How do you think of this solution, any suggestion ? I think we can achieve
> >>>> a better way for sharing ideas. :)
> >>>
> >>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>> operation should be composed with the following 3 phases.
> >>
> >> Good idea ! we also implement a hot-plug operation in 3 phases:
> >> 1) acpihp_drv_pre_execute
> >> 2) acpihp_drv_execute
> >> 3) acpihp_drv_post_execute
> >> you may refer to :
> >> https://lkml.org/lkml/2012/11/4/79
> > 
> > Great.  Yes, I will take a look.
> 
> Thanks, any comments are welcomed :)

If I read the code right, the framework calls ACPI drivers differently
at boot-time and at hot-add, as follows.  That is, the new entry points are
called at hot-add only, but .add() is called in both cases.  This
requires .add() to work differently.

Boot    : .add()
Hot-Add : .add(), .pre_configure(), .configure(), etc.

I think the boot-time and hot-add initialization should be done
consistently.  While there is difficulty with the current boot sequence,
the framework should be designed to allow them to be consistent, not to
make them diverge.

> >>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>> known restrictions are verified at this phase.  For instance, if a
> >>> hot-remove request involves kernel memory, it is failed in this phase.
> >>> Since this phase makes no change, no rollback is necessary to fail. 
> >>
> >> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
> >>
> >> 1) Hot-plugble or not. the instance kernel memory you mentioned is also checked
> >>    when memory device remove;
> > 
> > Agreed.
> > 
> >> 2) Dependency check involved. For instance, if hot-add a memory device,
> >>    processor should be added first, otherwise it's not valid to this operation.
> > 
> > I think FW should be the one that assures such dependency.  That is,
> > when a memory device object is marked as present/enabled/functioning, it
> > should be ready for the OS to use.
> 
> Yes, BIOS should do something for the dependency, because BIOS knows the
> actual hardware topology. 

Right.

> The ACPI specification provides _EDL method to
> tell OS the eject device list, but still has no method to tell OS the add device
> list now.

Yes, but I do not think the OS needs special handling for add...

> For some cases, OS should analyze the dependency in the validate phase. For example,
> when hot remove a node (container device), OS should analyze the dependency to get
> the remove order as following:
> 1) Host bridge;
> 2) Memory devices;
> 3) Processor devices;
> 4) Container device itself;

This may be off-topic, but how do you plan to delete I/O devices under a
node?  Are you planning to delete all I/O devices along with the node?

On another OS, we made a separate step called I/O chassis delete, which
off-lines all I/O devices under the node, and is required before a node
hot-remove.  It basically triggers PCIe hot-remove to detach drivers
from all devices.  It does not eject the devices, so they do not
have to be on hot-plug slots.  This step runs user-space scripts to
verify that the devices can be off-lined without disrupting the user's
applications, and provides comprehensive reports if any of them are in
use.  Not sure if Linux's PCI hot-remove has such a check, but I thought
I'd mention it. :)

> In this way, we can check that all the devices are hot-plugble or not under the
> container device before execute phase, and further more, we can remove devices
> in order to avoid some crash problems.

Yes, we should check at the validate phase whether all the resources
under the node can be off-lined.  (Note that not every device has to
have _EJ0, if that's what you meant by hot-pluggable.)
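
As a rough illustration of such a validate-phase check (a userspace
sketch only, not code from any posted patch; struct node_res, its
fields, and validate_node_remove() are hypothetical names), the pass
below inspects every resource under a node, changes nothing, and
reports how many resources block the removal:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource kinds found under a node (container device). */
enum res_kind { RES_HOST_BRIDGE, RES_MEMORY, RES_PROCESSOR };

struct node_res {
	enum res_kind kind;
	bool has_kernel_memory;	/* memory holding unmovable kernel data */
	bool in_use;		/* e.g. an I/O device with an active user */
};

/*
 * Validate phase: inspect every resource, modify nothing.
 * Returns the number of resources that block the removal and stores
 * the index of the first blocker (or -1 if there is none).
 */
static int validate_node_remove(const struct node_res *res, size_t n,
				int *first_blocker)
{
	int blockers = 0;

	*first_blocker = -1;
	for (size_t i = 0; i < n; i++) {
		bool blocked = res[i].in_use ||
			(res[i].kind == RES_MEMORY && res[i].has_kernel_memory);

		if (!blocked)
			continue;
		if (*first_blocker < 0)
			*first_blocker = (int)i;
		blockers++;
	}
	return blockers;
}
```

Because the pass is read-only, a non-zero result can simply fail the
request without any rollback.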
 
> >> 3) Race condition check. if the device and its dependent device is in hot-plug
> >>    process, another request will be denied.
> > 
> > I agree that hot-plug operation should be serialized.  I think another
> > request should be either queued or denied based on the caller's intent
> > (i.e. wait-ok or no-wait). 
> > 
> >> No rollback is needed for the above checks.
> > 
> > Great.
> > 
> >>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> >>> rolled-back in case of error or cancel.
> >>
> >> In this phase, we introduce a state machine for the hot-plugble device,
> >> please refer to:
> >> https://lkml.org/lkml/2012/11/4/79
> >>
> >> I think we have the same idea for the major framework, but the ACPI based
> >> hot-plug framework implement it differently in detail, right ?
> > 
> > Yes, I am surprised with the similarity.  What I described is something
> > we had implemented for other OS.  I am still studying how best we can
> > improve the Linux hotplug code. :)
> 
> Great! your experience is very appreciable for me. I think we can share ideas
> to achieve a better solution for Linux hotplug code. :)

Sounds great.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-04  0:10           ` Toshi Kani
@ 2012-12-04  9:16             ` Hanjun Guo
  2012-12-04 23:23               ` Toshi Kani
  2012-12-06 16:40             ` Jiang Liu
  1 sibling, 1 reply; 92+ messages in thread
From: Hanjun Guo @ 2012-12-04  9:16 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki, wency, rjw,
	lenb, gregkh, linux-kernel, linux-mm, Tang Chen, Liujiang,
	Huxinwei

On 2012/12/4 8:10, Toshi Kani wrote:
> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>> On 2012/11/30 6:27, Toshi Kani wrote:
>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>>>> On 2012/11/29 2:41, Toshi Kani wrote:
>>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>>>> this problem, and the way we solve this problem is a bit like yours.
>>>>>>
>>>>>> We introduce hp_ops in struct acpi_device_ops:
>>>>>> struct acpi_device_ops {
>>>>>> 	acpi_op_add add;
>>>>>> 	acpi_op_remove remove;
>>>>>> 	acpi_op_start start;
>>>>>> 	acpi_op_bind bind;
>>>>>> 	acpi_op_unbind unbind;
>>>>>> 	acpi_op_notify notify;
>>>>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>>>>> 	struct acpihp_dev_ops *hp_ops;
>>>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>>>>> };
>>>>>>
>>>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>>>> 2) release(): reclaim device from running system
>>>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>>>> 5) unconfigure(): remove devices from running system
>>>>>> 6) post_unconfigure(): free resources used by devices
>>>>>>
>>>>>> In this way, we can easily rollback if error happens.
>>>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>>>> a better way for sharing ideas. :)
>>>>>
>>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>>> operation should be composed with the following 3 phases.
>>>>
>>>> Good idea ! we also implement a hot-plug operation in 3 phases:
>>>> 1) acpihp_drv_pre_execute
>>>> 2) acpihp_drv_execute
>>>> 3) acpihp_drv_post_execute
>>>> you may refer to :
>>>> https://lkml.org/lkml/2012/11/4/79
>>>
>>> Great.  Yes, I will take a look.
>>
>> Thanks, any comments are welcomed :)
> 
> If I read the code right, the framework calls ACPI drivers differently
> at boot-time and hot-add as follows.  That is, the new entry points are
> called at hot-add only, but .add() is called at both cases.  This
> requires .add() to work differently.

Hi Toshi,
Thanks for your comments!

> 
> Boot    : .add()

Actually, at boot time: .add(), .start()

> Hot-Add : .add(), .pre_configure(), configure(), etc.

Yes, the framework does it as you said. We use .pre_configure(), .configure(),
and .post_configure() instead of .start() for better error handling and recovery.
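
To illustrate why splitting the work into steps helps recovery, here
is a minimal userspace sketch (not the framework's actual code; struct
hp_step, hp_run_steps(), and the demo_* callbacks are hypothetical).
It assumes each step may carry an undo callback; when a step fails,
the steps that already succeeded are undone in reverse order:

```c
#include <stddef.h>

/* One step of a hot-add sequence; undo may be NULL if nothing to revert. */
struct hp_step {
	int (*run)(void *ctx);
	void (*undo)(void *ctx);
};

/*
 * Run the steps in order.  On the first failure, undo the steps that
 * already succeeded, in reverse order, and return the error code.
 */
static int hp_run_steps(const struct hp_step *steps, size_t n, void *ctx)
{
	for (size_t i = 0; i < n; i++) {
		int ret = steps[i].run(ctx);

		if (ret == 0)
			continue;
		while (i-- > 0)
			if (steps[i].undo)
				steps[i].undo(ctx);
		return ret;
	}
	return 0;
}

/* Demo device: 'level' counts how many steps are currently applied. */
struct demo_dev {
	int level;
	int fail_configure;	/* non-zero makes the configure step fail */
};

static int demo_step_ok(void *ctx)
{
	((struct demo_dev *)ctx)->level++;
	return 0;
}

static int demo_configure(void *ctx)
{
	struct demo_dev *d = ctx;

	if (d->fail_configure)
		return -1;
	d->level++;
	return 0;
}

static void demo_undo(void *ctx)
{
	((struct demo_dev *)ctx)->level--;
}

/* Stand-ins for .pre_configure(), .configure(), .post_configure(). */
static const struct hp_step demo_hot_add[] = {
	{ demo_step_ok, demo_undo },
	{ demo_configure, demo_undo },
	{ demo_step_ok, NULL },
};
```

With fail_configure set, hp_run_steps() returns the error and the demo
device ends up back at level 0, as if the hot-add had never started.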

> 
> I think the boot-time and hot-add initialization should be done
> consistently.  While there is difficulty with the current boot sequence,
> the framework should be designed to allow them consistent, not make them
> diverged.
> 
>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>> Since this phase makes no change, no rollback is necessary to fail. 
>>>>
>>>> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
>>>>
>>>> 1) Hot-plugble or not. the instance kernel memory you mentioned is also checked
>>>>    when memory device remove;
>>>
>>> Agreed.
>>>
>>>> 2) Dependency check involved. For instance, if hot-add a memory device,
>>>>    processor should be added first, otherwise it's not valid to this operation.
>>>
>>> I think FW should be the one that assures such dependency.  That is,
>>> when a memory device object is marked as present/enabled/functioning, it
>>> should be ready for the OS to use.
>>
>> Yes, BIOS should do something for the dependency, because BIOS knows the
>> actual hardware topology. 
> 
> Right.
> 
>> The ACPI specification provides _EDL method to
>> tell OS the eject device list, but still has no method to tell OS the add device
>> list now.
> 
> Yes, but I do not think the OS needs special handling for add...

Hmm, how about triggering a hot-add operation from the OS? We have an eject
interface for the OS, but no add interface yet. Do you think this feature would
be useful? If so, I think the OS should analyze the dependency first and tell the user.

> 
>> For some cases, OS should analyze the dependency in the validate phase. For example,
>> when hot remove a node (container device), OS should analyze the dependency to get
>> the remove order as following:
>> 1) Host bridge;
>> 2) Memory devices;
>> 3) Processor devices;
>> 4) Container device itself;
> 
> This may be off-topic, but how do you plan to delete I/O devices under a
> node?  Are you planning to delete all I/O devices along with the node?

Yes, we delete all I/O devices under the node, in the following steps:
1) Offline PCI devices;
2) Offline IOAPIC and IOMMU;
and we offline the I/O devices whether they are in use or not.

> 
> On other OS, we made a separate step called I/O chassis delete, which
> off-lines all I/O devices under the node, and is required before a node
> hot-remove.  It basically triggers PCIe hot-remove to detach drivers
> from all devices.  It does not eject the devices so that they do not
> have to be on hot-plug slots.  This step runs user-space scripts to
> verify if the devices can be off-lined without disrupting user's
> applications, and provides comprehensive reports if any of them are in

Great! We also plan to implement this feature.

> use.  Not sure if Linux's PCI hot-remove has such check, but I thought
> I'd mention it. :)

There is no such check, I'm sure :)

> 
>> In this way, we can check that all the devices are hot-plugble or not under the
>> container device before execute phase, and further more, we can remove devices
>> in order to avoid some crash problems.
> 
> Yes, we should check if all the resources under the node can be
> off-lined at validate phase.  (note, all the devices do not have to have
> _EJ0 if that's what you meant by hot-pluggable.)

Yes, agreed. For node hotplug, not all the devices need to have an _EJ0 method.

Thanks
 Hanjun

>  
>>>> 3) Race condition check. if the device and its dependent device is in hot-plug
>>>>    process, another request will be denied.
>>>
>>> I agree that hot-plug operation should be serialized.  I think another
>>> request should be either queued or denied based on the caller's intent
>>> (i.e. wait-ok or no-wait). 
>>>
>>>> No rollback is needed for the above checks.
>>>
>>> Great.
>>>
>>>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>>>> rolled-back in case of error or cancel.
>>>>
>>>> In this phase, we introduce a state machine for the hot-plugble device,
>>>> please refer to:
>>>> https://lkml.org/lkml/2012/11/4/79
>>>>
>>>> I think we have the same idea for the major framework, but the ACPI based
>>>> hot-plug framework implement it differently in detail, right ?
>>>
>>> Yes, I am surprised with the similarity.  What I described is something
>>> we had implemented for other OS.  I am still studying how best we can
>>> improve the Linux hotplug code. :)
>>
>> Great! your experience is very appreciable for me. I think we can share ideas
>> to achieve a better solution for Linux hotplug code. :)
> 
> Sounds great.
> 
> Thanks,
> -Toshi
> 



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-04  9:16             ` Hanjun Guo
@ 2012-12-04 23:23               ` Toshi Kani
  2012-12-05 12:10                 ` Hanjun Guo
  2012-12-06 16:47                 ` Jiang Liu
  0 siblings, 2 replies; 92+ messages in thread
From: Toshi Kani @ 2012-12-04 23:23 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki, wency, rjw,
	lenb, gregkh, linux-kernel, linux-mm, Tang Chen, Liujiang,
	Huxinwei

On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
> On 2012/12/4 8:10, Toshi Kani wrote:
> > On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >> On 2012/11/30 6:27, Toshi Kani wrote:
> >>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
> >>>> On 2012/11/29 2:41, Toshi Kani wrote:
> >>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >>>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >>>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
> >>>>>> this problem, and the way we solve this problem is a bit like yours.
> >>>>>>
> >>>>>> We introduce hp_ops in struct acpi_device_ops:
> >>>>>> struct acpi_device_ops {
> >>>>>> 	acpi_op_add add;
> >>>>>> 	acpi_op_remove remove;
> >>>>>> 	acpi_op_start start;
> >>>>>> 	acpi_op_bind bind;
> >>>>>> 	acpi_op_unbind unbind;
> >>>>>> 	acpi_op_notify notify;
> >>>>>> #ifdef	CONFIG_ACPI_HOTPLUG
> >>>>>> 	struct acpihp_dev_ops *hp_ops;
> >>>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
> >>>>>> };
> >>>>>>
> >>>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >>>>>> 1) pre_release(): optional step to mark device going to be removed/busy
> >>>>>> 2) release(): reclaim device from running system
> >>>>>> 3) post_release(): rollback if cancelled by user or error happened
> >>>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >>>>>> 5) unconfigure(): remove devices from running system
> >>>>>> 6) post_unconfigure(): free resources used by devices
> >>>>>>
> >>>>>> In this way, we can easily rollback if error happens.
> >>>>>> How do you think of this solution, any suggestion ? I think we can achieve
> >>>>>> a better way for sharing ideas. :)
> >>>>>
> >>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>>>> operation should be composed with the following 3 phases.
> >>>>
> >>>> Good idea ! we also implement a hot-plug operation in 3 phases:
> >>>> 1) acpihp_drv_pre_execute
> >>>> 2) acpihp_drv_execute
> >>>> 3) acpihp_drv_post_execute
> >>>> you may refer to :
> >>>> https://lkml.org/lkml/2012/11/4/79
> >>>
> >>> Great.  Yes, I will take a look.
> >>
> >> Thanks, any comments are welcomed :)
> > 
> > If I read the code right, the framework calls ACPI drivers differently
> > at boot-time and hot-add as follows.  That is, the new entry points are
> > called at hot-add only, but .add() is called at both cases.  This
> > requires .add() to work differently.
> 
> Hi Toshi,
> Thanks for your comments!
> 
> > 
> > Boot    : .add()
> 
> Actually, at boot time: .add(), .start()

Right.

> > Hot-Add : .add(), .pre_configure(), configure(), etc.
> 
> Yes, we did it as you said in the framework. We use .pre_configure(), configure(),
> and post_configure() to instead of .start() for better error handling and recovery.

I think we should have hot-plug interfaces at the module level, not at
the ACPI-internal level.  In this way, the interfaces can be
platform-neutral and allow any module to register, which makes it more
consistent with the boot-up sequence.  It can also allow ordering of the
sequence among the registered modules.  Right now, we initiate all
procedures from ACPI during hot-plug, which I think is inflexible and
steps into other modules' roles.
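
To make the idea a bit more concrete, here is a minimal userspace
sketch of module-level registration with ordering.  It is only an
illustration under assumed names (struct hp_handler,
hp_register_handler(), hp_hot_add(), hp_hot_remove(), and the demo
handlers are all hypothetical, not an existing kernel interface).
Handlers are kept sorted by an order value; hot-add walks the table
front to back, and hot-remove walks it back to front:

```c
#include <stddef.h>
#include <string.h>

#define HP_MAX_HANDLERS 8

/* Hypothetical handler a module registers; lower order runs first on add. */
struct hp_handler {
	const char *name;
	int order;
	int (*add)(void *dev);
	int (*remove)(void *dev);
};

static const struct hp_handler *hp_handlers[HP_MAX_HANDLERS];
static size_t hp_nr_handlers;

/* Insert the handler so that the table stays sorted by ascending order. */
static int hp_register_handler(const struct hp_handler *h)
{
	size_t i = hp_nr_handlers;

	if (hp_nr_handlers == HP_MAX_HANDLERS)
		return -1;
	while (i > 0 && hp_handlers[i - 1]->order > h->order) {
		hp_handlers[i] = hp_handlers[i - 1];
		i--;
	}
	hp_handlers[i] = h;
	hp_nr_handlers++;
	return 0;
}

/* Hot-add walks the table front to back and stops at the first failure. */
static int hp_hot_add(void *dev)
{
	for (size_t i = 0; i < hp_nr_handlers; i++) {
		int ret = hp_handlers[i]->add(dev);

		if (ret)
			return ret;
	}
	return 0;
}

/* Hot-remove walks the table back to front, mirroring the add sequence. */
static int hp_hot_remove(void *dev)
{
	for (size_t i = hp_nr_handlers; i-- > 0; ) {
		int ret = hp_handlers[i]->remove(dev);

		if (ret)
			return ret;
	}
	return 0;
}

/* Demo handlers: each call appends one letter to a trace string. */
static int demo_cpu_add(void *dev)    { strcat(dev, "P"); return 0; }
static int demo_cpu_remove(void *dev) { strcat(dev, "p"); return 0; }
static int demo_mem_add(void *dev)    { strcat(dev, "M"); return 0; }
static int demo_mem_remove(void *dev) { strcat(dev, "m"); return 0; }

static const struct hp_handler demo_cpu_handler = {
	"processor", 10, demo_cpu_add, demo_cpu_remove,
};
static const struct hp_handler demo_mem_handler = {
	"memory", 20, demo_mem_add, demo_mem_remove,
};
```

In this sketch the processor handler runs before the memory handler on
add, and the memory handler runs before the processor handler on
remove, regardless of the order in which the modules registered.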

I am also concerned about the slot handling, which is the core piece of
the infrastructure and only allows hot-plug operations on ACPI objects
where slot objects are previously created by checking _EJ0.  The
infrastructure should allow hot-plug operations on any objects, and it
should not be dependent on the slot design.

I have a rough idea, and it may be easier to review / explain if I
make some code changes.  So, let me prototype it, and send it to you all
if that works out.  Hopefully, it won't take too long.

> > I think the boot-time and hot-add initialization should be done
> > consistently.  While there is difficulty with the current boot sequence,
> > the framework should be designed to allow them consistent, not make them
> > diverged.
> > 
> >>>>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>>>> known restrictions are verified at this phase.  For instance, if a
> >>>>> hot-remove request involves kernel memory, it is failed in this phase.
> >>>>> Since this phase makes no change, no rollback is necessary to fail. 
> >>>>
> >>>> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
> >>>>
> >>>> 1) Hot-plugble or not. the instance kernel memory you mentioned is also checked
> >>>>    when memory device remove;
> >>>
> >>> Agreed.
> >>>
> >>>> 2) Dependency check involved. For instance, if hot-add a memory device,
> >>>>    processor should be added first, otherwise it's not valid to this operation.
> >>>
> >>> I think FW should be the one that assures such dependency.  That is,
> >>> when a memory device object is marked as present/enabled/functioning, it
> >>> should be ready for the OS to use.
> >>
> >> Yes, BIOS should do something for the dependency, because BIOS knows the
> >> actual hardware topology. 
> > 
> > Right.
> > 
> >> The ACPI specification provides _EDL method to
> >> tell OS the eject device list, but still has no method to tell OS the add device
> >> list now.
> > 
> > Yes, but I do not think the OS needs special handling for add...
> 
> Hmm, how about trigger a hot add operation by OS ? we have eject interface for OS, but
> have no add interface now, do you think this feature is useful? If it is, I think OS
> should analyze the dependency first and tell the user.

The OS can eject an ACPI device because a target device is owned by the
OS (i.e. enabled).  For hot-add, a target ACPI device is not owned by
the OS (i.e. disabled).  Therefore, the OS is not supposed to change its
state.  So, I do not think we should support a hot-add operation by the
OS.
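
As a small illustration of this ownership rule (a sketch only; the
helper names are hypothetical, and it assumes that "owned by the OS"
can be approximated by the present and enabled bits of the _STA
result, whose bit positions follow the ACPI specification):

```c
#include <stdbool.h>
#include <stdint.h>

/* _STA result bits, at the positions given by the ACPI specification. */
#define STA_PRESENT	(1u << 0)
#define STA_ENABLED	(1u << 1)
#define STA_FUNCTIONING	(1u << 3)

/*
 * Hypothetical policy check: the OS may initiate an eject only on a
 * device it currently owns, i.e. one that is present and enabled.
 */
static bool os_may_eject(uint32_t sta)
{
	uint32_t need = STA_PRESENT | STA_ENABLED;

	return (sta & need) == need;
}

/*
 * Hypothetical readiness check for hot-add: the firmware has marked
 * the device present, enabled, and functioning before the OS uses it.
 */
static bool ready_for_os_use(uint32_t sta)
{
	uint32_t need = STA_PRESENT | STA_ENABLED | STA_FUNCTIONING;

	return (sta & need) == need;
}
```

A device that is present but not enabled fails both checks, which
matches the point that the OS should not change its state.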
 
> >> For some cases, OS should analyze the dependency in the validate phase. For example,
> >> when hot remove a node (container device), OS should analyze the dependency to get
> >> the remove order as following:
> >> 1) Host bridge;
> >> 2) Memory devices;
> >> 3) Processor devices;
> >> 4) Container device itself;
> > 
> > This may be off-topic, but how do you plan to delete I/O devices under a
> > node?  Are you planning to delete all I/O devices along with the node?
> 
> Yes, we delete all I/O devices under the node. we delete I/O devices as
> following steps:
> 1) Offline PCI devices;
> 2) Offline IOAPIC and IOMMU;
> and offline I/O devices no matter in use or not.

Oh, off-lining no matter what would be problematic for enterprise
customers... 
 
> > On other OS, we made a separate step called I/O chassis delete, which
> > off-lines all I/O devices under the node, and is required before a node
> > hot-remove.  It basically triggers PCIe hot-remove to detach drivers
> > from all devices.  It does not eject the devices so that they do not
> > have to be on hot-plug slots.  This step runs user-space scripts to
> > verify if the devices can be off-lined without disrupting user's
> > applications, and provides comprehensive reports if any of them are in
> 
> Great! we also have a plan to implement this feature.

That's great!

> > use.  Not sure if Linux's PCI hot-remove has such check, but I thought
> > I'd mention it. :)
> 
> Have no such check, I'm sure :)
> 
> > 
> >> In this way, we can check that all the devices are hot-plugble or not under the
> >> container device before execute phase, and further more, we can remove devices
> >> in order to avoid some crash problems.
> > 
> > Yes, we should check if all the resources under the node can be
> > off-lined at validate phase.  (note, all the devices do not have to have
> > _EJ0 if that's what you meant by hot-pluggable.)
> 
> Yes, agreed. For node hotplug, no need for all the devices have _EJ0 method.

Right.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-04 23:23               ` Toshi Kani
@ 2012-12-05 12:10                 ` Hanjun Guo
  2012-12-05 22:31                   ` Toshi Kani
  2012-12-06 16:47                 ` Jiang Liu
  1 sibling, 1 reply; 92+ messages in thread
From: Hanjun Guo @ 2012-12-05 12:10 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki, wency, rjw,
	lenb, gregkh, linux-kernel, linux-mm, Tang Chen, Liujiang,
	Huxinwei

On 2012/12/5 7:23, Toshi Kani wrote:
> On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
>> On 2012/12/4 8:10, Toshi Kani wrote:
>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>>>> On 2012/11/30 6:27, Toshi Kani wrote:
>>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>>>>>> On 2012/11/29 2:41, Toshi Kani wrote:
>>>>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>>>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>>>>>> this problem, and the way we solve this problem is a bit like yours.
>>>>>>>>
>>>>>>>> We introduce hp_ops in struct acpi_device_ops:
>>>>>>>> struct acpi_device_ops {
>>>>>>>> 	acpi_op_add add;
>>>>>>>> 	acpi_op_remove remove;
>>>>>>>> 	acpi_op_start start;
>>>>>>>> 	acpi_op_bind bind;
>>>>>>>> 	acpi_op_unbind unbind;
>>>>>>>> 	acpi_op_notify notify;
>>>>>>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>>>>>>> 	struct acpihp_dev_ops *hp_ops;
>>>>>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>>>>>>> };
>>>>>>>>
>>>>>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>>>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>>>>>> 2) release(): reclaim device from running system
>>>>>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>>>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>>>>>> 5) unconfigure(): remove devices from running system
>>>>>>>> 6) post_unconfigure(): free resources used by devices
>>>>>>>>
>>>>>>>> In this way, we can easily rollback if error happens.
>>>>>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>>>>>> a better way for sharing ideas. :)
>>>>>>>
>>>>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>>>>> operation should be composed with the following 3 phases.
>>>>>>
>>>>>> Good idea ! we also implement a hot-plug operation in 3 phases:
>>>>>> 1) acpihp_drv_pre_execute
>>>>>> 2) acpihp_drv_execute
>>>>>> 3) acpihp_drv_post_execute
>>>>>> you may refer to :
>>>>>> https://lkml.org/lkml/2012/11/4/79
>>>>>
>>>>> Great.  Yes, I will take a look.
>>>>
>>>> Thanks, any comments are welcomed :)
>>>
>>> If I read the code right, the framework calls ACPI drivers differently
>>> at boot-time and hot-add as follows.  That is, the new entry points are
>>> called at hot-add only, but .add() is called at both cases.  This
>>> requires .add() to work differently.
>>
>> Hi Toshi,
>> Thanks for your comments!
>>
>>>
>>> Boot    : .add()
>>
>> Actually, at boot time: .add(), .start()
> 
> Right.
> 
>>> Hot-Add : .add(), .pre_configure(), configure(), etc.
>>
>> Yes, we did it as you said in the framework. We use .pre_configure(), configure(),
>> and post_configure() to instead of .start() for better error handling and recovery.
> 
> I think we should have hot-plug interfaces at the module level, not at
> the ACPI-internal level.  In this way, the interfaces can be
> platform-neutral and allow any modules to register, which makes it more
> consistent with the boot-up sequence.  It can also allow ordering of the
> sequence among the registered modules.  Right now, we initiate all
> procedures from ACPI during hot-plug, which I think is inflexible and
> steps into other module's role.
> 
> I am also concerned about the slot handling, which is the core piece of
> the infrastructure and only allows hot-plug operations on ACPI objects
> where slot objects are previously created by checking _EJ0.  The
> infrastructure should allow hot-plug operations on any objects, and it
> should not be dependent on the slot design.
> 
> I have some rough idea, and it may be easier to review / explain if I
> make some code changes.  So, let me prototype it, and send it you all if
> that works out.  Hopefully, it won't take too long.

Great! If there is anything I can do, please let me know.

> 
>>> I think the boot-time and hot-add initialization should be done
>>> consistently.  While there is difficulty with the current boot sequence,
>>> the framework should be designed to allow them consistent, not make them
>>> diverged.
>>>
>>>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>>>> Since this phase makes no change, no rollback is necessary to fail. 
>>>>>>
>>>>>> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
>>>>>>
>>>>>> 1) Hot-plugble or not. the instance kernel memory you mentioned is also checked
>>>>>>    when memory device remove;
>>>>>
>>>>> Agreed.
>>>>>
>>>>>> 2) Dependency check involved. For instance, if hot-add a memory device,
>>>>>>    processor should be added first, otherwise it's not valid to this operation.
>>>>>
>>>>> I think FW should be the one that assures such dependency.  That is,
>>>>> when a memory device object is marked as present/enabled/functioning, it
>>>>> should be ready for the OS to use.
>>>>
>>>> Yes, BIOS should do something for the dependency, because BIOS knows the
>>>> actual hardware topology. 
>>>
>>> Right.
>>>
>>>> The ACPI specification provides _EDL method to
>>>> tell OS the eject device list, but still has no method to tell OS the add device
>>>> list now.
>>>
>>> Yes, but I do not think the OS needs special handling for add...
>>
>> Hmm, how about trigger a hot add operation by OS ? we have eject interface for OS, but
>> have no add interface now, do you think this feature is useful? If it is, I think OS
>> should analyze the dependency first and tell the user.
> 
> The OS can eject an ACPI device because a target device is owned by the
> OS (i.e. enabled).  For hot-add, a target ACPI device is not owned by
> the OS (i.e. disabled).  Therefore, the OS is not supposed to change its
> state.  So, I do not think we should support a hot-add operation by the
> OS.
>  
>>>> For some cases, OS should analyze the dependency in the validate phase. For example,
>>>> when hot remove a node (container device), OS should analyze the dependency to get
>>>> the remove order as following:
>>>> 1) Host bridge;
>>>> 2) Memory devices;
>>>> 3) Processor devices;
>>>> 4) Container device itself;
>>>
>>> This may be off-topic, but how do you plan to delete I/O devices under a
>>> node?  Are you planning to delete all I/O devices along with the node?
>>
>> Yes, we delete all I/O devices under the node. we delete I/O devices as
>> following steps:
>> 1) Offline PCI devices;
>> 2) Offline IOAPIC and IOMMU;
>> and offline I/O devices no matter in use or not.
> 
> Oh, off-lining no matter what would be problematic for enterprise
> customers... 

Agreed. I think we should do more in user space to check such things, not in the kernel.

Thanks
Hanjun





^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-05 12:10                 ` Hanjun Guo
@ 2012-12-05 22:31                   ` Toshi Kani
  0 siblings, 0 replies; 92+ messages in thread
From: Toshi Kani @ 2012-12-05 22:31 UTC (permalink / raw)
  To: Hanjun Guo
  Cc: Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki, wency, rjw,
	lenb, gregkh, linux-kernel, linux-mm, Tang Chen, Liujiang,
	Huxinwei

On Wed, 2012-12-05 at 20:10 +0800, Hanjun Guo wrote:
> On 2012/12/5 7:23, Toshi Kani wrote:
> > On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
> >> On 2012/12/4 8:10, Toshi Kani wrote:
> >>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >>>> On 2012/11/30 6:27, Toshi Kani wrote:
> >>>
> >>> If I read the code right, the framework calls ACPI drivers differently
> >>> at boot-time and hot-add as follows.  That is, the new entry points are
> >>> called at hot-add only, but .add() is called at both cases.  This
> >>> requires .add() to work differently.
> >>
> >> Hi Toshi,
> >> Thanks for your comments!
> >>
> >>>
> >>> Boot    : .add()
> >>
> >> Actually, at boot time: .add(), .start()
> > 
> > Right.
> > 
> >>> Hot-Add : .add(), .pre_configure(), configure(), etc.
> >>
> >> Yes, we did it as you said in the framework. We use .pre_configure(), configure(),
> >> and post_configure() to instead of .start() for better error handling and recovery.
> > 
> > I think we should have hot-plug interfaces at the module level, not at
> > the ACPI-internal level.  In this way, the interfaces can be
> > platform-neutral and allow any modules to register, which makes it more
> > consistent with the boot-up sequence.  It can also allow ordering of the
> > sequence among the registered modules.  Right now, we initiate all
> > procedures from ACPI during hot-plug, which I think is inflexible and
> > steps into other module's role.
> > 
> > I am also concerned about the slot handling, which is the core piece of
> > the infrastructure and only allows hot-plug operations on ACPI objects
> > where slot objects are previously created by checking _EJ0.  The
> > infrastructure should allow hot-plug operations on any objects, and it
> > should not be dependent on the slot design.
> > 
> > I have some rough idea, and it may be easier to review / explain if I
> > make some code changes.  So, let me prototype it, and send it you all if
> > that works out.  Hopefully, it won't take too long.
> 
> Great! If any thing I can do, please let me know it.

Cool.  Yes, if the prototype turns out to be a good one, we can work
together to improve it. :)
 
Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-11-29 17:44                               ` Toshi Kani
@ 2012-12-06  9:30                                 ` Vasilis Liaskovitis
  2012-12-06 12:50                                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Vasilis Liaskovitis @ 2012-12-06  9:30 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Rafael J. Wysocki, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

Hi,
On Thu, Nov 29, 2012 at 10:44:11AM -0700, Toshi Kani wrote:
> On Thu, 2012-11-29 at 12:04 +0100, Vasilis Liaskovitis wrote:
> 
> Yes, that's what I had in mind along with device_lock().  I think the
> lock is necessary to close the window.
> http://www.spinics.net/lists/linux-mm/msg46973.html
> 
> But as I mentioned in other email, I prefer option 3 with
> suppress_bind_attrs.  So, yes, please take a look to see how it works
> out.

I tested the suppress_bind_attrs and it works by simply setting it to true
before driver registration e.g. 

--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -783,7 +783,8 @@ int acpi_bus_register_driver(struct acpi_driver *driver)
 	driver->drv.name = driver->name;
 	driver->drv.bus = &acpi_bus_type;
 	driver->drv.owner = driver->owner;
-
+	if (!strcmp(driver->class, "memory"))
+		driver->drv.suppress_bind_attrs = true;
 	ret = driver_register(&driver->drv);
 	return ret;
 }

No bind/unbind sysfs files are created when using this, as expected.
I assume we only want to suppress for acpi_memhotplug
(class=ACPI_MEMORY_DEVICE_CLASS i.e. "memory") devices.

Is there agreement on what acpi_bus_trim behaviour and rollback (if any) we
want to have for the current ACPI framework (partial trim or full trim on
failure)?

thanks,

- Vasilis



* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-12-06  9:30                                 ` Vasilis Liaskovitis
@ 2012-12-06 12:50                                   ` Rafael J. Wysocki
  2012-12-06 15:41                                     ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-12-06 12:50 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: Toshi Kani, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thursday, December 06, 2012 10:30:19 AM Vasilis Liaskovitis wrote:
> Hi,
> On Thu, Nov 29, 2012 at 10:44:11AM -0700, Toshi Kani wrote:
> > On Thu, 2012-11-29 at 12:04 +0100, Vasilis Liaskovitis wrote:
> > 
> > Yes, that's what I had in mind along with device_lock().  I think the
> > lock is necessary to close the window.
> > http://www.spinics.net/lists/linux-mm/msg46973.html
> > 
> > But as I mentioned in other email, I prefer option 3 with
> > suppress_bind_attrs.  So, yes, please take a look to see how it works
> > out.
> 
> I tested the suppress_bind_attrs and it works by simply setting it to true
> before driver registration e.g. 
> 
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -783,7 +783,8 @@ int acpi_bus_register_driver(struct acpi_driver *driver)
>  	driver->drv.name = driver->name;
>  	driver->drv.bus = &acpi_bus_type;
>  	driver->drv.owner = driver->owner;
> -
> +    if (!strcmp(driver->class, "memory"))
> +        driver->drv.suppress_bind_attrs = true;
>  	ret = driver_register(&driver->drv);
>  	return ret;
>  }
> 
> No bind/unbind sysfs files are created when using this, as expected.
> I assume we only want to suppress for acpi_memhotplug
> (class=ACPI_MEMORY_DEVICE_CLASS i.e. "memory") devices.
> 
> Is there agreement on what acpi_bus_trim behaviour and rollback (if any) we
> want to have for the current ACPI framework (partial trim or full trim on
> failure)?

Last time I suggested splitting the trimming so that first we only unbind
drivers (and roll back that part, i.e. rebind the drivers on errors) and
next we remove the struct acpi_device objects, just before doing the actual
eject.  So there would be two walks of the hierarchy below the device we want
to eject, one for driver unbinding (that can be rolled back) and one for the
actual removal.
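
The two walks described above can be sketched in plain userspace C. This is
only an illustrative model, not kernel code: struct mock_dev, unbind_walk(),
rebind_walk(), remove_walk() and mock_hot_remove() are hypothetical names
invented for the example, and the "unbind_fails" field merely simulates a
driver (e.g. memory that cannot be offlined) refusing to let go.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_CHILDREN 4

/* Hypothetical stand-in for struct acpi_device; not the kernel type. */
struct mock_dev {
	bool bound;            /* a driver is currently bound */
	bool present;          /* the device object still exists */
	bool unbind_fails;     /* simulate an unbind that cannot succeed */
	bool unbound_by_walk;  /* set by walk 1 so rollback knows what to undo */
	struct mock_dev *child[MOCK_MAX_CHILDREN];
	size_t nr_children;
};

/* Walk 1: unbind drivers, children first. Stops at the first failure. */
static int unbind_walk(struct mock_dev *d)
{
	for (size_t i = 0; i < d->nr_children; i++)
		if (unbind_walk(d->child[i]))
			return -1;
	if (!d->bound)
		return 0;
	if (d->unbind_fails)
		return -1;
	d->bound = false;
	d->unbound_by_walk = true;
	return 0;
}

/* Rollback of walk 1: rebind only the drivers that walk 1 unbound. */
static void rebind_walk(struct mock_dev *d)
{
	for (size_t i = 0; i < d->nr_children; i++)
		rebind_walk(d->child[i]);
	if (d->unbound_by_walk) {
		d->bound = true;
		d->unbound_by_walk = false;
	}
}

/* Walk 2: remove the device objects. By construction this cannot fail. */
static void remove_walk(struct mock_dev *d)
{
	for (size_t i = 0; i < d->nr_children; i++)
		remove_walk(d->child[i]);
	assert(!d->bound);     /* walk 1 must have completed first */
	d->present = false;
}

/* Returns 0 if the eject may proceed, -1 if everything was rolled back. */
int mock_hot_remove(struct mock_dev *root)
{
	if (unbind_walk(root)) {
		rebind_walk(root);
		return -1;
	}
	remove_walk(root);
	return 0;
}
```

In this model a failure anywhere in the first walk leaves every device bound
and present, and the second walk only runs once nothing below the ejected
device can refuse any more.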

Toshi Kani seemed to agree with that and there were no follow-ups.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-12-06 12:50                                   ` Rafael J. Wysocki
@ 2012-12-06 15:41                                     ` Toshi Kani
  2012-12-06 20:32                                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-12-06 15:41 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Vasilis Liaskovitis, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thu, 2012-12-06 at 13:50 +0100, Rafael J. Wysocki wrote:
> On Thursday, December 06, 2012 10:30:19 AM Vasilis Liaskovitis wrote:
> > Hi,
> > On Thu, Nov 29, 2012 at 10:44:11AM -0700, Toshi Kani wrote:
> > > On Thu, 2012-11-29 at 12:04 +0100, Vasilis Liaskovitis wrote:
> > > 
> > > Yes, that's what I had in mind along with device_lock().  I think the
> > > lock is necessary to close the window.
> > > http://www.spinics.net/lists/linux-mm/msg46973.html
> > > 
> > > But as I mentioned in other email, I prefer option 3 with
> > > suppress_bind_attrs.  So, yes, please take a look to see how it works
> > > out.
> > 
> > I tested the suppress_bind_attrs and it works by simply setting it to true
> > before driver registration e.g. 
> > 
> > --- a/drivers/acpi/scan.c
> > +++ b/drivers/acpi/scan.c
> > @@ -783,7 +783,8 @@ int acpi_bus_register_driver(struct acpi_driver *driver)
> >  	driver->drv.name = driver->name;
> >  	driver->drv.bus = &acpi_bus_type;
> >  	driver->drv.owner = driver->owner;
> > -
> > +    if (!strcmp(driver->class, "memory"))
> > +        driver->drv.suppress_bind_attrs = true;
> >  	ret = driver_register(&driver->drv);
> >  	return ret;
> >  }
> > 
> > No bind/unbind sysfs files are created when using this, as expected.
> > I assume we only want to suppress for acpi_memhotplug
> > (class=ACPI_MEMORY_DEVICE_CLASS i.e. "memory") devices.
> > 
> > Is there agreement on what acpi_bus_trim behaviour and rollback (if any) we
> > want to have for the current ACPI framework (partial trim or full trim on
> > failure)?
> 
> Last time I suggested to split the trimming so that first we only unbind
> drivers (and roll back that part, ie. rebind the drivers on errors) and
> next we remove the struct acpi_device objects, just before doing the actual
> eject.  So there would be two walks of the hierarchy below the device we want
> to eject, one for driver unbinding (that can be rolled back) and one for the
> actual removal.
> 
> Toshi Kani seemed to agree with that and there were no follow-ups.

I was hoping to have a short term solution to fix the panic on
attempting to delete a kernel memory range, assuming that the memory
hot-plug feature was going to make it into 3.8.  It's a blocker issue for
testing the feature.  Now that the VM patchset does not seem to be making
it into 3.8, I think we can step back and focus on a long term solution
for 3.9.

I agree that we should separate the resource online/offline step from the
acpi_device creation/deletion step.  It can address the panic and make
rollback easier to handle.  For 3.9, we should have a better framework
in place to handle it in general.  So, I am currently working on a
framework proposal, and hope to be able to send it out in a week or so.

Lastly, thanks Vasilis for testing the suppress_bind_attrs change.  I
think we may still need it for 3.9.

Thanks,
-Toshi



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-28 18:41   ` Toshi Kani
  2012-11-29  4:48     ` Hanjun Guo
  2012-11-29 10:15     ` Rafael J. Wysocki
@ 2012-12-06 16:00     ` Jiang Liu
  2012-12-06 16:03       ` Toshi Kani
  2 siblings, 1 reply; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 16:00 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On 11/29/2012 02:41 AM, Toshi Kani wrote:
> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>> the driver core remove function needs to always succeed. This means we need
>>> to know that the device can be successfully removed before acpi_bus_trim / 
>>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
>>> or SCI-initiated eject of memory devices fail e.g with:
>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>
>>> since the ACPI core goes ahead and ejects the device regardless of whether the
>>> the memory is still in use or not.
>>>
>>> For this reason a new acpi_device operation called prepare_remove is introduced.
>>> This operation should be registered for acpi devices whose removal (from kernel
>>> perspective) can fail.  Memory devices fall in this category.
>>>
>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>>   succeed for device and all its children.
>>> - if above step was successfull, proceed to actual device removal
>>
>> Hi Vasilis,
>> We met the same problem when we doing computer node hotplug, It is a good idea
>> to introduce prepare_remove before actual device removal.
>>
>> I think we could do more in prepare_remove, such as rollback. In most cases, we can
>> offline most of memory sections except kernel used pages now, should we rollback
>> and online the memory sections when prepare_remove failed ?
> 
> I think hot-plug operation should have all-or-nothing semantics.  That
> is, an operation should either complete successfully, or rollback to the
> original state.
> 
>> As you may know, the ACPI based hotplug framework we are working on already addressed
>> this problem, and the way we slove this problem is a bit like yours.
>>
>> We introduce hp_ops in struct acpi_device_ops:
>> struct acpi_device_ops {
>> 	acpi_op_add add;
>> 	acpi_op_remove remove;
>> 	acpi_op_start start;
>> 	acpi_op_bind bind;
>> 	acpi_op_unbind unbind;
>> 	acpi_op_notify notify;
>> #ifdef	CONFIG_ACPI_HOTPLUG
>> 	struct acpihp_dev_ops *hp_ops;
>> #endif	/* CONFIG_ACPI_HOTPLUG */
>> };
>>
>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>> 1) pre_release(): optional step to mark device going to be removed/busy
>> 2) release(): reclaim device from running system
>> 3) post_release(): rollback if cancelled by user or error happened
>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>> 5) unconfigure(): remove devices from running system
>> 6) post_unconfigure(): free resources used by devices
>>
>> In this way, we can easily rollback if error happens.
>> How do you think of this solution, any suggestion ? I think we can achieve
>> a better way for sharing ideas. :)
> 
> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> have not looked at all your changes yet..), but in my mind, a hot-plug
> operation should be composed with the following 3 phases.
> 
> 1. Validate phase - Verify if the request is a supported operation.  All
> known restrictions are verified at this phase.  For instance, if a
> hot-remove request involves kernel memory, it is failed in this phase.
> Since this phase makes no change, no rollback is necessary to fail.  
> 
> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> rolled-back in case of error or cancel.
> 
> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> instance, eject operation is performed at this phase.  
Hi Toshi,
	There is one more step needed. Linux provides sysfs interfaces to
online/offline CPU/memory sections, so we need to protect against concurrent
operations from those interfaces when doing physical hotplug. Consider the
following sequence:
Thread 1					Thread 2
1. validate conditions for hot-removal
2. offline memory section A
3.						online memory section A
4. offline memory section B
5. hot-remove memory device hosting A and B
By step 5, section A is online again, so the device would be removed while
its memory is still in use.
Regards!
Gerry
> 
> 
> Thanks,
> -Toshi
> 
> 
> 
> 
> 



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-06 16:00     ` Jiang Liu
@ 2012-12-06 16:03       ` Toshi Kani
  2012-12-06 16:25         ` Jiang Liu
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-12-06 16:03 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
> On 11/29/2012 02:41 AM, Toshi Kani wrote:
> > On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >>> As discussed in https://patchwork.kernel.org/patch/1581581/
> >>> the driver core remove function needs to always succeed. This means we need
> >>> to know that the device can be successfully removed before acpi_bus_trim / 
> >>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
> >>> or SCI-initiated eject of memory devices fail e.g with:
> >>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
> >>>
> >>> since the ACPI core goes ahead and ejects the device regardless of whether the
> >>> the memory is still in use or not.
> >>>
> >>> For this reason a new acpi_device operation called prepare_remove is introduced.
> >>> This operation should be registered for acpi devices whose removal (from kernel
> >>> perspective) can fail.  Memory devices fall in this category.
> >>>
> >>> acpi_bus_remove() is changed to handle removal in 2 steps:
> >>> - preparation for removal i.e. perform part of removal that can fail. Should
> >>>   succeed for device and all its children.
> >>> - if above step was successfull, proceed to actual device removal
> >>
> >> Hi Vasilis,
> >> We met the same problem when we doing computer node hotplug, It is a good idea
> >> to introduce prepare_remove before actual device removal.
> >>
> >> I think we could do more in prepare_remove, such as rollback. In most cases, we can
> >> offline most of memory sections except kernel used pages now, should we rollback
> >> and online the memory sections when prepare_remove failed ?
> > 
> > I think hot-plug operation should have all-or-nothing semantics.  That
> > is, an operation should either complete successfully, or rollback to the
> > original state.
> > 
> >> As you may know, the ACPI based hotplug framework we are working on already addressed
> >> this problem, and the way we slove this problem is a bit like yours.
> >>
> >> We introduce hp_ops in struct acpi_device_ops:
> >> struct acpi_device_ops {
> >> 	acpi_op_add add;
> >> 	acpi_op_remove remove;
> >> 	acpi_op_start start;
> >> 	acpi_op_bind bind;
> >> 	acpi_op_unbind unbind;
> >> 	acpi_op_notify notify;
> >> #ifdef	CONFIG_ACPI_HOTPLUG
> >> 	struct acpihp_dev_ops *hp_ops;
> >> #endif	/* CONFIG_ACPI_HOTPLUG */
> >> };
> >>
> >> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >> 1) pre_release(): optional step to mark device going to be removed/busy
> >> 2) release(): reclaim device from running system
> >> 3) post_release(): rollback if cancelled by user or error happened
> >> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >> 5) unconfigure(): remove devices from running system
> >> 6) post_unconfigure(): free resources used by devices
> >>
> >> In this way, we can easily rollback if error happens.
> >> How do you think of this solution, any suggestion ? I think we can achieve
> >> a better way for sharing ideas. :)
> > 
> > Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> > have not looked at all your changes yet..), but in my mind, a hot-plug
> > operation should be composed with the following 3 phases.
> > 
> > 1. Validate phase - Verify if the request is a supported operation.  All
> > known restrictions are verified at this phase.  For instance, if a
> > hot-remove request involves kernel memory, it is failed in this phase.
> > Since this phase makes no change, no rollback is necessary to fail.  
> > 
> > 2. Execute phase - Perform hot-add / hot-remove operation that can be
> > rolled-back in case of error or cancel.
> > 
> > 3. Commit phase - Perform the final hot-add / hot-remove operation that
> > cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> > instance, eject operation is performed at this phase.  
> Hi Toshi,
> 	There are one more step needed. Linux provides sysfs interfaces to
> online/offline CPU/memory sections, so we need to protect from concurrent
> operations from those interfaces when doing physical hotplug. Think about
> following sequence:
> Thread 1
> 1. validate conditions for hot-removal
> 2. offline memory section A
> 3.						online memory section A			
> 4. offline memory section B
> 5 hot-remove memory device hosting A and B.

Hi Gerry,

I agree.  And I am working on a proposal that tries to address this
issue by integrating both sysfs and hotplug operations into a framework.


Thanks,
-Toshi



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-06 16:03       ` Toshi Kani
@ 2012-12-06 16:25         ` Jiang Liu
  2012-12-06 16:31           ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 16:25 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On 12/07/2012 12:03 AM, Toshi Kani wrote:
> On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
>> On 11/29/2012 02:41 AM, Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>>>> the driver core remove function needs to always succeed. This means we need
>>>>> to know that the device can be successfully removed before acpi_bus_trim / 
>>>>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
>>>>> or SCI-initiated eject of memory devices fail e.g with:
>>>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>>>
>>>>> since the ACPI core goes ahead and ejects the device regardless of whether the
>>>>> the memory is still in use or not.
>>>>>
>>>>> For this reason a new acpi_device operation called prepare_remove is introduced.
>>>>> This operation should be registered for acpi devices whose removal (from kernel
>>>>> perspective) can fail.  Memory devices fall in this category.
>>>>>
>>>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>>>>   succeed for device and all its children.
>>>>> - if above step was successfull, proceed to actual device removal
>>>>
>>>> Hi Vasilis,
>>>> We met the same problem when we doing computer node hotplug, It is a good idea
>>>> to introduce prepare_remove before actual device removal.
>>>>
>>>> I think we could do more in prepare_remove, such as rollback. In most cases, we can
>>>> offline most of memory sections except kernel used pages now, should we rollback
>>>> and online the memory sections when prepare_remove failed ?
>>>
>>> I think hot-plug operation should have all-or-nothing semantics.  That
>>> is, an operation should either complete successfully, or rollback to the
>>> original state.
>>>
>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>> this problem, and the way we slove this problem is a bit like yours.
>>>>
>>>> We introduce hp_ops in struct acpi_device_ops:
>>>> struct acpi_device_ops {
>>>> 	acpi_op_add add;
>>>> 	acpi_op_remove remove;
>>>> 	acpi_op_start start;
>>>> 	acpi_op_bind bind;
>>>> 	acpi_op_unbind unbind;
>>>> 	acpi_op_notify notify;
>>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>>> 	struct acpihp_dev_ops *hp_ops;
>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>>> };
>>>>
>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>> 2) release(): reclaim device from running system
>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>> 5) unconfigure(): remove devices from running system
>>>> 6) post_unconfigure(): free resources used by devices
>>>>
>>>> In this way, we can easily rollback if error happens.
>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>> a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>>
>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>> known restrictions are verified at this phase.  For instance, if a
>>> hot-remove request involves kernel memory, it is failed in this phase.
>>> Since this phase makes no change, no rollback is necessary to fail.  
>>>
>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>> rolled-back in case of error or cancel.
>>>
>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
>>> instance, eject operation is performed at this phase.  
>> Hi Toshi,
>> 	There are one more step needed. Linux provides sysfs interfaces to
>> online/offline CPU/memory sections, so we need to protect from concurrent
>> operations from those interfaces when doing physical hotplug. Think about
>> following sequence:
>> Thread 1
>> 1. validate conditions for hot-removal
>> 2. offline memory section A
>> 3.						online memory section A			
>> 4. offline memory section B
>> 5 hot-remove memory device hosting A and B.
> 
> Hi Gerry,
> 
> I agree.  And I am working on a proposal that tries to address this
> issue by integrating both sysfs and hotplug operations into a framework.
Hi Toshi,
	But the sysfs interfaces for CPU and memory online/offline are platform
independent, while ACPI based hotplug is platform dependent. I'm not sure
whether it's feasible to merge them. For example, we still need the offline
interface to stop using faulty CPUs on platforms without physical hotplug
capabilities.
	We have solved this by adding a "busy" flag to the device, so the sysfs
interface will just return -EBUSY if the busy flag is set.
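
A minimal userspace sketch of the "busy" flag idea, under stated assumptions:
struct mock_memdev and the mock_* functions are hypothetical names made up
for illustration, and this is not the code from our patchset. It only models
the flag semantics; real code would also need a lock held across the check
and the state change, since the load of the flag and the update of
section_online below are not one atomic step.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a memory device; not a kernel structure. */
struct mock_memdev {
	atomic_bool busy;      /* set while a physical hot-remove is in flight */
	bool section_online;   /* state normally toggled through sysfs */
};

/* Hot-remove path: claim the device, or fail if it is already claimed. */
int mock_hotplug_begin(struct mock_memdev *d)
{
	bool expected = false;

	if (!atomic_compare_exchange_strong(&d->busy, &expected, true))
		return -EBUSY;
	return 0;
}

/* Hot-remove path: release the claim after success or rollback. */
void mock_hotplug_end(struct mock_memdev *d)
{
	assert(atomic_load(&d->busy));  /* ending without begin is a bug */
	atomic_store(&d->busy, false);
}

/* sysfs-style online/offline request: rejected while the flag is set. */
int mock_sysfs_set_online(struct mock_memdev *d, bool online)
{
	if (atomic_load(&d->busy))
		return -EBUSY;
	d->section_online = online;
	return 0;
}
```

With the flag held from validation until the hot-remove finishes, a
concurrent sysfs request to online a section gets -EBUSY instead of undoing
an offline step.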

Regards!
Gerry

> 
> 
> Thanks,
> -Toshi
> 



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-06 16:25         ` Jiang Liu
@ 2012-12-06 16:31           ` Toshi Kani
  2012-12-06 16:52             ` Jiang Liu
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-12-06 16:31 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
> On 12/07/2012 12:03 AM, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
> >> On 11/29/2012 02:41 AM, Toshi Kani wrote:
> >>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
 : 
> >>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>> operation should be composed with the following 3 phases.
> >>>
> >>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>> known restrictions are verified at this phase.  For instance, if a
> >>> hot-remove request involves kernel memory, it is failed in this phase.
> >>> Since this phase makes no change, no rollback is necessary to fail.  
> >>>
> >>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> >>> rolled-back in case of error or cancel.
> >>>
> >>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> >>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> >>> instance, eject operation is performed at this phase.  
> >> Hi Toshi,
> >> 	There are one more step needed. Linux provides sysfs interfaces to
> >> online/offline CPU/memory sections, so we need to protect from concurrent
> >> operations from those interfaces when doing physical hotplug. Think about
> >> following sequence:
> >> Thread 1
> >> 1. validate conditions for hot-removal
> >> 2. offline memory section A
> >> 3.						online memory section A			
> >> 4. offline memory section B
> >> 5 hot-remove memory device hosting A and B.
> > 
> > Hi Gerry,
> > 
> > I agree.  And I am working on a proposal that tries to address this
> > issue by integrating both sysfs and hotplug operations into a framework.
> Hi Toshi,
> 	But the sysfs for CPU and memory online/offline are platform independent
> interfaces, and the ACPI based hotplug is platform dependent interfaces. I'm not
> sure whether it's feasible to merge them. For example we still need offline interface
> to stop using faulty CPUs on platform without physical hotplug capabilities.
> 	We have solved this by adding a "busy" flag to the device, so the sysfs
> will just return -EBUSY if the busy flag is set.

I am making the framework code platform-independent so that it can
handle both cases.  Well, I am still prototyping, so hopefully it will
work. :)

Thanks,
-Toshi



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-04  0:10           ` Toshi Kani
  2012-12-04  9:16             ` Hanjun Guo
@ 2012-12-06 16:40             ` Jiang Liu
  2012-12-06 20:30               ` Rafael J. Wysocki
  2012-12-07  2:57               ` Toshi Kani
  1 sibling, 2 replies; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 16:40 UTC (permalink / raw)
  To: Toshi Kani, Rafael J. Wysocki
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen,
	Liujiang, Huxinwei

On 12/04/2012 08:10 AM, Toshi Kani wrote:
> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>> On 2012/11/30 6:27, Toshi Kani wrote:
>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>>>> On 2012/11/29 2:41, Toshi Kani wrote:
>>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>>>> this problem, and the way we slove this problem is a bit like yours.
>>>>>>
>>>>>> We introduce hp_ops in struct acpi_device_ops:
>>>>>> struct acpi_device_ops {
>>>>>> 	acpi_op_add add;
>>>>>> 	acpi_op_remove remove;
>>>>>> 	acpi_op_start start;
>>>>>> 	acpi_op_bind bind;
>>>>>> 	acpi_op_unbind unbind;
>>>>>> 	acpi_op_notify notify;
>>>>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>>>>> 	struct acpihp_dev_ops *hp_ops;
>>>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>>>>> };
>>>>>>
>>>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>>>> 2) release(): reclaim device from running system
>>>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>>>> 5) unconfigure(): remove devices from running system
>>>>>> 6) post_unconfigure(): free resources used by devices
>>>>>>
>>>>>> In this way, we can easily rollback if error happens.
>>>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>>>> a better way for sharing ideas. :)
>>>>>
>>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>>> operation should be composed with the following 3 phases.
>>>>
>>>> Good idea ! we also implement a hot-plug operation in 3 phases:
>>>> 1) acpihp_drv_pre_execute
>>>> 2) acpihp_drv_execute
>>>> 3) acpihp_drv_post_execute
>>>> you may refer to :
>>>> https://lkml.org/lkml/2012/11/4/79
>>>
>>> Great.  Yes, I will take a look.
>>
>> Thanks, any comments are welcomed :)
> 
> If I read the code right, the framework calls ACPI drivers differently
> at boot-time and hot-add as follows.  That is, the new entry points are
> called at hot-add only, but .add() is called at both cases.  This
> requires .add() to work differently.
> 
> Boot    : .add()
> Hot-Add : .add(), .pre_configure(), configure(), etc.
> 
> I think the boot-time and hot-add initialization should be done
> consistently.  While there is difficulty with the current boot sequence,
> the framework should be designed to allow them consistent, not make them
> diverged.
Hi Toshi,
	We have separated hotplug operations from the driver binding/unbinding
interface due to the following considerations.
1) Physical CPU and memory devices are initialized/used before the ACPI subsystem
   is initialized. So in the normal case, .add() of processor and acpi_memhotplug only
   figures out information about a device already in working state instead of starting
   the device.
2) It's impossible to rmmod the processor and acpi_memhotplug drivers at runtime
   if .remove() of the CPU and memory drivers really removes the CPU/memory device
   from the system. And the ACPI processor driver also implements CPU PM functionality
   other than hotplug.

And recently Rafael has mentioned that he has a long term view to get rid of the
concept of "ACPI device". If that happens, we could easily move the hotplug
logic from ACPI device drivers into the hotplug framework, provided the hotplug
logic is separated from the .add()/.remove() callbacks. Actually we could even move
all hotplug-only logic into the hotplug framework and not rely on any ACPI device
driver any more. So we could get rid of all these messy things. We could achieve
that by:
1) moving code shared by ACPI device drivers and the hotplug framework into the core.
2) moving hotplug-only code to the framework.

Hi Rafael, what are your thoughts here?

> 
>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>> Since this phase makes no change, no rollback is necessary to fail. 
>>>>
>>>> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
>>>>
>>>> 1) Hot-plugble or not. the instance kernel memory you mentioned is also checked
>>>>    when memory device remove;
>>>
>>> Agreed.
>>>
>>>> 2) Dependency check involved. For instance, if hot-add a memory device,
>>>>    processor should be added first, otherwise it's not valid to this operation.
>>>
>>> I think FW should be the one that assures such dependency.  That is,
>>> when a memory device object is marked as present/enabled/functioning, it
>>> should be ready for the OS to use.
>>
>> Yes, BIOS should do something for the dependency, because BIOS knows the
>> actual hardware topology. 
> 
> Right.
> 
>> The ACPI specification provides _EDL method to
>> tell OS the eject device list, but still has no method to tell OS the add device
>> list now.
> 
> Yes, but I do not think the OS needs special handling for add...
We plan to support triggering hot-add events from OS-provided interfaces,
so we also need to solve dependency issues when handling requests from those
interfaces. For example, we need to power on the physical processor before
powering on a memory device if the memory device is attached to that processor.

> 
>> For some cases, OS should analyze the dependency in the validate phase. For example,
>> when hot remove a node (container device), OS should analyze the dependency to get
>> the remove order as following:
>> 1) Host bridge;
>> 2) Memory devices;
>> 3) Processor devices;
>> 4) Container device itself;
> 
> This may be off-topic, but how do you plan to delete I/O devices under a
> node?  Are you planning to delete all I/O devices along with the node?
> 
> On other OS, we made a separate step called I/O chassis delete, which
> off-lines all I/O devices under the node, and is required before a node
> hot-remove.  It basically triggers PCIe hot-remove to detach drivers
> from all devices.  It does not eject the devices so that they do not
> have to be on hot-plug slots.  This step runs user-space scripts to
> verify if the devices can be off-lined without disrupting user's
> applications, and provides comprehensive reports if any of them are in
> use.  Not sure if Linux's PCI hot-remove has such check, but I thought
> I'd mention it. :)
Yinghai is working on PCI host bridge hotplug, which just stops all PCI devices
under the host bridge. That's really a little dangerous, and we do need help
from userspace to check whether a hot-removal operation would be fatal,
e.g. removing the PCI device hosting the rootfs.

So in our framework, we have an option to relay hotplug events from firmware
to userspace, so that userspace has a chance to reject a hotplug operation
if it may cause unacceptable disturbance to userspace services.
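
A minimal sketch of that veto idea, purely for illustration. All names here
(hp_relay_event, hp_policy_fn, protect_rootfs, "rootfs-hba") are hypothetical
and not part of the posted patches; the userspace decision is modelled as a
plain callback, whereas a real implementation would cross the kernel/user
boundary.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical verdict returned by the userspace policy. */
enum hp_verdict { HP_ALLOW, HP_REJECT };

/* Stand-in for "ask userspace"; a callback keeps the sketch self-contained. */
typedef enum hp_verdict (*hp_policy_fn)(const char *dev_name);

/*
 * Relay a hot-removal event to the policy before anything is changed.
 * Returns 0 if the operation may proceed, -EPERM if it was rejected.
 * With no policy registered, the operation is allowed.
 */
int hp_relay_event(const char *dev_name, hp_policy_fn policy)
{
	assert(dev_name != NULL);

	if (policy && policy(dev_name) == HP_REJECT)
		return -EPERM;
	return 0;
}

/* Example policy: refuse to remove the device hosting the root filesystem. */
enum hp_verdict protect_rootfs(const char *dev_name)
{
	return strcmp(dev_name, "rootfs-hba") == 0 ? HP_REJECT : HP_ALLOW;
}
```

The point of the design is that the rejection happens before any device is
stopped, so nothing has to be rolled back.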

> 
>> In this way, we can check that all the devices are hot-plugble or not under the
>> container device before execute phase, and further more, we can remove devices
>> in order to avoid some crash problems.
> 
> Yes, we should check if all the resources under the node can be
> off-lined at validate phase.  (note, all the devices do not have to have
> _EJ0 if that's what you meant by hot-pluggable.)
>  
>>>> 3) Race condition check. if the device and its dependent device is in hot-plug
>>>>    process, another request will be denied.
>>>
>>> I agree that hot-plug operation should be serialized.  I think another
>>> request should be either queued or denied based on the caller's intent
>>> (i.e. wait-ok or no-wait). 
>>>
>>>> No rollback is needed for the above checks.
>>>
>>> Great.
>>>
>>>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>>>> rolled-back in case of error or cancel.
>>>>
>>>> In this phase, we introduce a state machine for the hot-plugble device,
>>>> please refer to:
>>>> https://lkml.org/lkml/2012/11/4/79
>>>>
>>>> I think we have the same idea for the major framework, but the ACPI based
>>>> hot-plug framework implement it differently in detail, right ?
>>>
>>> Yes, I am surprised with the similarity.  What I described is something
>>> we had implemented for other OS.  I am still studying how best we can
>>> improve the Linux hotplug code. :)
>>
>> Great! your experience is very appreciable for me. I think we can share ideas
>> to achieve a better solution for Linux hotplug code. :)
> 
> Sounds great.
> 
> Thanks,
> -Toshi
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-04 23:23               ` Toshi Kani
  2012-12-05 12:10                 ` Hanjun Guo
@ 2012-12-06 16:47                 ` Jiang Liu
  2012-12-07  2:25                   ` Toshi Kani
  1 sibling, 1 reply; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 16:47 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen,
	Liujiang, Huxinwei

On 12/05/2012 07:23 AM, Toshi Kani wrote:
> On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
>> On 2012/12/4 8:10, Toshi Kani wrote:
>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>>>> On 2012/11/30 6:27, Toshi Kani wrote:
>>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>>>>>> On 2012/11/29 2:41, Toshi Kani wrote:
>>>>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>>>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>>>>>> this problem, and the way we slove this problem is a bit like yours.
>>>>>>>>
>>>>>>>> We introduce hp_ops in struct acpi_device_ops:
>>>>>>>> struct acpi_device_ops {
>>>>>>>> 	acpi_op_add add;
>>>>>>>> 	acpi_op_remove remove;
>>>>>>>> 	acpi_op_start start;
>>>>>>>> 	acpi_op_bind bind;
>>>>>>>> 	acpi_op_unbind unbind;
>>>>>>>> 	acpi_op_notify notify;
>>>>>>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>>>>>>> 	struct acpihp_dev_ops *hp_ops;
>>>>>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>>>>>>> };
>>>>>>>>
>>>>>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>>>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>>>>>> 2) release(): reclaim device from running system
>>>>>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>>>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>>>>>> 5) unconfigure(): remove devices from running system
>>>>>>>> 6) post_unconfigure(): free resources used by devices
>>>>>>>>
>>>>>>>> In this way, we can easily rollback if error happens.
>>>>>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>>>>>> a better way for sharing ideas. :)
>>>>>>>
>>>>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>>>>> operation should be composed with the following 3 phases.
>>>>>>
>>>>>> Good idea ! we also implement a hot-plug operation in 3 phases:
>>>>>> 1) acpihp_drv_pre_execute
>>>>>> 2) acpihp_drv_execute
>>>>>> 3) acpihp_drv_post_execute
>>>>>> you may refer to :
>>>>>> https://lkml.org/lkml/2012/11/4/79
>>>>>
>>>>> Great.  Yes, I will take a look.
>>>>
>>>> Thanks, any comments are welcomed :)
>>>
>>> If I read the code right, the framework calls ACPI drivers differently
>>> at boot-time and hot-add as follows.  That is, the new entry points are
>>> called at hot-add only, but .add() is called at both cases.  This
>>> requires .add() to work differently.
>>
>> Hi Toshi,
>> Thanks for your comments!
>>
>>>
>>> Boot    : .add()
>>
>> Actually, at boot time: .add(), .start()
> 
> Right.
> 
>>> Hot-Add : .add(), .pre_configure(), configure(), etc.
>>
>> Yes, we did it as you said in the framework. We use .pre_configure(), configure(),
>> and post_configure() to instead of .start() for better error handling and recovery.
> 
> I think we should have hot-plug interfaces at the module level, not at
> the ACPI-internal level.  In this way, the interfaces can be
> platform-neutral and allow any modules to register, which makes it more
> consistent with the boot-up sequence.  It can also allow ordering of the
> sequence among the registered modules.  Right now, we initiate all
> procedures from ACPI during hot-plug, which I think is inflexible and
> steps into other module's role.
> 
> I am also concerned about the slot handling, which is the core piece of
> the infrastructure and only allows hot-plug operations on ACPI objects
> where slot objects are previously created by checking _EJ0.  The
> infrastructure should allow hot-plug operations on any objects, and it
> should not be dependent on the slot design.
> 
> I have some rough idea, and it may be easier to review / explain if I
> make some code changes.  So, let me prototype it, and send it you all if
> that works out.  Hopefully, it won't take too long.
> 
>>> I think the boot-time and hot-add initialization should be done
>>> consistently.  While there is difficulty with the current boot sequence,
>>> the framework should be designed to allow them consistent, not make them
>>> diverged.
>>>
>>>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>>>> Since this phase makes no change, no rollback is necessary to fail. 
>>>>>>
>>>>>> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
>>>>>>
>>>>>> 1) Hot-plugble or not. the instance kernel memory you mentioned is also checked
>>>>>>    when memory device remove;
>>>>>
>>>>> Agreed.
>>>>>
>>>>>> 2) Dependency check involved. For instance, if hot-add a memory device,
>>>>>>    processor should be added first, otherwise it's not valid to this operation.
>>>>>
>>>>> I think FW should be the one that assures such dependency.  That is,
>>>>> when a memory device object is marked as present/enabled/functioning, it
>>>>> should be ready for the OS to use.
>>>>
>>>> Yes, BIOS should do something for the dependency, because BIOS knows the
>>>> actual hardware topology. 
>>>
>>> Right.
>>>
>>>> The ACPI specification provides _EDL method to
>>>> tell OS the eject device list, but still has no method to tell OS the add device
>>>> list now.
>>>
>>> Yes, but I do not think the OS needs special handling for add...
>>
>> Hmm, how about trigger a hot add operation by OS ? we have eject interface for OS, but
>> have no add interface now, do you think this feature is useful? If it is, I think OS
>> should analyze the dependency first and tell the user.
> 
> The OS can eject an ACPI device because a target device is owned by the
> OS (i.e. enabled).  For hot-add, a target ACPI device is not owned by
> the OS (i.e. disabled).  Therefore, the OS is not supposed to change its
> state.  So, I do not think we should support a hot-add operation by the
> OS.
We depend on the firmware to provide an interface to actually hot-add the device.
The sequence is:
1) user triggers a hot-add request through the sysfs interfaces.
2) hotplug framework validates conditions for hot-adding (dependency).
3) hotplug framework invokes firmware interfaces to request a hot-add operation.
4) firmware sends an ACPI notification after powering on/initializing the device.
5) OS adds the device into the running system.
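
The sequence above could be sketched roughly as follows. This is only a toy
model under stated assumptions: struct add_dev, the add_state values and all
function names are made up for illustration, and the firmware step is a stub
that succeeds immediately instead of evaluating an ACPI method and waiting
for a notification.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical device model; not a kernel structure. */
enum add_state { DEV_ABSENT, DEV_POWERED, DEV_ONLINE };

struct add_dev {
	const char *name;
	enum add_state state;
	struct add_dev *depends_on;	/* e.g. memory attached to a processor */
};

/* Step 2: validate only; nothing is changed, so no rollback is needed. */
static int hp_validate_add(const struct add_dev *dev)
{
	if (dev->state != DEV_ABSENT)
		return -EBUSY;		/* already added or being added */
	if (dev->depends_on && dev->depends_on->state == DEV_ABSENT)
		return -ENODEV;		/* dependency must be added first */
	return 0;
}

/* Steps 3-4: stub for the firmware powering on the device and notifying. */
static int fw_request_add(struct add_dev *dev)
{
	dev->state = DEV_POWERED;
	return 0;
}

/* Step 5: the OS handles the notification and brings the device online. */
static int os_handle_add_notify(struct add_dev *dev)
{
	assert(dev != NULL);
	if (dev->state != DEV_POWERED)
		return -EINVAL;
	dev->state = DEV_ONLINE;
	return 0;
}

/* Step 1: entry point, standing in for a write to a sysfs attribute. */
int hp_user_request_add(struct add_dev *dev)
{
	int ret;

	ret = hp_validate_add(dev);
	if (ret)
		return ret;
	ret = fw_request_add(dev);
	if (ret)
		return ret;
	return os_handle_add_notify(dev);
}
```

With a memory device that depends on a processor, requesting the memory
first fails with -ENODEV and leaves it untouched; adding the processor and
then the memory succeeds.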

>  
>>>> For some cases, OS should analyze the dependency in the validate phase. For example,
>>>> when hot remove a node (container device), OS should analyze the dependency to get
>>>> the remove order as following:
>>>> 1) Host bridge;
>>>> 2) Memory devices;
>>>> 3) Processor devices;
>>>> 4) Container device itself;
>>>
>>> This may be off-topic, but how do you plan to delete I/O devices under a
>>> node?  Are you planning to delete all I/O devices along with the node?
>>
>> Yes, we delete all I/O devices under the node. we delete I/O devices as
>> following steps:
>> 1) Offline PCI devices;
>> 2) Offline IOAPIC and IOMMU;
>> and offline I/O devices no matter in use or not.
> 
> Oh, off-lining no matter what would be problematic for enterprise
> customers... 
>  
>>> On other OS, we made a separate step called I/O chassis delete, which
>>> off-lines all I/O devices under the node, and is required before a node
>>> hot-remove.  It basically triggers PCIe hot-remove to detach drivers
>>> from all devices.  It does not eject the devices so that they do not
>>> have to be on hot-plug slots.  This step runs user-space scripts to
>>> verify if the devices can be off-lined without disrupting user's
>>> applications, and provides comprehensive reports if any of them are in
>>
>> Great! we also have a plan to implement this feature.
> 
> That's great!
> 
>>> use.  Not sure if Linux's PCI hot-remove has such check, but I thought
>>> I'd mention it. :)
>>
>> Have no such check, I'm sure :)
>>
>>>
>>>> In this way, we can check that all the devices are hot-plugble or not under the
>>>> container device before execute phase, and further more, we can remove devices
>>>> in order to avoid some crash problems.
>>>
>>> Yes, we should check if all the resources under the node can be
>>> off-lined at validate phase.  (note, all the devices do not have to have
>>> _EJ0 if that's what you meant by hot-pluggable.)
>>
>> Yes, agreed. For node hotplug, no need for all the devices have _EJ0 method.
> 
> Right.
> 
> Thanks,
> -Toshi
> 
> 


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-06 16:31           ` Toshi Kani
@ 2012-12-06 16:52             ` Jiang Liu
  2012-12-06 17:09               ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 16:52 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On 12/07/2012 12:31 AM, Toshi Kani wrote:
> On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
>> On 12/07/2012 12:03 AM, Toshi Kani wrote:
>>> On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
>>>> On 11/29/2012 02:41 AM, Toshi Kani wrote:
>>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>  : 
>>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>>> operation should be composed with the following 3 phases.
>>>>>
>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>> Since this phase makes no change, no rollback is necessary to fail.  
>>>>>
>>>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>>>> rolled-back in case of error or cancel.
>>>>>
>>>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
>>>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
>>>>> instance, eject operation is performed at this phase.  
>>>> Hi Toshi,
>>>> 	There are one more step needed. Linux provides sysfs interfaces to
>>>> online/offline CPU/memory sections, so we need to protect from concurrent
>>>> operations from those interfaces when doing physical hotplug. Think about
>>>> following sequence:
>>>> Thread 1
>>>> 1. validate conditions for hot-removal
>>>> 2. offline memory section A
>>>> 3.						online memory section A			
>>>> 4. offline memory section B
>>>> 5 hot-remove memory device hosting A and B.
>>>
>>> Hi Gerry,
>>>
>>> I agree.  And I am working on a proposal that tries to address this
>>> issue by integrating both sysfs and hotplug operations into a framework.
>> Hi Toshi,
>> 	But the sysfs for CPU and memory online/offline are platform independent
>> interfaces, and the ACPI based hotplug is platform dependent interfaces. I'm not
>> sure whether it's feasible to merge them. For example we still need offline interface
>> to stop using faulty CPUs on platform without physical hotplug capabilities.
>> 	We have solved this by adding a "busy" flag to the device, so the sysfs
>> will just return -EBUSY if the busy flag is set.
> 
> I am making the framework code platform-independent so that it can
> handle both cases.  Well, I am still prototyping, so hopefully it will
> work. :)
Do you mean implementing a framework to manage hotplug of any type of device?
That sounds like a huge plan :)

Otherwise there may be a gap. The CPU online/offline interface deals with logical
CPUs, and the hotplug driver deals with physical devices (processors). They may be
different but related objects.

> 
> Thanks,
> -Toshi
> 


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 10:15     ` Rafael J. Wysocki
  2012-11-29 11:36       ` Vasilis Liaskovitis
  2012-11-29 17:03       ` Toshi Kani
@ 2012-12-06 16:56       ` Jiang Liu
  2 siblings, 0 replies; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 16:56 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: linux-acpi, Toshi Kani, Hanjun Guo, Vasilis Liaskovitis,
	isimatu.yasuaki, wency, lenb, gregkh, linux-kernel, linux-mm,
	Tang Chen

On 11/29/2012 06:15 PM, Rafael J. Wysocki wrote:
> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>>> the driver core remove function needs to always succeed. This means we need
>>>> to know that the device can be successfully removed before acpi_bus_trim / 
>>>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
>>>> or SCI-initiated eject of memory devices fail e.g with:
>>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>>
>>>> since the ACPI core goes ahead and ejects the device regardless of whether the
>>>> the memory is still in use or not.
>>>>
>>>> For this reason a new acpi_device operation called prepare_remove is introduced.
>>>> This operation should be registered for acpi devices whose removal (from kernel
>>>> perspective) can fail.  Memory devices fall in this category.
>>>>
>>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>>>   succeed for device and all its children.
>>>> - if above step was successfull, proceed to actual device removal
>>>
>>> Hi Vasilis,
>>> We met the same problem when we doing computer node hotplug, It is a good idea
>>> to introduce prepare_remove before actual device removal.
>>>
>>> I think we could do more in prepare_remove, such as rollback. In most cases, we can
>>> offline most of memory sections except kernel used pages now, should we rollback
>>> and online the memory sections when prepare_remove failed ?
>>
>> I think hot-plug operation should have all-or-nothing semantics.  That
>> is, an operation should either complete successfully, or rollback to the
>> original state.
> 
> That's correct.
> 
>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>> this problem, and the way we slove this problem is a bit like yours.
>>>
>>> We introduce hp_ops in struct acpi_device_ops:
>>> struct acpi_device_ops {
>>> 	acpi_op_add add;
>>> 	acpi_op_remove remove;
>>> 	acpi_op_start start;
>>> 	acpi_op_bind bind;
>>> 	acpi_op_unbind unbind;
>>> 	acpi_op_notify notify;
>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>> 	struct acpihp_dev_ops *hp_ops;
>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>> };
>>>
>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>> 2) release(): reclaim device from running system
>>> 3) post_release(): rollback if cancelled by user or error happened
>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>> 5) unconfigure(): remove devices from running system
>>> 6) post_unconfigure(): free resources used by devices
>>>
>>> In this way, we can easily rollback if error happens.
>>> How do you think of this solution, any suggestion ? I think we can achieve
>>> a better way for sharing ideas. :)
>>
>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>> have not looked at all your changes yet..), but in my mind, a hot-plug
>> operation should be composed with the following 3 phases.
>>
>> 1. Validate phase - Verify if the request is a supported operation.  All
>> known restrictions are verified at this phase.  For instance, if a
>> hot-remove request involves kernel memory, it is failed in this phase.
>> Since this phase makes no change, no rollback is necessary to fail.  
> 
> Actually, we can't do it this way, because the conditions may change between
> the check and the execution.  So the first phase needs to involve execution
> to some extent, although only as far as it remains reversible.
Hi Rafael,
	A possible way to solve this issue is:
1) mark the device busy.
2) check the conditions, and mark the device as normal again if the check fails.
3) reclaim the device, and mark the device as normal again if reclaim fails.
4) remove the device.
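
Roughly, in code, the four steps might look like the sketch below. It is a
single-threaded toy model: struct rm_dev and its fields are hypothetical,
and the busy flag would need a lock or an atomic test-and-set in a real
implementation. The condition check and the reclaim are reduced to boolean
stand-ins.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device model; not a kernel structure. */
struct rm_dev {
	bool busy;	/* set while a hot-remove is in flight */
	bool removable;	/* stand-in for the condition check */
	bool in_use;	/* stand-in for reclaim failing */
	bool present;
};

/* sysfs-style online request; refused while a hot-remove owns the device. */
int hp_sysfs_online(struct rm_dev *dev)
{
	if (dev->busy)
		return -EBUSY;
	dev->in_use = true;
	return 0;
}

int hp_hot_remove(struct rm_dev *dev)
{
	if (dev->busy)
		return -EBUSY;
	dev->busy = true;		/* 1) mark the device busy */

	if (!dev->removable) {		/* 2) check the conditions */
		dev->busy = false;	/*    back to normal on failure */
		return -EPERM;
	}
	if (dev->in_use) {		/* 3) reclaim the device */
		dev->busy = false;	/*    back to normal on failure */
		return -EBUSY;
	}

	assert(dev->present);
	dev->present = false;		/* 4) remove; busy stays set, the device is gone */
	return 0;
}
```

Because the busy flag is taken before the check, a concurrent sysfs online
request cannot change the conditions between the check and the removal; it
just gets -EBUSY.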

> 
>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>> rolled-back in case of error or cancel.
> 
> I would just merge 1 and 2.
> 
>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
>> instance, eject operation is performed at this phase.  
> 
> Yup.
> 
> Thanks,
> Rafael
> 
> 


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 11:36       ` Vasilis Liaskovitis
@ 2012-12-06 16:59         ` Jiang Liu
  0 siblings, 0 replies; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 16:59 UTC (permalink / raw)
  To: Vasilis Liaskovitis
  Cc: Rafael J. Wysocki, linux-acpi, Toshi Kani, Hanjun Guo,
	isimatu.yasuaki, wency, lenb, gregkh, linux-kernel, linux-mm,
	Tang Chen

On 11/29/2012 07:36 PM, Vasilis Liaskovitis wrote:
> On Thu, Nov 29, 2012 at 11:15:31AM +0100, Rafael J. Wysocki wrote:
>> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>> We met the same problem when we doing computer node hotplug, It is a good idea
>>>> to introduce prepare_remove before actual device removal.
>>>>
>>>> I think we could do more in prepare_remove, such as rollback. In most cases, we can
>>>> offline most of memory sections except kernel used pages now, should we rollback
>>>> and online the memory sections when prepare_remove failed ?
>>>
>>> I think hot-plug operation should have all-or-nothing semantics.  That
>>> is, an operation should either complete successfully, or rollback to the
>>> original state.
>>
>> That's correct.
>>
>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>> this problem, and the way we slove this problem is a bit like yours.
>>>>
>>>> We introduce hp_ops in struct acpi_device_ops:
>>>> struct acpi_device_ops {
>>>> 	acpi_op_add add;
>>>> 	acpi_op_remove remove;
>>>> 	acpi_op_start start;
>>>> 	acpi_op_bind bind;
>>>> 	acpi_op_unbind unbind;
>>>> 	acpi_op_notify notify;
>>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>>> 	struct acpihp_dev_ops *hp_ops;
>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>>> };
>>>>
>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>> 2) release(): reclaim device from running system
>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>> 5) unconfigure(): remove devices from running system
>>>> 6) post_unconfigure(): free resources used by devices
>>>>
>>>> In this way, we can easily rollback if error happens.
>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>> a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>>
>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>> known restrictions are verified at this phase.  For instance, if a
>>> hot-remove request involves kernel memory, it is failed in this phase.
>>> Since this phase makes no change, no rollback is necessary to fail.  
>>
>> Actually, we can't do it this way, because the conditions may change between
>> the check and the execution.  So the first phase needs to involve execution
>> to some extent, although only as far as it remains reversible.
>>
>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>> rolled-back in case of error or cancel.
>>
>> I would just merge 1 and 2.
> 
> I agree steps 1 and 2 can be merged, at least for the current ACPI framework.
> E.g. for memory hotplug, the mm function we call for memory removal
> (remove_memory) handles both these steps.
> 
> The new ACPI framework could perhaps expand the operations as Hanjun described,
> if it makes sense.
Hi Vasilis,
	We have worked on some prototypes to split the memory hotplug logic in mem_hotplug.c
into smaller steps, so that error handling/cancellation would be easier. But we still
need to improve the code quality and merge with the changes from Fujitsu.
Regards!

> 
> thanks,
> 
> - Vasilis
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 17:03       ` Toshi Kani
  2012-11-29 20:30         ` Rafael J. Wysocki
@ 2012-12-06 17:01         ` Jiang Liu
  1 sibling, 0 replies; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 17:01 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Rafael J. Wysocki, linux-acpi, Hanjun Guo, Vasilis Liaskovitis,
	isimatu.yasuaki, wency, lenb, gregkh, linux-kernel, linux-mm,
	Tang Chen

On 11/30/2012 01:03 AM, Toshi Kani wrote:
> On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
>> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>>>> the driver core remove function needs to always succeed. This means we need
>>>>> to know that the device can be successfully removed before acpi_bus_trim / 
>>>>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
>>>>> or SCI-initiated eject of memory devices fail e.g with:
>>>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>>>
>>>>> since the ACPI core goes ahead and ejects the device regardless of whether the
>>>>> the memory is still in use or not.
>>>>>
>>>>> For this reason a new acpi_device operation called prepare_remove is introduced.
>>>>> This operation should be registered for acpi devices whose removal (from kernel
>>>>> perspective) can fail.  Memory devices fall in this category.
>>>>>
>>>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>>>>   succeed for device and all its children.
>>>>> - if above step was successfull, proceed to actual device removal
>>>>
>>>> Hi Vasilis,
>>>> We met the same problem when we doing computer node hotplug, It is a good idea
>>>> to introduce prepare_remove before actual device removal.
>>>>
>>>> I think we could do more in prepare_remove, such as rollback. In most cases, we can
>>>> offline most of memory sections except kernel used pages now, should we rollback
>>>> and online the memory sections when prepare_remove failed ?
>>>
>>> I think hot-plug operation should have all-or-nothing semantics.  That
>>> is, an operation should either complete successfully, or rollback to the
>>> original state.
>>
>> That's correct.
>>
>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>> this problem, and the way we slove this problem is a bit like yours.
>>>>
>>>> We introduce hp_ops in struct acpi_device_ops:
>>>> struct acpi_device_ops {
>>>> 	acpi_op_add add;
>>>> 	acpi_op_remove remove;
>>>> 	acpi_op_start start;
>>>> 	acpi_op_bind bind;
>>>> 	acpi_op_unbind unbind;
>>>> 	acpi_op_notify notify;
>>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>>> 	struct acpihp_dev_ops *hp_ops;
>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>>> };
>>>>
>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>> 2) release(): reclaim device from running system
>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>> 5) unconfigure(): remove devices from running system
>>>> 6) post_unconfigure(): free resources used by devices
>>>>
>>>> In this way, we can easily rollback if error happens.
>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>> a better way for sharing ideas. :)
>>>
>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>> operation should be composed with the following 3 phases.
>>>
>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>> known restrictions are verified at this phase.  For instance, if a
>>> hot-remove request involves kernel memory, it is failed in this phase.
>>> Since this phase makes no change, no rollback is necessary to fail.  
>>
>> Actually, we can't do it this way, because the conditions may change between
>> the check and the execution.  So the first phase needs to involve execution
>> to some extent, although only as far as it remains reversible.
> 
> For memory hot-remove, we can check if the target memory ranges are
> within ZONE_MOVABLE.  We should not allow user to change this setup
> during hot-remove operation.  Other things may be to check if a target
> node contains cpu0 (until it is supported), the console UART (assuming
> we cannot delete it), etc.  We should avoid doing rollback as much as we
> can.
Fengguang from Intel is working on a patchset to hot-remove CPU0 (BSP)
on x86 platforms, and he has posted several versions. Maybe we could eventually
remove CPU0 on x86.

> 
> Thanks,
> -Toshi
> 
> 
>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>> rolled-back in case of error or cancel.
>>
>> I would just merge 1 and 2.
>>
>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
>>> instance, eject operation is performed at this phase.  
>>
>> Yup.
>>
>> Thanks,
>> Rafael
>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 20:30         ` Rafael J. Wysocki
  2012-11-29 20:39           ` Toshi Kani
@ 2012-12-06 17:07           ` Jiang Liu
  1 sibling, 0 replies; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 17:07 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, linux-acpi, Hanjun Guo, Vasilis Liaskovitis,
	isimatu.yasuaki, wency, lenb, gregkh, linux-kernel, linux-mm,
	Tang Chen

On 11/30/2012 04:30 AM, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
>> On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
>>> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>>> As discussed in https://patchwork.kernel.org/patch/1581581/
>>>>>> the driver core remove function needs to always succeed. This means we need
>>>>>> to know that the device can be successfully removed before acpi_bus_trim / 
>>>>>> acpi_bus_hot_remove_device are called. This can cause panics when OSPM-initiated
>>>>>> or SCI-initiated eject of memory devices fail e.g with:
>>>>>> echo 1 >/sys/bus/pci/devices/PNP0C80:XX/eject
>>>>>>
>>>>>> since the ACPI core goes ahead and ejects the device regardless of whether the
>>>>>> memory is still in use or not.
>>>>>>
>>>>>> For this reason a new acpi_device operation called prepare_remove is introduced.
>>>>>> This operation should be registered for acpi devices whose removal (from kernel
>>>>>> perspective) can fail.  Memory devices fall in this category.
>>>>>>
>>>>>> acpi_bus_remove() is changed to handle removal in 2 steps:
>>>>>> - preparation for removal i.e. perform part of removal that can fail. Should
>>>>>>   succeed for device and all its children.
>>>>>> - if above step was successful, proceed to actual device removal
>>>>>
>>>>> Hi Vasilis,
>>>>> We met the same problem when we doing computer node hotplug, It is a good idea
>>>>> to introduce prepare_remove before actual device removal.
>>>>>
>>>>> I think we could do more in prepare_remove, such as rollback. In most cases, we can
>>>>> offline most of memory sections except kernel used pages now, should we rollback
>>>>> and online the memory sections when prepare_remove failed ?
>>>>
>>>> I think hot-plug operation should have all-or-nothing semantics.  That
>>>> is, an operation should either complete successfully, or rollback to the
>>>> original state.
>>>
>>> That's correct.
>>>
>>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>>> this problem, and the way we solve this problem is a bit like yours.
>>>>>
>>>>> We introduce hp_ops in struct acpi_device_ops:
>>>>> struct acpi_device_ops {
>>>>> 	acpi_op_add add;
>>>>> 	acpi_op_remove remove;
>>>>> 	acpi_op_start start;
>>>>> 	acpi_op_bind bind;
>>>>> 	acpi_op_unbind unbind;
>>>>> 	acpi_op_notify notify;
>>>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>>>> 	struct acpihp_dev_ops *hp_ops;
>>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>>>> };
>>>>>
>>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>>> 2) release(): reclaim device from running system
>>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>>> 5) unconfigure(): remove devices from running system
>>>>> 6) post_unconfigure(): free resources used by devices
>>>>>
>>>>> In this way, we can easily rollback if error happens.
>>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>>> a better way for sharing ideas. :)
>>>>
>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>> operation should be composed with the following 3 phases.
>>>>
>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>> known restrictions are verified at this phase.  For instance, if a
>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>> Since this phase makes no change, no rollback is necessary to fail.  
>>>
>>> Actually, we can't do it this way, because the conditions may change between
>>> the check and the execution.  So the first phase needs to involve execution
>>> to some extent, although only as far as it remains reversible.
>>
>> For memory hot-remove, we can check if the target memory ranges are
>> within ZONE_MOVABLE.  We should not allow user to change this setup
>> during hot-remove operation.  Other things may be to check if a target
>> node contains cpu0 (until it is supported), the console UART (assuming
>> we cannot delete it), etc.  We should avoid doing rollback as much as we
>> can.
> 
> Yes, we can make some checks upfront as an optimization and fail early if
> the conditions are not met, but for correctness we need to repeat those
> checks later anyway.  Once we've decided to go for the eject, the conditions
> must hold whatever happens.
Hi Rafael,
	Another reason for us to split hotplug operations into small
steps is to support cancellation in addition to error handling. Theoretically,
it may take infinite time to hot-remove a memory device, so we should provide
an interface for the user to cancel ongoing hot-removal operations. Currently that's
done by a timeout in the memory hot-remove code path, but I think it is not the
best solution. We should provide choices to users:
1) wait forever for a hot-removal operation to complete
2) cancel an ongoing hot-removal operation if it takes too long
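
To illustrate the two choices, here is a rough user-space sketch, not
kernel code. All names (hp_request, hp_offline_all, demo_try_offline) are
made up for illustration. The loop retries a section indefinitely and only
stops early when the cancel flag is set, in which case it rolls back.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request state; none of these names exist in the kernel. */
struct hp_request {
	size_t nr_sections;	/* sections that must go offline */
	size_t nr_offlined;	/* sections offlined so far */
	volatile bool cancel;	/* set by the user to abort the request */
};

/* One bounded attempt to offline a section; returns true on success. */
typedef bool (*offline_fn)(size_t section, void *ctx);

/* Stand-in for re-onlining everything that was offlined so far. */
static void hp_rollback(struct hp_request *req)
{
	req->nr_offlined = 0;
}

/* Retry until every section is offline, or until the user cancels. */
static int hp_offline_all(struct hp_request *req, offline_fn try_offline,
			  void *ctx)
{
	assert(req && try_offline);

	while (req->nr_offlined < req->nr_sections) {
		if (req->cancel) {
			hp_rollback(req);
			return -ECANCELED;
		}
		if (try_offline(req->nr_offlined, ctx))
			req->nr_offlined++;
		/* On failure, retry the same section: no timeout here. */
	}
	return 0;
}

/* Demo stand-in: section 1 never goes offline; "user" cancels after 5 tries. */
struct demo_ctx {
	struct hp_request *req;
	int attempts;
};

static bool demo_try_offline(size_t section, void *ctx)
{
	struct demo_ctx *d = ctx;

	if (section != 1)
		return true;
	if (++d->attempts >= 5)
		d->req->cancel = true;
	return false;
}
```

With this shape, "wait forever" is simply never setting the cancel flag,
and a timeout becomes a policy the caller can layer on top by setting the
flag from a timer.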

Regards!
Gerry
> 
> Thanks,
> Rafael
> 
> 


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-06 16:52             ` Jiang Liu
@ 2012-12-06 17:09               ` Toshi Kani
  2012-12-06 17:30                 ` Jiang Liu
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-12-06 17:09 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Fri, 2012-12-07 at 00:52 +0800, Jiang Liu wrote:
> On 12/07/2012 12:31 AM, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
> >> On 12/07/2012 12:03 AM, Toshi Kani wrote:
> >>> On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
> >>>> On 11/29/2012 02:41 AM, Toshi Kani wrote:
> >>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >  : 
> >>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>>>> operation should be composed with the following 3 phases.
> >>>>>
> >>>>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>>>> known restrictions are verified at this phase.  For instance, if a
> >>>>> hot-remove request involves kernel memory, it is failed in this phase.
> >>>>> Since this phase makes no change, no rollback is necessary to fail.  
> >>>>>
> >>>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> >>>>> rolled-back in case of error or cancel.
> >>>>>
> >>>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> >>>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> >>>>> instance, eject operation is performed at this phase.  
> >>>> Hi Toshi,
> >>>> 	There are one more step needed. Linux provides sysfs interfaces to
> >>>> online/offline CPU/memory sections, so we need to protect from concurrent
> >>>> operations from those interfaces when doing physical hotplug. Think about
> >>>> following sequence:
> >>>> Thread 1
> >>>> 1. validate conditions for hot-removal
> >>>> 2. offline memory section A
> >>>> 3.						online memory section A			
> >>>> 4. offline memory section B
> >>>> 5 hot-remove memory device hosting A and B.
> >>>
> >>> Hi Gerry,
> >>>
> >>> I agree.  And I am working on a proposal that tries to address this
> >>> issue by integrating both sysfs and hotplug operations into a framework.
> >> Hi Toshi,
> >> 	But the sysfs for CPU and memory online/offline are platform independent
> >> interfaces, and the ACPI based hotplug is platform dependent interfaces. I'm not
> >> sure whether it's feasible to merge them. For example we still need offline interface
> >> to stop using faulty CPUs on platform without physical hotplug capabilities.
> >> 	We have solved this by adding a "busy" flag to the device, so the sysfs
> >> will just return -EBUSY if the busy flag is set.
> > 
> > I am making the framework code platform-independent so that it can
> > handle both cases.  Well, I am still prototyping, so hopefully it will
> > work. :)
> Do you mean implementing a framework to manage hotplug of any type of devices?
> That sounds like a huge plan:)
> 
> Otherwise there may be a gap. CPU online/offline interface deals with logical
> CPU, and hotplug driver deals with physical devices(processor). They may be different
> by related objects.

Actually it is not a huge plan.  The framework I am thinking of is to
enable a hotplug sequencer, something analogous to do_initcalls() in the
boot sequence.  I am not doing any huge re-work.  That said, I am
currently testing my theory, so I won't promise anything, either. :)
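
Just to give a rough idea of what I mean by a sequencer, here is a
user-space sketch.  It is only an illustration, not the prototype itself:
hp_step, hp_run_sequence and the demo_* helpers are hypothetical names.
Steps run in table order, and if one fails, the steps that already ran are
rolled back in reverse order.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* One entry in the (hypothetical) hotplug sequence table. */
struct hp_step {
	const char *name;
	int (*execute)(void *ctx);	/* returns 0 or a negative errno */
	void (*rollback)(void *ctx);	/* undoes execute(); may be NULL */
};

/*
 * Run the steps in order.  On failure, undo the completed steps in
 * reverse order.  The failing step is assumed to clean up after itself.
 */
static int hp_run_sequence(const struct hp_step *steps, size_t n, void *ctx)
{
	assert(steps);

	for (size_t i = 0; i < n; i++) {
		int ret = steps[i].execute(ctx);

		if (ret) {
			while (i-- > 0)
				if (steps[i].rollback)
					steps[i].rollback(ctx);
			return ret;
		}
	}
	return 0;
}

/* Demo handlers, standing in for real offline/unbind code. */
struct demo_state {
	int offlined;
};

static int demo_offline(void *c)
{
	((struct demo_state *)c)->offlined = 1;
	return 0;
}

static void demo_online(void *c)
{
	((struct demo_state *)c)->offlined = 0;
}

static int demo_unbind_fails(void *c)
{
	(void)c;
	return -EBUSY;
}
```

The point of the table is that boot-time and hot-add paths could share the
same ordered list instead of calling drivers in two different ways.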

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-11-29 21:25               ` Rafael J. Wysocki
@ 2012-12-06 17:10                 ` Jiang Liu
  0 siblings, 0 replies; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 17:10 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, linux-acpi, Hanjun Guo, Vasilis Liaskovitis,
	isimatu.yasuaki, wency, lenb, gregkh, linux-kernel, linux-mm,
	Tang Chen

On 11/30/2012 05:25 AM, Rafael J. Wysocki wrote:
> On Thursday, November 29, 2012 01:56:17 PM Toshi Kani wrote:
>> On Thu, 2012-11-29 at 13:39 -0700, Toshi Kani wrote:
>>> On Thu, 2012-11-29 at 21:30 +0100, Rafael J. Wysocki wrote:
>>>> On Thursday, November 29, 2012 10:03:12 AM Toshi Kani wrote:
>>>>> On Thu, 2012-11-29 at 11:15 +0100, Rafael J. Wysocki wrote:
>>>>>> On Wednesday, November 28, 2012 11:41:36 AM Toshi Kani wrote:
>>>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>>>> Since this phase makes no change, no rollback is necessary to fail.  
>>>>>>
>>>>>> Actually, we can't do it this way, because the conditions may change between
>>>>>> the check and the execution.  So the first phase needs to involve execution
>>>>>> to some extent, although only as far as it remains reversible.
>>>>>
>>>>> For memory hot-remove, we can check if the target memory ranges are
>>>>> within ZONE_MOVABLE.  We should not allow user to change this setup
>>>>> during hot-remove operation.  Other things may be to check if a target
>>>>> node contains cpu0 (until it is supported), the console UART (assuming
>>>>> we cannot delete it), etc.  We should avoid doing rollback as much as we
>>>>> can.
>>>>
>>>> Yes, we can make some checks upfront as an optimization and fail early if
>>>> the conditions are not met, but for correctness we need to repeat those
>>>> checks later anyway.  Once we've decided to go for the eject, the conditions
>>>> must hold whatever happens.
>>>
>>> Agreed.
>>
>> BTW, it is not an optimization I am after for this phase.  There are
>> many error cases during hot-plug operations.  It is difficult to assure
>> that rollback is successful for every error condition in terms of
>> testing and maintaining the code.  So, it is easier to fail beforehand
>> when possible.
> 
> OK, but as I said it is necessary to ensure that the conditions will be met
> in the next phases as well if we don't fail.
Yes, that's absolutely a requirement. Otherwise QA people will call you
when doing stress tests.
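
A simplified user-space sketch of the idea follows. It assumes a "busy"
flag on the device that sysfs-style requests honor. The names (mem_dev,
dev_set_online, dev_hot_remove_begin) and the choice of -EPERM are
hypothetical, and real code would need a lock around the flag and the
checks, which is left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state, for illustration only. */
struct mem_dev {
	bool busy;	/* hot-remove in progress; blocks sysfs-style changes */
	bool movable;	/* stand-in for "all ranges are in ZONE_MOVABLE" */
	bool online;
};

/* Sysfs-style online/offline request: refused while hot-remove runs. */
static int dev_set_online(struct mem_dev *d, bool online)
{
	assert(d);
	if (d->busy)
		return -EBUSY;
	d->online = online;
	return 0;
}

/* Side-effect-free check of the known restrictions. */
static int dev_validate(const struct mem_dev *d)
{
	return d->movable ? 0 : -EPERM;
}

static int dev_hot_remove_begin(struct mem_dev *d)
{
	int ret;

	assert(d);

	/* Early check: fail before changing anything. */
	ret = dev_validate(d);
	if (ret)
		return ret;

	/* From here on, concurrent online/offline requests get -EBUSY. */
	d->busy = true;

	/*
	 * Repeat the check now that the state is pinned.  In this
	 * single-threaded sketch it cannot differ from the first check;
	 * with real concurrency it can.
	 */
	ret = dev_validate(d);
	if (ret) {
		d->busy = false;
		return ret;
	}

	d->online = false;
	return 0;
}
```

The early check is only there to fail cheaply. The second check, made
after the flag is set, is the one the later phases rely on.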

> 
> Thanks,
> Rafael
> 
> 


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-06 17:30                 ` Jiang Liu
@ 2012-12-06 17:28                   ` Toshi Kani
  0 siblings, 0 replies; 92+ messages in thread
From: Toshi Kani @ 2012-12-06 17:28 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On Fri, 2012-12-07 at 01:30 +0800, Jiang Liu wrote:
> On 12/07/2012 01:09 AM, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 00:52 +0800, Jiang Liu wrote:
> >> On 12/07/2012 12:31 AM, Toshi Kani wrote:
> >>> On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
> >>>> On 12/07/2012 12:03 AM, Toshi Kani wrote:
> >>>>> On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
> >>>>>> On 11/29/2012 02:41 AM, Toshi Kani wrote:
> >>>>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >>>  : 
> >>>>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>>>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>>>>>> operation should be composed with the following 3 phases.
> >>>>>>>
> >>>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>>>>>> known restrictions are verified at this phase.  For instance, if a
> >>>>>>> hot-remove request involves kernel memory, it is failed in this phase.
> >>>>>>> Since this phase makes no change, no rollback is necessary to fail.  
> >>>>>>>
> >>>>>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
> >>>>>>> rolled-back in case of error or cancel.
> >>>>>>>
> >>>>>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
> >>>>>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
> >>>>>>> instance, eject operation is performed at this phase.  
> >>>>>> Hi Toshi,
> >>>>>> 	There are one more step needed. Linux provides sysfs interfaces to
> >>>>>> online/offline CPU/memory sections, so we need to protect from concurrent
> >>>>>> operations from those interfaces when doing physical hotplug. Think about
> >>>>>> following sequence:
> >>>>>> Thread 1
> >>>>>> 1. validate conditions for hot-removal
> >>>>>> 2. offline memory section A
> >>>>>> 3.						online memory section A			
> >>>>>> 4. offline memory section B
> >>>>>> 5 hot-remove memory device hosting A and B.
> >>>>>
> >>>>> Hi Gerry,
> >>>>>
> >>>>> I agree.  And I am working on a proposal that tries to address this
> >>>>> issue by integrating both sysfs and hotplug operations into a framework.
> >>>> Hi Toshi,
> >>>> 	But the sysfs for CPU and memory online/offline are platform independent
> >>>> interfaces, and the ACPI based hotplug is platform dependent interfaces. I'm not
> >>>> sure whether it's feasible to merge them. For example we still need offline interface
> >>>> to stop using faulty CPUs on platform without physical hotplug capabilities.
> >>>> 	We have solved this by adding a "busy" flag to the device, so the sysfs
> >>>> will just return -EBUSY if the busy flag is set.
> >>>
> >>> I am making the framework code platform-independent so that it can
> >>> handle both cases.  Well, I am still prototyping, so hopefully it will
> >>> work. :)
> >> Do you mean implementing a framework to manage hotplug of any type of devices?
> >> That sounds like a huge plan:)
> >>
> >> Otherwise there may be a gap. CPU online/offline interface deals with logical
> >> CPU, and hotplug driver deals with physical devices(processor). They may be different
> >> by related objects.
> > 
> > Actually it is not a huge plan.  The framework I am thinking of is to
> > enable a hotplug sequencer something analogous to do_initcalls() at the
> > boot sequence.  I am not doing any huge re-work.  That said, I am
> > currently testing my theory, so I won't promise anything, either. :)
> Please do give us an update when you get any progress:)

Yes, will do.

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-06 17:09               ` Toshi Kani
@ 2012-12-06 17:30                 ` Jiang Liu
  2012-12-06 17:28                   ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Jiang Liu @ 2012-12-06 17:30 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen

On 12/07/2012 01:09 AM, Toshi Kani wrote:
> On Fri, 2012-12-07 at 00:52 +0800, Jiang Liu wrote:
>> On 12/07/2012 12:31 AM, Toshi Kani wrote:
>>> On Fri, 2012-12-07 at 00:25 +0800, Jiang Liu wrote:
>>>> On 12/07/2012 12:03 AM, Toshi Kani wrote:
>>>>> On Fri, 2012-12-07 at 00:00 +0800, Jiang Liu wrote:
>>>>>> On 11/29/2012 02:41 AM, Toshi Kani wrote:
>>>>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>  : 
>>>>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>>>>> operation should be composed with the following 3 phases.
>>>>>>>
>>>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>>>> Since this phase makes no change, no rollback is necessary to fail.  
>>>>>>>
>>>>>>> 2. Execute phase - Perform hot-add / hot-remove operation that can be
>>>>>>> rolled-back in case of error or cancel.
>>>>>>>
>>>>>>> 3. Commit phase - Perform the final hot-add / hot-remove operation that
>>>>>>> cannot be rolled-back.  No error / cancel is allowed in this phase.  For
>>>>>>> instance, eject operation is performed at this phase.  
>>>>>> Hi Toshi,
>>>>>> 	There are one more step needed. Linux provides sysfs interfaces to
>>>>>> online/offline CPU/memory sections, so we need to protect from concurrent
>>>>>> operations from those interfaces when doing physical hotplug. Think about
>>>>>> following sequence:
>>>>>> Thread 1
>>>>>> 1. validate conditions for hot-removal
>>>>>> 2. offline memory section A
>>>>>> 3.						online memory section A			
>>>>>> 4. offline memory section B
>>>>>> 5 hot-remove memory device hosting A and B.
>>>>>
>>>>> Hi Gerry,
>>>>>
>>>>> I agree.  And I am working on a proposal that tries to address this
>>>>> issue by integrating both sysfs and hotplug operations into a framework.
>>>> Hi Toshi,
>>>> 	But the sysfs for CPU and memory online/offline are platform independent
>>>> interfaces, and the ACPI based hotplug is platform dependent interfaces. I'm not
>>>> sure whether it's feasible to merge them. For example we still need offline interface
>>>> to stop using faulty CPUs on platform without physical hotplug capabilities.
>>>> 	We have solved this by adding a "busy" flag to the device, so the sysfs
>>>> will just return -EBUSY if the busy flag is set.
>>>
>>> I am making the framework code platform-independent so that it can
>>> handle both cases.  Well, I am still prototyping, so hopefully it will
>>> work. :)
>> Do you mean implementing a framework to manage hotplug of any type of devices?
>> That sounds like a huge plan:)
>>
>> Otherwise there may be a gap. CPU online/offline interface deals with logical
>> CPU, and hotplug driver deals with physical devices(processor). They may be different
>> by related objects.
> 
> Actually it is not a huge plan.  The framework I am thinking of is to
> enable a hotplug sequencer something analogous to do_initcalls() at the
> boot sequence.  I am not doing any huge re-work.  That said, I am
> currently testing my theory, so I won't promise anything, either. :)
Please do give us an update when you make any progress :)

> 
> Thanks,
> -Toshi
> 
> 


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-06 16:40             ` Jiang Liu
@ 2012-12-06 20:30               ` Rafael J. Wysocki
  2012-12-07  2:57               ` Toshi Kani
  1 sibling, 0 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-12-06 20:30 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Toshi Kani, Hanjun Guo, Vasilis Liaskovitis, linux-acpi,
	isimatu.yasuaki, wency, lenb, gregkh, linux-kernel, linux-mm,
	Tang Chen, Liujiang, Huxinwei

On Friday, December 07, 2012 12:40:48 AM Jiang Liu wrote:
> On 12/04/2012 08:10 AM, Toshi Kani wrote:
> > On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >> On 2012/11/30 6:27, Toshi Kani wrote:
> >>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
> >>>> On 2012/11/29 2:41, Toshi Kani wrote:
> >>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >>>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >>>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
> >>>>>> this problem, and the way we solve this problem is a bit like yours.
> >>>>>>
> >>>>>> We introduce hp_ops in struct acpi_device_ops:
> >>>>>> struct acpi_device_ops {
> >>>>>> 	acpi_op_add add;
> >>>>>> 	acpi_op_remove remove;
> >>>>>> 	acpi_op_start start;
> >>>>>> 	acpi_op_bind bind;
> >>>>>> 	acpi_op_unbind unbind;
> >>>>>> 	acpi_op_notify notify;
> >>>>>> #ifdef	CONFIG_ACPI_HOTPLUG
> >>>>>> 	struct acpihp_dev_ops *hp_ops;
> >>>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
> >>>>>> };
> >>>>>>
> >>>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >>>>>> 1) pre_release(): optional step to mark device going to be removed/busy
> >>>>>> 2) release(): reclaim device from running system
> >>>>>> 3) post_release(): rollback if cancelled by user or error happened
> >>>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >>>>>> 5) unconfigure(): remove devices from running system
> >>>>>> 6) post_unconfigure(): free resources used by devices
> >>>>>>
> >>>>>> In this way, we can easily rollback if error happens.
> >>>>>> How do you think of this solution, any suggestion ? I think we can achieve
> >>>>>> a better way for sharing ideas. :)
> >>>>>
> >>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>>>> operation should be composed with the following 3 phases.
> >>>>
> >>>> Good idea ! we also implement a hot-plug operation in 3 phases:
> >>>> 1) acpihp_drv_pre_execute
> >>>> 2) acpihp_drv_execute
> >>>> 3) acpihp_drv_post_execute
> >>>> you may refer to :
> >>>> https://lkml.org/lkml/2012/11/4/79
> >>>
> >>> Great.  Yes, I will take a look.
> >>
> >> Thanks, any comments are welcomed :)
> > 
> > If I read the code right, the framework calls ACPI drivers differently
> > at boot-time and hot-add as follows.  That is, the new entry points are
> > called at hot-add only, but .add() is called at both cases.  This
> > requires .add() to work differently.
> > 
> > Boot    : .add()
> > Hot-Add : .add(), .pre_configure(), configure(), etc.
> > 
> > I think the boot-time and hot-add initialization should be done
> > consistently.  While there is difficulty with the current boot sequence,
> > the framework should be designed to allow them consistent, not make them
> > diverged.
> Hi Toshi,
> 	We have separated hotplug operations from driver binding/unbinding interface
> due to following considerations.
> 1) Physical CPU and memory devices are initialized/used before the ACPI subsystem
>    is initialized. So under normal case, .add() of processor and acpi_memhotplug only
>    figures out information about device already in working state instead of starting
>    the device.
> 2) It's impossible to rmmod the processor and acpi_memhotplug driver at runtime 
>    if .remove() of CPU and memory drivers do really remove the CPU/memory device
>    from the system. And the ACPI processor driver also implements CPU PM functionality
>    other than hotplug.
> 
> And recently Rafael has mentioned that he has a long term view to get rid of the
> concept of "ACPI device". If that happens, we could easily move the hotplug
> logic from ACPI device drivers into the hotplug framework if the hotplug logic
> is separated from the .add()/.remove() callbacks. Actually we could even move all
> hotplug only logic into the hotplug framework and don't rely on any ACPI device
> driver any more. So we could get rid of all these messy things. We could achieve
> that by:
> 1) moving code shared by ACPI device drivers and the hotplug framework into the core.
> 2) moving hotplug only code to the framework.
> 
> Hi Rafael, what's your thoughts here?

I think that sounds good at the high level, but we need to get there
incrementally.  This way it will be easier to maintain backwards
compatibility and follow the changes.  Also, it will be easier for all of
the interested people from different companies to participate in the
development and make sure that everyone's needs are going to be met this
way.

At this point, I'd like to see where the Toshi Kani's proposal is going to
take us.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario
  2012-12-06 15:41                                     ` Toshi Kani
@ 2012-12-06 20:32                                       ` Rafael J. Wysocki
  0 siblings, 0 replies; 92+ messages in thread
From: Rafael J. Wysocki @ 2012-12-06 20:32 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Vasilis Liaskovitis, linux-acpi, Wen Congyang, Wen Congyang,
	isimatu.yasuaki, lenb, gregkh, linux-kernel, linux-mm

On Thursday, December 06, 2012 08:41:29 AM Toshi Kani wrote:
> On Thu, 2012-12-06 at 13:50 +0100, Rafael J. Wysocki wrote:
> > On Thursday, December 06, 2012 10:30:19 AM Vasilis Liaskovitis wrote:
> > > Hi,
> > > On Thu, Nov 29, 2012 at 10:44:11AM -0700, Toshi Kani wrote:
> > > > On Thu, 2012-11-29 at 12:04 +0100, Vasilis Liaskovitis wrote:
> > > > 
> > > > Yes, that's what I had in mind along with device_lock().  I think the
> > > > lock is necessary to close the window.
> > > > http://www.spinics.net/lists/linux-mm/msg46973.html
> > > > 
> > > > But as I mentioned in other email, I prefer option 3 with
> > > > suppress_bind_attrs.  So, yes, please take a look to see how it works
> > > > out.
> > > 
> > > I tested the suppress_bind_attrs and it works by simply setting it to true
> > > before driver registration e.g. 
> > > 
> > > --- a/drivers/acpi/scan.c
> > > +++ b/drivers/acpi/scan.c
> > > @@ -783,7 +783,8 @@ int acpi_bus_register_driver(struct acpi_driver *driver)
> > >  	driver->drv.name = driver->name;
> > >  	driver->drv.bus = &acpi_bus_type;
> > >  	driver->drv.owner = driver->owner;
> > > -
> > > +    if (!strcmp(driver->class, "memory"))
> > > +        driver->drv.suppress_bind_attrs = true;
> > >  	ret = driver_register(&driver->drv);
> > >  	return ret;
> > >  }
> > > 
> > > No bind/unbind sysfs files are created when using this, as expected.
> > > I assume we only want to suppress for acpi_memhotplug
> > > (class=ACPI_MEMORY_DEVICE_CLASS i.e. "memory") devices.
> > > 
> > > Is there agreement on what acpi_bus_trim behaviour and rollback (if any) we
> > > want to have for the current ACPI framework (partial trim or full trim on
> > > failure)?
> > 
> > Last time I suggested to split the trimming so that first we only unbind
> > drivers (and roll back that part, ie. rebind the drivers on errors) and
> > next we remove the struct acpi_device objects, just before doing the actual
> > eject.  So there would be two walks of the hierarchy below the device we want
> > to eject, one for driver unbinding (that can be rolled back) and one for the
> > actual removal.
> > 
> > Toshi Kani seemed to agree with that and there were no follow-ups.
> 
> I was hoping to have a short term solution to fix the panic on
> attempting to delete a kernel memory range, assuming that the memory
> hot-plug feature is going to make into 3.8.  It's a blocker issue for
> testing the feature.  Now that the VM patchset does not seem to make
> into 3.8, I think we can step back and focus on a long term solution
> toward 3.9.
> 
> I agree that we should separate resource online/offlining step and
> acpi_device creation/deletion step.  It can address the panic and make
> rollback easier to handle.  For 3.9, we should have a better framework
> in place to handle it in general.  So, I am currently working on a
> framework proposal, and hopefully able to send it out in a week or so.

Cool, thanks for doing this!
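
For reference, the two-walk split of the trim could look roughly like the
user-space sketch below.  It is an illustration under assumptions, not the
ACPI core code: struct node and the *_tree helpers are hypothetical, and
the rollback assumes every node had a driver bound before the eject
started.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device node; not struct acpi_device. */
struct node {
	struct node *child, *sibling;
	bool bound;	/* driver currently bound */
	bool removable;	/* stand-in for "driver unbind can succeed" */
	bool present;	/* device object still exists */
};

/* Walk 1: unbind drivers, children first.  This part may fail. */
static int unbind_tree(struct node *n)
{
	for (struct node *c = n->child; c; c = c->sibling) {
		int ret = unbind_tree(c);

		if (ret)
			return ret;
	}
	if (!n->removable)
		return -EBUSY;
	n->bound = false;
	return 0;
}

/* Rollback for walk 1: rebind everything below and including n. */
static void rebind_tree(struct node *n)
{
	n->bound = true;
	for (struct node *c = n->child; c; c = c->sibling)
		rebind_tree(c);
}

/* Walk 2: remove the device objects.  This part must not fail. */
static void remove_tree(struct node *n)
{
	for (struct node *c = n->child; c; c = c->sibling)
		remove_tree(c);
	n->present = false;
}

static int eject(struct node *root)
{
	int ret;

	assert(root);
	ret = unbind_tree(root);
	if (ret) {
		rebind_tree(root);
		return ret;
	}
	remove_tree(root);	/* the actual eject would follow here */
	return 0;
}
```

Everything that can fail sits in the first walk, so by the time objects are
removed there is nothing left to roll back.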

> Lastly, thanks Vasilis for testing the suppress_bind_attrs change.  I
> think we may still need it for 3.9.

Well, we'll see. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-06 16:47                 ` Jiang Liu
@ 2012-12-07  2:25                   ` Toshi Kani
  0 siblings, 0 replies; 92+ messages in thread
From: Toshi Kani @ 2012-12-07  2:25 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Hanjun Guo, Vasilis Liaskovitis, linux-acpi, isimatu.yasuaki,
	wency, rjw, lenb, gregkh, linux-kernel, linux-mm, Tang Chen,
	Liujiang, Huxinwei

On Fri, 2012-12-07 at 00:47 +0800, Jiang Liu wrote:
> On 12/05/2012 07:23 AM, Toshi Kani wrote:
> > On Tue, 2012-12-04 at 17:16 +0800, Hanjun Guo wrote:
> >> On 2012/12/4 8:10, Toshi Kani wrote:
> >>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >>>> On 2012/11/30 6:27, Toshi Kani wrote:
> >>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
> >>>>>> On 2012/11/29 2:41, Toshi Kani wrote:
:
> >>>> The ACPI specification provides _EDL method to
> >>>> tell OS the eject device list, but still has no method to tell OS the add device
> >>>> list now.
> >>>
> >>> Yes, but I do not think the OS needs special handling for add...
> >>
> >> Hmm, how about trigger a hot add operation by OS ? we have eject interface for OS, but
> >> have no add interface now, do you think this feature is useful? If it is, I think OS
> >> should analyze the dependency first and tell the user.
> > 
> > The OS can eject an ACPI device because a target device is owned by the
> > OS (i.e. enabled).  For hot-add, a target ACPI device is not owned by
> > the OS (i.e. disabled).  Therefore, the OS is not supposed to change its
> > state.  So, I do not think we should support a hot-add operation by the
> > OS.
> We depend on the firmware to provide an interface to actually hot-add the device.
> The sequence is:
> 1) user triggers a hot-add request by sysfs interfaces.
> 2) hotplug framework validates conditions for hot-adding (dependency)
> 3) hotplug framework invokes firmware interfaces to request a hot-adding operation.
> 4) firmware sends an ACPI notification after powering on/initializing the device
> 5) OS adds the devices into running system.

Interesting...  In this sequence, I think FW must validate and check the
dependency before sending a SCI.  FW owns unassigned resources and is
responsible for the procedure necessary to enable resources on the
platform.  Such steps are basically platform-specific.  So, I do not
think the common OS code should step into such business.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 92+ messages in thread

* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-06 16:40             ` Jiang Liu
  2012-12-06 20:30               ` Rafael J. Wysocki
@ 2012-12-07  2:57               ` Toshi Kani
  2012-12-07  5:57                 ` Jiang Liu
  1 sibling, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-12-07  2:57 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Rafael J. Wysocki, Hanjun Guo, Vasilis Liaskovitis, linux-acpi,
	isimatu.yasuaki, wency, lenb, gregkh, linux-kernel, linux-mm,
	Tang Chen, Liujiang, Huxinwei

On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
> On 12/04/2012 08:10 AM, Toshi Kani wrote:
> > On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >> On 2012/11/30 6:27, Toshi Kani wrote:
> >>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
> >>>> On 2012/11/29 2:41, Toshi Kani wrote:
> >>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
> >>>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
> >>>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
> >>>>>> this problem, and the way we solve this problem is a bit like yours.
> >>>>>>
> >>>>>> We introduce hp_ops in struct acpi_device_ops:
> >>>>>> struct acpi_device_ops {
> >>>>>> 	acpi_op_add add;
> >>>>>> 	acpi_op_remove remove;
> >>>>>> 	acpi_op_start start;
> >>>>>> 	acpi_op_bind bind;
> >>>>>> 	acpi_op_unbind unbind;
> >>>>>> 	acpi_op_notify notify;
> >>>>>> #ifdef	CONFIG_ACPI_HOTPLUG
> >>>>>> 	struct acpihp_dev_ops *hp_ops;
> >>>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
> >>>>>> };
> >>>>>>
> >>>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
> >>>>>> 1) pre_release(): optional step to mark device going to be removed/busy
> >>>>>> 2) release(): reclaim device from running system
> >>>>>> 3) post_release(): rollback if cancelled by user or error happened
> >>>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
> >>>>>> 5) unconfigure(): remove devices from running system
> >>>>>> 6) post_unconfigure(): free resources used by devices
> >>>>>>
> >>>>>> In this way, we can easily rollback if error happens.
> >>>>>> How do you think of this solution, any suggestion ? I think we can achieve
> >>>>>> a better way for sharing ideas. :)
> >>>>>
> >>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
> >>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
> >>>>> operation should be composed with the following 3 phases.
> >>>>
> >>>> Good idea ! we also implement a hot-plug operation in 3 phases:
> >>>> 1) acpihp_drv_pre_execute
> >>>> 2) acpihp_drv_execute
> >>>> 3) acpihp_drv_post_execute
> >>>> you may refer to :
> >>>> https://lkml.org/lkml/2012/11/4/79
> >>>
> >>> Great.  Yes, I will take a look.
> >>
> >> Thanks, any comments are welcomed :)
> > 
> > If I read the code right, the framework calls ACPI drivers differently
> > at boot-time and hot-add as follows.  That is, the new entry points are
> > called at hot-add only, but .add() is called at both cases.  This
> > requires .add() to work differently.
> > 
> > Boot    : .add()
> > Hot-Add : .add(), .pre_configure(), configure(), etc.
> > 
> > I think the boot-time and hot-add initialization should be done
> > consistently.  While there is difficulty with the current boot sequence,
> > the framework should be designed to allow them consistent, not make them
> > diverged.
> Hi Toshi,
> 	We have separated hotplug operations from driver binding/unbinding interface
> due to following considerations.
> 1) Physical CPU and memory devices are initialized/used before the ACPI subsystem
>    is initialized. So under normal case, .add() of processor and acpi_memhotplug only
>    figures out information about device already in working state instead of starting
>    the device.

I agree that the current boot sequence is not very hot-plug friendly...

> 2) It's impossible to rmmod the processor and acpi_memhotplug driver at runtime 
>    if .remove() of CPU and memory drivers do really remove the CPU/memory device
>    from the system. And the ACPI processor driver also implements CPU PM functionality
>    other than hotplug.

Agreed.

> And recently Rafael has mentioned that he has a long term view to get rid of the
> concept of "ACPI device". If that happens, we could easily move the hotplug
> logic from ACPI device drivers into the hotplug framework if the hotplug logic
> is separated from the .add()/.remove() callbacks. Actually we could even move all
> hotplug only logic into the hotplug framework and don't rely on any ACPI device
> driver any more. So we could get rid of all these messy things. We could achieve
> that by:
> 1) moving code shared by ACPI device drivers and the hotplug framework into the core.
> 2) moving hotplug only code to the framework.

Yes, the framework should allow such future work.  I also think that the
framework itself should be independent from such ACPI issue.  Ideally,
it should be able to support non-ACPI platforms.

> Hi Rafael, what's your thoughts here?
> 
> > 
> >>>>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>>>> known restrictions are verified at this phase.  For instance, if a
> >>>>> hot-remove request involves kernel memory, it is failed in this phase.
> >>>>> Since this phase makes no change, no rollback is necessary to fail. 
> >>>>
> >>>> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
> >>>>
> >>>> 1) Hot-pluggable or not. The kernel memory case you mentioned is also checked
> >>>>    when a memory device is removed;
> >>>
> >>> Agreed.
> >>>
> >>>> 2) Dependency check involved. For instance, if hot-add a memory device,
> >>>>    processor should be added first, otherwise it's not valid to this operation.
> >>>
> >>> I think FW should be the one that assures such dependency.  That is,
> >>> when a memory device object is marked as present/enabled/functioning, it
> >>> should be ready for the OS to use.
> >>
> >> Yes, BIOS should do something for the dependency, because BIOS knows the
> >> actual hardware topology. 
> > 
> > Right.
> > 
> >> The ACPI specification provides _EDL method to
> >> tell OS the eject device list, but still has no method to tell OS the add device
> >> list now.
> > 
> > Yes, but I do not think the OS needs special handling for add...
> We have a plan to support triggering hot-adding events from OS provided interfaces,
> so we also need to solve dependency issues when handling requests from those interfaces.
> For example, we need to power on the physical processor before powering on a memory device if
> the memory device is attached to a physical processor.

I am afraid that this issue is platform-specific, and I am not sure if
there is a common way to handle such things in general.  I'd recommend
to work with FW folks to implement such platform-specific validation
code in FW.

> >> For some cases, OS should analyze the dependency in the validate phase. For example,
> >> when hot remove a node (container device), OS should analyze the dependency to get
> >> the remove order as following:
> >> 1) Host bridge;
> >> 2) Memory devices;
> >> 3) Processor devices;
> >> 4) Container device itself;
> > 
> > This may be off-topic, but how do you plan to delete I/O devices under a
> > node?  Are you planning to delete all I/O devices along with the node?
> > 
> > On other OS, we made a separate step called I/O chassis delete, which
> > off-lines all I/O devices under the node, and is required before a node
> > hot-remove.  It basically triggers PCIe hot-remove to detach drivers
> > from all devices.  It does not eject the devices so that they do not
> > have to be on hot-plug slots.  This step runs user-space scripts to
> > verify if the devices can be off-lined without disrupting user's
> > applications, and provides comprehensive reports if any of them are in
> > use.  Not sure if Linux's PCI hot-remove has such check, but I thought
> > I'd mention it. :)
> Yinghai is working on PCI host bridge hotplug, which just stops all PCI devices
> under the host bridge. That's really a little dangerous and we do need help
> from userspace to check whether the hot-removal operation is fatal,
> e.g. removing PCI device hosting the rootfs.

Agreed.

> So in our framework, we have an option to relay hotplug event from firmware
> to userspace, so the userspace has a chance to reject the hotplug operations
> if it may cause unacceptable disturbance to userspace services.

I think validation from user-space is necessary for deleting I/O
devices.  For CPU and memory, the kernel check works fine.

Thanks,
-Toshi



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-07  2:57               ` Toshi Kani
@ 2012-12-07  5:57                 ` Jiang Liu
  2012-12-08  1:08                   ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Jiang Liu @ 2012-12-07  5:57 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Jiang Liu, Rafael J. Wysocki, Hanjun Guo, Vasilis Liaskovitis,
	linux-acpi, isimatu.yasuaki, wency, lenb, gregkh, linux-kernel,
	linux-mm, Tang Chen, Huxinwei

On 2012-12-7 10:57, Toshi Kani wrote:
> On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
>> On 12/04/2012 08:10 AM, Toshi Kani wrote:
>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>>>> On 2012/11/30 6:27, Toshi Kani wrote:
>>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>>>>>> On 2012/11/29 2:41, Toshi Kani wrote:
>>>>>>> On Wed, 2012-11-28 at 19:05 +0800, Hanjun Guo wrote:
>>>>>>>> On 2012/11/24 1:50, Vasilis Liaskovitis wrote:
>>>>>>>> As you may know, the ACPI based hotplug framework we are working on already addressed
>>>>>>>> this problem, and the way we solve this problem is a bit like yours.
>>>>>>>>
>>>>>>>> We introduce hp_ops in struct acpi_device_ops:
>>>>>>>> struct acpi_device_ops {
>>>>>>>> 	acpi_op_add add;
>>>>>>>> 	acpi_op_remove remove;
>>>>>>>> 	acpi_op_start start;
>>>>>>>> 	acpi_op_bind bind;
>>>>>>>> 	acpi_op_unbind unbind;
>>>>>>>> 	acpi_op_notify notify;
>>>>>>>> #ifdef	CONFIG_ACPI_HOTPLUG
>>>>>>>> 	struct acpihp_dev_ops *hp_ops;
>>>>>>>> #endif	/* CONFIG_ACPI_HOTPLUG */
>>>>>>>> };
>>>>>>>>
>>>>>>>> in hp_ops, we divide the prepare_remove into six small steps, that is:
>>>>>>>> 1) pre_release(): optional step to mark device going to be removed/busy
>>>>>>>> 2) release(): reclaim device from running system
>>>>>>>> 3) post_release(): rollback if cancelled by user or error happened
>>>>>>>> 4) pre_unconfigure(): optional step to solve possible dependency issue
>>>>>>>> 5) unconfigure(): remove devices from running system
>>>>>>>> 6) post_unconfigure(): free resources used by devices
>>>>>>>>
>>>>>>>> In this way, we can easily rollback if error happens.
>>>>>>>> How do you think of this solution, any suggestion ? I think we can achieve
>>>>>>>> a better way for sharing ideas. :)
>>>>>>>
>>>>>>> Yes, sharing idea is good. :)  I do not know if we need all 6 steps (I
>>>>>>> have not looked at all your changes yet..), but in my mind, a hot-plug
>>>>>>> operation should be composed with the following 3 phases.
>>>>>>
>>>>>> Good idea ! we also implement a hot-plug operation in 3 phases:
>>>>>> 1) acpihp_drv_pre_execute
>>>>>> 2) acpihp_drv_execute
>>>>>> 3) acpihp_drv_post_execute
>>>>>> you may refer to :
>>>>>> https://lkml.org/lkml/2012/11/4/79
>>>>>
>>>>> Great.  Yes, I will take a look.
>>>>
>>>> Thanks, any comments are welcomed :)
>>>
>>> If I read the code right, the framework calls ACPI drivers differently
>>> at boot-time and hot-add as follows.  That is, the new entry points are
>>> called at hot-add only, but .add() is called at both cases.  This
>>> requires .add() to work differently.
>>>
>>> Boot    : .add()
>>> Hot-Add : .add(), .pre_configure(), configure(), etc.
>>>
>>> I think the boot-time and hot-add initialization should be done
>>> consistently.  While there is difficulty with the current boot sequence,
>>> the framework should be designed to allow them consistent, not make them
>>> diverged.
>> Hi Toshi,
>> 	We have separated hotplug operations from driver binding/unbinding interface
>> due to following considerations.
>> 1) Physical CPU and memory devices are initialized/used before the ACPI subsystem
>>    is initialized. So under normal case, .add() of processor and acpi_memhotplug only
>>    figures out information about device already in working state instead of starting
>>    the device.
> 
> I agree that the current boot sequence is not very hot-plug friendly...
> 
>> 2) It's impossible to rmmod the processor and acpi_memhotplug driver at runtime 
>>    if .remove() of CPU and memory drivers do really remove the CPU/memory device
>>    from the system. And the ACPI processor driver also implements CPU PM functionality
>>    other than hotplug.
> 
> Agreed.
> 
>> And recently Rafael has mentioned that he has a long term view to get rid of the
>> concept of "ACPI device". If that happens, we could easily move the hotplug
>> logic from ACPI device drivers into the hotplug framework if the hotplug logic
>> is separated from the .add()/.remove() callbacks. Actually we could even move all
>> hotplug only logic into the hotplug framework and don't rely on any ACPI device
>> driver any more. So we could get rid of all these messy things. We could achieve
>> that by:
>> 1) moving code shared by ACPI device drivers and the hotplug framework into the core.
>> 2) moving hotplug only code to the framework.
> 
> Yes, the framework should allow such future work.  I also think that the
> framework itself should be independent from such ACPI issue.  Ideally,
> it should be able to support non-ACPI platforms.
I share the same view. The ACPI-based hotplug framework is designed as:
1) an ACPI-based hotplug slot driver to handle platform-specific logic.
   A platform may provide its own slot drivers to discover and manage
   hotplug slots. We have provided a default slot driver implementation
   that follows the ACPI spec.
2) an ACPI-based hotplug manager driver, which is platform independent
   and manages all hotplug slots created by the slot driver.

We haven't gone far enough to provide an ACPI-independent hotplug framework
because we only have experience with x86 and Itanium, both of which are ACPI
based. We may try to implement an ACPI-independent hotplug framework by pushing
all ACPI-specific logic into the slot driver; I think it's doable. But we need
suggestions from experts on other architectures, such as SPARC and Power.
It seems Power already has some sort of hotplug framework, right?
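
To make the slot driver / manager split more concrete, here is a minimal
userspace sketch. It is not the code from our patchset; every name in it
(hp_slot, hp_slot_ops, hp_manager_add, hp_manager_remove, mock_slot_ops) is
hypothetical and only illustrates how the platform-independent manager could
reach the platform-specific slot driver through a table of callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* All names below are hypothetical; this is not the code from the patchset. */
enum hp_slot_state { HP_SLOT_ABSENT, HP_SLOT_POWERED, HP_SLOT_CONFIGURED };

struct hp_slot;

/* Platform-specific half: an ACPI slot driver would fill these in. */
struct hp_slot_ops {
	int (*power_on)(struct hp_slot *slot);
	int (*power_off)(struct hp_slot *slot);
};

struct hp_slot {
	const char *name;
	enum hp_slot_state state;
	const struct hp_slot_ops *ops;
};

/* Platform-independent half: the manager only calls through ops. */
int hp_manager_add(struct hp_slot *slot)
{
	int ret;

	assert(slot != NULL && slot->ops != NULL);
	if (slot->state != HP_SLOT_ABSENT)
		return -EBUSY;
	ret = slot->ops->power_on(slot);
	if (ret)
		return ret;		/* slot untouched, nothing to roll back */
	slot->state = HP_SLOT_POWERED;
	/* generic device configuration would happen here */
	slot->state = HP_SLOT_CONFIGURED;
	return 0;
}

int hp_manager_remove(struct hp_slot *slot)
{
	int ret;

	assert(slot != NULL && slot->ops != NULL);
	if (slot->state != HP_SLOT_CONFIGURED)
		return -EINVAL;
	/* generic unconfiguration would happen here */
	slot->state = HP_SLOT_POWERED;
	ret = slot->ops->power_off(slot);
	if (ret) {
		slot->state = HP_SLOT_CONFIGURED;	/* roll back */
		return ret;
	}
	slot->state = HP_SLOT_ABSENT;
	return 0;
}

/* Mock slot driver standing in for the default ACPI implementation. */
static int mock_power_on(struct hp_slot *slot)  { (void)slot; return 0; }
static int mock_power_off(struct hp_slot *slot) { (void)slot; return 0; }

const struct hp_slot_ops mock_slot_ops = {
	.power_on  = mock_power_on,
	.power_off = mock_power_off,
};
```

In this sketch, supporting a non-ACPI platform would only mean supplying
another hp_slot_ops table; the manager functions would stay unchanged.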

> 
>> Hi Rafael, what's your thoughts here?
>>
>>>
>>>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>>>> Since this phase makes no change, no rollback is necessary to fail. 
>>>>>>
>>>>>> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
>>>>>>
>>>>>> 1) Hot-pluggable or not. The kernel memory case you mentioned is also checked
>>>>>>    when a memory device is removed;
>>>>>
>>>>> Agreed.
>>>>>
>>>>>> 2) Dependency check involved. For instance, if hot-add a memory device,
>>>>>>    processor should be added first, otherwise it's not valid to this operation.
>>>>>
>>>>> I think FW should be the one that assures such dependency.  That is,
>>>>> when a memory device object is marked as present/enabled/functioning, it
>>>>> should be ready for the OS to use.
>>>>
>>>> Yes, BIOS should do something for the dependency, because BIOS knows the
>>>> actual hardware topology. 
>>>
>>> Right.
>>>
>>>> The ACPI specification provides _EDL method to
>>>> tell OS the eject device list, but still has no method to tell OS the add device
>>>> list now.
>>>
>>> Yes, but I do not think the OS needs special handling for add...
>> We have a plan to support triggering hot-adding events from OS provided interfaces,
>> so we also need to solve dependency issues when handling requests from those interfaces.
>> For example, we need to power on the physical processor before powering on a memory device if
>> the memory device is attached to a physical processor.
> 
> I am afraid that this issue is platform-specific, and I am not sure if
> there is a common way to handle such things in general.  I'd recommend
> to work with FW folks to implement such platform-specific validation
> code in FW.
You are right, we may rely on firmware to validate the dependency.

> 
>>>> For some cases, OS should analyze the dependency in the validate phase. For example,
>>>> when hot remove a node (container device), OS should analyze the dependency to get
>>>> the remove order as following:
>>>> 1) Host bridge;
>>>> 2) Memory devices;
>>>> 3) Processor devices;
>>>> 4) Container device itself;
>>>
>>> This may be off-topic, but how do you plan to delete I/O devices under a
>>> node?  Are you planning to delete all I/O devices along with the node?
>>>
>>> On other OS, we made a separate step called I/O chassis delete, which
>>> off-lines all I/O devices under the node, and is required before a node
>>> hot-remove.  It basically triggers PCIe hot-remove to detach drivers
>>> from all devices.  It does not eject the devices so that they do not
>>> have to be on hot-plug slots.  This step runs user-space scripts to
>>> verify if the devices can be off-lined without disrupting user's
>>> applications, and provides comprehensive reports if any of them are in
>>> use.  Not sure if Linux's PCI hot-remove has such check, but I thought
>>> I'd mention it. :)
>> Yinghai is working on PCI host bridge hotplug, which just stops all PCI devices
>> under the host bridge. That's really a little dangerous and we do need help
>> from userspace to check whether the hot-removal operation is fatal,
>> e.g. removing PCI device hosting the rootfs.
> 
> Agreed.
> 
>> So in our framework, we have an option to relay hotplug event from firmware
>> to userspace, so the userspace has a chance to reject the hotplug operations
>> if it may cause unacceptable disturbance to userspace services.
> 
> I think validation from user-space is necessary for deleting I/O
> devices.  For CPU and memory, the kernel check works fine.
Agreed. But we may need help from userspace to handle cgroup/cpuset/cpuisol
etc. for CPU and memory hot-removal. Telecom applications in particular
depend heavily on cgroup/cpuisol to guarantee latency.
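
As a rough illustration of what I mean, the validate phase could run a chain
of checks where a userspace policy gets a veto after the kernel check passes.
This is only a userspace sketch under assumptions: the names (hp_request,
kernel_check, userspace_policy_check, hp_validate_remove) are hypothetical,
and the reserved CPU list is a stand-in for whatever cgroup/cpuset
configuration the application actually uses.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical types and names, for illustration only. */
enum hp_dev_type { HP_DEV_CPU, HP_DEV_MEMORY };

struct hp_request {
	enum hp_dev_type type;
	int id;			/* CPU number or memory device index */
};

typedef int (*hp_validator_fn)(const struct hp_request *req);

/* Stand-in for the kernel check: pretend CPU 0 can never be removed. */
int kernel_check(const struct hp_request *req)
{
	if (req->type == HP_DEV_CPU && req->id == 0)
		return -EBUSY;
	return 0;
}

/* Stand-in for a userspace policy: CPUs reserved for latency-critical tasks. */
static const int reserved_cpus[] = { 2, 3 };

int userspace_policy_check(const struct hp_request *req)
{
	size_t i;

	if (req->type != HP_DEV_CPU)
		return 0;
	for (i = 0; i < sizeof(reserved_cpus) / sizeof(reserved_cpus[0]); i++)
		if (reserved_cpus[i] == req->id)
			return -EPERM;
	return 0;
}

/* Validate phase: the first failing check rejects the request. */
int hp_validate_remove(const struct hp_request *req,
		       const hp_validator_fn *checks, size_t nr_checks)
{
	size_t i;
	int ret;

	assert(req != NULL && checks != NULL);
	for (i = 0; i < nr_checks; i++) {
		ret = checks[i](req);
		if (ret)
			return ret;
	}
	return 0;
}
```

Since nothing is changed during validation, a veto from either check needs
no rollback.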

Regards!
Gerry

> 
> Thanks,
> -Toshi
> 
> 
> .
> 




* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-07  5:57                 ` Jiang Liu
@ 2012-12-08  1:08                   ` Toshi Kani
  2012-12-11 14:34                     ` Jiang Liu
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-12-08  1:08 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Jiang Liu, Rafael J. Wysocki, Hanjun Guo, Vasilis Liaskovitis,
	linux-acpi, isimatu.yasuaki, wency, lenb, gregkh, linux-kernel,
	linux-mm, Tang Chen, Huxinwei

On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
> On 2012-12-7 10:57, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
> >> On 12/04/2012 08:10 AM, Toshi Kani wrote:
> >>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >>>> On 2012/11/30 6:27, Toshi Kani wrote:
> >>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 :
> >>>
> >>> If I read the code right, the framework calls ACPI drivers differently
> >>> at boot-time and hot-add as follows.  That is, the new entry points are
> >>> called at hot-add only, but .add() is called at both cases.  This
> >>> requires .add() to work differently.
> >>>
> >>> Boot    : .add()
> >>> Hot-Add : .add(), .pre_configure(), configure(), etc.
> >>>
> >>> I think the boot-time and hot-add initialization should be done
> >>> consistently.  While there is difficulty with the current boot sequence,
> >>> the framework should be designed to allow them consistent, not make them
> >>> diverged.
> >> Hi Toshi,
> >> 	We have separated hotplug operations from driver binding/unbinding interface
> >> due to following considerations.
> >> 1) Physical CPU and memory devices are initialized/used before the ACPI subsystem
> >>    is initialized. So under normal case, .add() of processor and acpi_memhotplug only
> >>    figures out information about device already in working state instead of starting
> >>    the device.
> > 
> > I agree that the current boot sequence is not very hot-plug friendly...
> > 
> >> 2) It's impossible to rmmod the processor and acpi_memhotplug driver at runtime 
> >>    if .remove() of CPU and memory drivers do really remove the CPU/memory device
> >>    from the system. And the ACPI processor driver also implements CPU PM functionality
> >>    other than hotplug.
> > 
> > Agreed.
> > 
> >> And recently Rafael has mentioned that he has a long term view to get rid of the
> >> concept of "ACPI device". If that happens, we could easily move the hotplug
> >> logic from ACPI device drivers into the hotplug framework if the hotplug logic
> >> is separated from the .add()/.remove() callbacks. Actually we could even move all
> >> hotplug only logic into the hotplug framework and don't rely on any ACPI device
> >> driver any more. So we could get rid of all these messy things. We could achieve
> >> that by:
> >> 1) moving code shared by ACPI device drivers and the hotplug framework into the core.
> >> 2) moving hotplug only code to the framework.
> > 
> > Yes, the framework should allow such future work.  I also think that the
> > framework itself should be independent from such ACPI issue.  Ideally,
> > it should be able to support non-ACPI platforms.
> I share the same view. The ACPI-based hotplug framework is designed as:
> 1) an ACPI-based hotplug slot driver to handle platform-specific logic.
>    A platform may provide its own slot drivers to discover and manage
>    hotplug slots. We have provided a default slot driver implementation
>    that follows the ACPI spec.

The ACPI spec does not define that _EJ0 is required to receive a hot-add
request, i.e. bus/device check.  This is a major issue.  Since Windows
only supports hot-add, I think there are platforms that only support
hot-add today.

> 2) an ACPI-based hotplug manager driver, which is platform independent
>    and manages all hotplug slots created by the slot driver.

It is surely impressive work, but I think it is a bit overdone.  I
expect hot-pluggable servers to come with a management console and/or GUI
where a user can manage hardware units and initiate hot-plug operations.
I do not think the kernel needs to step into this area since it tends to
be platform-specific.

> We haven't gone far enough to provide an ACPI-independent hotplug framework
> because we only have experience with x86 and Itanium, both of which are ACPI
> based. We may try to implement an ACPI-independent hotplug framework by pushing
> all ACPI-specific logic into the slot driver; I think it's doable. But we need
> suggestions from experts on other architectures, such as SPARC and Power.
> It seems Power already has some sort of hotplug framework, right?

I do not know about the Linux hot-plug support on other architectures.
PA-RISC SuperDome also supports Node hot-plug, but it is not supported
by Linux.  Since ARM is getting used in servers, I would not be surprised
if there is an ARM-based server with hot-plug support in the future.

> >> Hi Rafael, what's your thoughts here?
> >>
> >>>
> >>>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
> >>>>>>> known restrictions are verified at this phase.  For instance, if a
> >>>>>>> hot-remove request involves kernel memory, it is failed in this phase.
> >>>>>>> Since this phase makes no change, no rollback is necessary to fail. 
> >>>>>>
> >>>>>> Yes, we have done this in acpihp_drv_pre_execute, and check following things:
> >>>>>>
> >>>>>> 1) Hot-pluggable or not. The kernel memory case you mentioned is also checked
> >>>>>>    when a memory device is removed;
> >>>>>
> >>>>> Agreed.
> >>>>>
> >>>>>> 2) Dependency check involved. For instance, if hot-add a memory device,
> >>>>>>    processor should be added first, otherwise it's not valid to this operation.
> >>>>>
> >>>>> I think FW should be the one that assures such dependency.  That is,
> >>>>> when a memory device object is marked as present/enabled/functioning, it
> >>>>> should be ready for the OS to use.
> >>>>
> >>>> Yes, BIOS should do something for the dependency, because BIOS knows the
> >>>> actual hardware topology. 
> >>>
> >>> Right.
> >>>
> >>>> The ACPI specification provides _EDL method to
> >>>> tell OS the eject device list, but still has no method to tell OS the add device
> >>>> list now.
> >>>
> >>> Yes, but I do not think the OS needs special handling for add...
> >> We have a plan to support triggering hot-adding events from OS provided interfaces,
> >> so we also need to solve dependency issues when handling requests from those interfaces.
> >> For example, we need to power on the physical processor before powering on a memory device if
> >> the memory device is attached to a physical processor.
> > 
> > I am afraid that this issue is platform-specific, and I am not sure if
> > there is a common way to handle such things in general.  I'd recommend
> > to work with FW folks to implement such platform-specific validation
> > code in FW.
> You are right, we may rely on firmware to validate the dependency.

Great!

> >>>> For some cases, OS should analyze the dependency in the validate phase. For example,
> >>>> when hot remove a node (container device), OS should analyze the dependency to get
> >>>> the remove order as following:
> >>>> 1) Host bridge;
> >>>> 2) Memory devices;
> >>>> 3) Processor devices;
> >>>> 4) Container device itself;
> >>>
> >>> This may be off-topic, but how do you plan to delete I/O devices under a
> >>> node?  Are you planning to delete all I/O devices along with the node?
> >>>
> >>> On other OS, we made a separate step called I/O chassis delete, which
> >>> off-lines all I/O devices under the node, and is required before a node
> >>> hot-remove.  It basically triggers PCIe hot-remove to detach drivers
> >>> from all devices.  It does not eject the devices so that they do not
> >>> have to be on hot-plug slots.  This step runs user-space scripts to
> >>> verify if the devices can be off-lined without disrupting user's
> >>> applications, and provides comprehensive reports if any of them are in
> >>> use.  Not sure if Linux's PCI hot-remove has such check, but I thought
> >>> I'd mention it. :)
> >> Yinghai is working on PCI host bridge hotplug, which just stops all PCI devices
> >> under the host bridge. That's really a little dangerous and we do need help
> >> from userspace to check whether the hot-removal operation is fatal,
> >> e.g. removing PCI device hosting the rootfs.
> > 
> > Agreed.
> > 
> >> So in our framework, we have an option to relay hotplug event from firmware
> >> to userspace, so the userspace has a chance to reject the hotplug operations
> >> if it may cause unacceptable disturbance to userspace services.
> > 
> > I think validation from user-space is necessary for deleting I/O
> > devices.  For CPU and memory, the kernel check works fine.
> Agreed. But we may need help from userspace to handle cgroup/cpuset/cpuisol
> etc. for CPU and memory hot-removal. Telecom applications in particular
> depend heavily on cgroup/cpuisol to guarantee latency.

I have not looked at the code, but aren't these CPU attributes managed in
the kernel?

Thanks,
-Toshi




* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-08  1:08                   ` Toshi Kani
@ 2012-12-11 14:34                     ` Jiang Liu
  2012-12-13 14:42                       ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Jiang Liu @ 2012-12-11 14:34 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Jiang Liu, Rafael J. Wysocki, Hanjun Guo, Vasilis Liaskovitis,
	linux-acpi, isimatu.yasuaki, wency, lenb, gregkh, linux-kernel,
	linux-mm, Tang Chen, Huxinwei

On 12/08/2012 09:08 AM, Toshi Kani wrote:
> On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
>> On 2012-12-7 10:57, Toshi Kani wrote:
>>> On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
>>>> On 12/04/2012 08:10 AM, Toshi Kani wrote:
>>>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>>>>>> On 2012/11/30 6:27, Toshi Kani wrote:
>>>>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>  :
>>>>>
>>>>> If I read the code right, the framework calls ACPI drivers differently
>>>>> at boot-time and hot-add as follows.  That is, the new entry points are
>>>>> called at hot-add only, but .add() is called at both cases.  This
>>>>> requires .add() to work differently.
>>>>>
>>>>> Boot    : .add()
>>>>> Hot-Add : .add(), .pre_configure(), configure(), etc.
>>>>>
>>>>> I think the boot-time and hot-add initialization should be done
>>>>> consistently.  While there is difficulty with the current boot sequence,
>>>>> the framework should be designed to allow them consistent, not make them
>>>>> diverged.
>>>> Hi Toshi,
>>>> 	We have separated hotplug operations from driver binding/unbinding interface
>>>> due to following considerations.
>>>> 1) Physical CPU and memory devices are initialized/used before the ACPI subsystem
>>>>    is initialized. So under normal case, .add() of processor and acpi_memhotplug only
>>>>    figures out information about device already in working state instead of starting
>>>>    the device.
>>>
>>> I agree that the current boot sequence is not very hot-plug friendly...
>>>
>>>> 2) It's impossible to rmmod the processor and acpi_memhotplug driver at runtime 
>>>>    if .remove() of CPU and memory drivers do really remove the CPU/memory device
>>>>    from the system. And the ACPI processor driver also implements CPU PM functionality
>>>>    other than hotplug.
>>>
>>> Agreed.
>>>
>>>> And recently Rafael has mentioned that he has a long term view to get rid of the
>>>> concept of "ACPI device". If that happens, we could easily move the hotplug
>>>> logic from ACPI device drivers into the hotplug framework if the hotplug logic
>>>> is separated from the .add()/.remove() callbacks. Actually we could even move all
>>>> hotplug only logic into the hotplug framework and don't rely on any ACPI device
>>>> driver any more. So we could get rid of all these messy things. We could achieve
>>>> that by:
>>>> 1) moving code shared by ACPI device drivers and the hotplug framework into the core.
>>>> 2) moving hotplug only code to the framework.
>>>
>>> Yes, the framework should allow such future work.  I also think that the
>>> framework itself should be independent from such ACPI issue.  Ideally,
>>> it should be able to support non-ACPI platforms.
>> I share the same view. The ACPI-based hotplug framework is designed as:
>> 1) an ACPI-based hotplug slot driver to handle platform-specific logic.
>>    A platform may provide its own slot drivers to discover and manage
>>    hotplug slots. We have provided a default slot driver implementation
>>    that follows the ACPI spec.
> 
> The ACPI spec does not define that _EJ0 is required to receive a hot-add
> request, i.e. bus/device check.  This is a major issue.  Since Windows
> only supports hot-add, I think there are platforms that only support
> hot-add today.
> 
>> 2) an ACPI-based hotplug manager driver, which is platform independent
>>    and manages all hotplug slots created by the slot driver.
> 
> It is surely impressive work, but I think it is a bit overdone.  I
> expect hot-pluggable servers to come with a management console and/or GUI
> where a user can manage hardware units and initiate hot-plug operations.
> I do not think the kernel needs to step into this area since it tends to
> be platform-specific.
One of the major uses of this feature is testing.
It will be hard for OSVs and OEMs to verify hotplug functionality if it can
only be tested by physical hotplug or through a management console. So to pave
the way for hotplug, we need to provide a mechanism for OEMs and OSVs to run
automated stress tests of hotplug functionality.
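
To give an idea of the kind of automated test I have in mind, here is a small
userspace sketch of a stress loop that drives software-triggered add/remove
cycles against a mock slot, with an injected "device busy" failure on some
removals. It is only an illustration: mock_slot, mock_hot_add,
mock_hot_remove and hotplug_stress are hypothetical names, and a real test
would drive the interfaces exported by the framework instead of a mock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock of a hotplug slot, for illustration only. */
enum slot_state { SLOT_EMPTY, SLOT_ONLINE };

struct mock_slot {
	enum slot_state state;
	unsigned int fail_every;	/* fail every Nth removal; 0 = never */
	unsigned int remove_attempts;
};

int mock_hot_add(struct mock_slot *s)
{
	if (s->state != SLOT_EMPTY)
		return -1;
	s->state = SLOT_ONLINE;
	return 0;
}

int mock_hot_remove(struct mock_slot *s)
{
	if (s->state != SLOT_ONLINE)
		return -1;
	s->remove_attempts++;
	if (s->fail_every && s->remove_attempts % s->fail_every == 0)
		return -1;	/* injected "busy" failure, state unchanged */
	s->state = SLOT_EMPTY;
	return 0;
}

/*
 * Run add/remove cycles. Returns the number of failed removals, or -1 if
 * the slot ends up in an unexpected state after a failure.
 */
int hotplug_stress(struct mock_slot *s, unsigned int cycles)
{
	unsigned int i;
	int failures = 0;

	assert(s != NULL);
	for (i = 0; i < cycles; i++) {
		if (s->state == SLOT_EMPTY && mock_hot_add(s) != 0)
			return -1;
		if (mock_hot_remove(s) != 0) {
			failures++;
			/* a failed removal must leave the device usable */
			if (s->state != SLOT_ONLINE)
				return -1;
		}
	}
	return failures;
}
```

The point of the check after a failed removal is the same as in this thread:
a removal that cannot complete has to leave the device online rather than
half removed.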

> 
>> We haven't gone far enough to provide an ACPI-independent hotplug framework
>> because we only have experience with x86 and Itanium, both of which are ACPI
>> based. We may try to implement an ACPI-independent hotplug framework by pushing
>> all ACPI-specific logic into the slot driver; I think it's doable. But we need
>> suggestions from experts on other architectures, such as SPARC and Power.
>> It seems Power already has some sort of hotplug framework, right?
> 
> I do not know about the Linux hot-plug support on other architectures.
> PA-RISC SuperDome also supports Node hot-plug, but it is not supported
> by Linux.  Since ARM is getting used in servers, I would not be surprised
> if there is an ARM-based server with hot-plug support in the future.
It seems ARM is on its way to adopting ACPI, so maybe we could support ARM
servers in the future.

> 
>>>> Hi Rafael, what are your thoughts here?
>>>>
>>>>>
>>>>>>>>> 1. Validate phase - Verify if the request is a supported operation.  All
>>>>>>>>> known restrictions are verified at this phase.  For instance, if a
>>>>>>>>> hot-remove request involves kernel memory, it is failed in this phase.
>>>>>>>>> Since this phase makes no change, no rollback is necessary to fail. 
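
As a rough illustration of the validate phase described above, here is a minimal user-space C sketch. It is not taken from the patch set or from the acpihp code: `struct hp_dev`, its fields, and both function names are hypothetical. It only shows the shape of the idea, namely that all checks run read-only first, so a rejected request leaves nothing to roll back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device record; the fields are illustrative only. */
struct hp_dev {
	const char *name;
	bool uses_kernel_memory;  /* assumed flag: device backs kernel memory */
	bool removed;
};

/*
 * Validate phase: read-only checks over every device in the request.
 * Nothing is modified, so a failure here needs no rollback.
 */
int hp_validate_remove(const struct hp_dev *devs, size_t n)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (devs[i].uses_kernel_memory)
			return -1;  /* unsupported request, rejected up front */
	}
	return 0;
}

/* Execute phase: runs only after validation passed for all devices. */
int hp_remove_all(struct hp_dev *devs, size_t n)
{
	int ret = hp_validate_remove(devs, n);

	if (ret != 0)
		return ret;
	for (size_t i = 0; i < n; i++)
		devs[i].removed = true;
	return 0;
}
```

With this split, a request that includes one device backing kernel memory fails before any other device in the same request has been touched.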
>>>>>>>>
>>>>>>>> Yes, we have done this in acpihp_drv_pre_execute, and check the following things:
>>>>>>>>
>>>>>>>> 1) Hot-pluggable or not. The kernel memory case you mentioned is also checked
>>>>>>>>    when a memory device is removed;
>>>>>>>
>>>>>>> Agreed.
>>>>>>>
>>>>>>>> 2) Dependency check involved. For instance, when hot-adding a memory device,
>>>>>>>>    the processor should be added first; otherwise the operation is not valid.
>>>>>>>
>>>>>>> I think FW should be the one that assures such dependency.  That is,
>>>>>>> when a memory device object is marked as present/enabled/functioning, it
>>>>>>> should be ready for the OS to use.
>>>>>>
>>>>>> Yes, BIOS should do something for the dependency, because BIOS knows the
>>>>>> actual hardware topology. 
>>>>>
>>>>> Right.
>>>>>
>>>>>> The ACPI specification provides the _EDL method to
>>>>>> tell the OS the eject device list, but it still has no method to tell the OS the
>>>>>> add device list.
>>>>>
>>>>> Yes, but I do not think the OS needs special handling for add...
>>>> We have a plan to support triggering hot-add events from OS-provided interfaces,
>>>> so we also need to solve dependency issues when handling requests from those interfaces.
>>>> For example, we need to power on the physical processor before powering on a memory
>>>> device if the memory device is attached to that physical processor.
>>>
>>> I am afraid that this issue is platform-specific, and I am not sure if
>>> there is a common way to handle such things in general.  I'd recommend
>>> to work with FW folks to implement such platform-specific validation
>>> code in FW.
>> You are right, we may rely on firmware to validate the dependency.
> 
> Great!
> 
>>>>>> For some cases, the OS should analyze the dependency in the validate phase. For example,
>>>>>> when hot-removing a node (container device), the OS should analyze the dependency to get
>>>>>> the removal order as follows:
>>>>>> 1) Host bridge;
>>>>>> 2) Memory devices;
>>>>>> 3) Processor devices;
>>>>>> 4) Container device itself;
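
The removal order listed above can be sketched as a simple sort by device class. This is only an illustration under stated assumptions: the enum, the struct, and the function names are hypothetical and do not come from the framework being discussed; the only thing taken from the thread is the order itself (host bridge, memory, processors, then the container).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Enum values double as removal priority: lower values are removed first. */
enum hp_dev_type {
	HP_HOST_BRIDGE = 0,
	HP_MEMORY      = 1,
	HP_PROCESSOR   = 2,
	HP_CONTAINER   = 3,
};

/* Hypothetical per-device record for one node (container). */
struct hp_node_dev {
	const char *name;
	enum hp_dev_type type;
};

/* qsort() comparator: orders devices by removal priority. */
int hp_cmp_removal_order(const void *a, const void *b)
{
	const struct hp_node_dev *x = a;
	const struct hp_node_dev *y = b;

	return (int)x->type - (int)y->type;
}

/* Sort the devices of a node into the order they should be removed. */
void hp_sort_for_removal(struct hp_node_dev *devs, size_t n)
{
	assert(devs != NULL || n == 0);
	if (n == 0)
		return;
	qsort(devs, n, sizeof(*devs), hp_cmp_removal_order);
}
```

Note that qsort() is not stable, so devices of the same class may end up in any relative order; a real implementation would likely also need to follow the dependency information reported by firmware.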
>>>>>
>>>>> This may be off-topic, but how do you plan to delete I/O devices under a
>>>>> node?  Are you planning to delete all I/O devices along with the node?
>>>>>
>>>>> On other OS, we made a separate step called I/O chassis delete, which
>>>>> off-lines all I/O devices under the node, and is required before a node
>>>>> hot-remove.  It basically triggers PCIe hot-remove to detach drivers
>>>>> from all devices.  It does not eject the devices so that they do not
>>>>> have to be on hot-plug slots.  This step runs user-space scripts to
>>>>> verify if the devices can be off-lined without disrupting user's
>>>>> applications, and provides comprehensive reports if any of them are in
>>>>> use.  Not sure if Linux's PCI hot-remove has such check, but I thought
>>>>> I'd mention it. :)
>>>> Yinghai is working on PCI host bridge hotplug, which just stops all PCI devices
>>>> under the host bridge. That's really a little dangerous and we do need help
>>>> from userspace to check whether the hot-removal operation is fatal,
>>>> e.g. removing the PCI device hosting the rootfs.
>>>
>>> Agreed.
>>>
>>>> So in our framework, we have an option to relay hotplug event from firmware
>>>> to userspace, so the userspace has a chance to reject the hotplug operations
>>>> if it may cause unacceptable disturbance to userspace services.
>>>
>>> I think validation from user-space is necessary for deleting I/O
>>> devices.  For CPU and memory, the kernel check works fine.
>> Agreed. But we may need help from userspace to handle cgroup/cpuset/cpuisol
>> etc for cpu and memory hot-removal. Especially for telecom applications, they
>> have strong dependency on cgroup/cpuisol to guarantee latency.
> 
> I have not looked at the code, but aren't these cpu attributes managed in
> the kernel?
Some telecom applications want to run in a deterministic environment, so they
depend on cpuisol/cpuset to provide such an environment. If a hotplug event happens,
these telecom applications should be notified so they have a chance to redistribute
the workload.

> 
> Thanks,
> -Toshi
> 
> 



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-11 14:34                     ` Jiang Liu
@ 2012-12-13 14:42                       ` Toshi Kani
  2012-12-13 15:15                         ` Jiang Liu
  0 siblings, 1 reply; 92+ messages in thread
From: Toshi Kani @ 2012-12-13 14:42 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Jiang Liu, Rafael J. Wysocki, Hanjun Guo, Vasilis Liaskovitis,
	linux-acpi, isimatu.yasuaki, wency, lenb, gregkh, linux-kernel,
	linux-mm, Tang Chen, Huxinwei

On Tue, 2012-12-11 at 22:34 +0800, Jiang Liu wrote:
> On 12/08/2012 09:08 AM, Toshi Kani wrote:
> > On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
> >> On 2012-12-7 10:57, Toshi Kani wrote:
> >>> On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
> >>>> On 12/04/2012 08:10 AM, Toshi Kani wrote:
> >>>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
> >>>>>> On 2012/11/30 6:27, Toshi Kani wrote:
> >>>>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
 :
> >>> Yes, the framework should allow such future work.  I also think that the
> >>> framework itself should be independent from such ACPI issue.  Ideally,
> >>> it should be able to support non-ACPI platforms.
> >> The same point here. The ACPI based hotplug framework is designed as:
> >> 1) an ACPI based hotplug slot driver to handle platform specific logic.
> >>    Platform may provide platform specific slot drivers to discover, manage
> >>    hotplug slots. We have provided a default implementation of slot driver
> >>    according to the ACPI spec.
> > 
> > The ACPI spec does not define that _EJ0 is required to receive a hot-add
> > request, i.e. bus/device check.  This is a major issue.  Since Windows
> > only supports hot-add, I think there are platforms that only support
> > hot-add today.
> > 
> >> 2) an ACPI based hotplug manager driver, which is a platform independent
> >>    driver and manages all hotplug slot created by the slot driver.
> > 
> > It is surely impressive work, but I think it is a bit overdone.  I
> > expect hot-pluggable servers come with management console and/or GUI
> > where a user can manage hardware units and initiate hot-plug operations.
> > I do not think the kernel needs to step into such area since it tends to
> > be platform-specific. 
> One of the major uses of this feature is testing.
> It will be hard for OSVs and OEMs to verify hotplug functionality if it can
> only be tested by physical hotplug or through a management console. So to pave the
> way for hotplug, we need to provide a mechanism for OEMs and OSVs to run
> automated stress tests of hotplug functionality.

Yes, but such an OS->FW interface is platform-specific.  Some platforms use
IPMI for the OS to communicate with the management console.  In this
case, an OEM-specific command can be used to request a hotplug through
IPMI.  Some platforms may also support test programs that run on the
management console for validation.

For early development testing, Yinghai's SCI emulation patch can be used
to emulate hotplug events from the OS.  It would be part of the kernel
debugging features once this patch is accepted. 

 
> >> We haven't gone far enough to provide an ACPI independent hotplug framework
> >> because we only have experience with x86 and Itanium, both of which are ACPI based.
> >> We may try to implement an ACPI independent hotplug framework by pushing all
> >> ACPI specific logic into the slot driver; I think it's doable. But we need
> >> suggestions from experts of other architectures, such as SPARC and Power.
> >> But it seems Power already has some sort of hotplug framework, right?
> > 
> > I do not know about the Linux hot-plug support on other architectures.
> > PA-RISC SuperDome also supports Node hot-plug, but it is not supported
> > by Linux.  Since ARM is getting used by servers, I would not be surprised if
> > there will be an ARM based server with hot-plug support in the future.
> It seems ARM is on its way to adopting ACPI, so maybe we could support ARM servers
> in the future.

That's good to know.

 :
> >>>> So in our framework, we have an option to relay hotplug event from firmware
> >>>> to userspace, so the userspace has a chance to reject the hotplug operations
> >>>> if it may cause unacceptable disturbance to userspace services.
> >>>
> >>> I think validation from user-space is necessary for deleting I/O
> >>> devices.  For CPU and memory, the kernel check works fine.
> >> Agreed. But we may need help from userspace to handle cgroup/cpuset/cpuisol
> >> etc for cpu and memory hot-removal. Especially for telecom applications, they
> >> have strong dependency on cgroup/cpuisol to guarantee latency.
> > 
> > I have not looked at the code, but aren't these cpu attributes managed in
> > the kernel?
> Some telecom applications want to run in a deterministic environment, so they
> depend on cpuisol/cpuset to provide such an environment. If a hotplug event happens,
> these telecom applications should be notified so they have a chance to redistribute
> the workload.

I agree that we need to generate an event that those applications can
subscribe to, so that they can react quickly to the change.
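
To make the subscription idea concrete, here is a minimal user-space C sketch of an event that interested parties can register for. Everything in it is hypothetical (the `hp_` names, the event enum, the fixed-size subscriber table); it is not an existing kernel or ACPI interface. Letting a subscriber return nonzero to object is an assumption borrowed from the earlier point in this thread that userspace should have a chance to reject an operation.

```c
#include <assert.h>
#include <stddef.h>

#define HP_MAX_SUBSCRIBERS 8

/* Hypothetical event types. */
enum hp_event {
	HP_EVENT_CPU_REMOVE_REQUEST,
	HP_EVENT_MEM_REMOVE_REQUEST,
};

/* A subscriber returns 0 to accept the event or nonzero to object. */
typedef int (*hp_subscriber_fn)(enum hp_event ev, void *ctx);

struct hp_subscriber {
	hp_subscriber_fn fn;
	void *ctx;
};

struct hp_notifier {
	struct hp_subscriber subs[HP_MAX_SUBSCRIBERS];
	size_t count;
};

/* Register a callback; returns -1 when the table is full. */
int hp_subscribe(struct hp_notifier *n, hp_subscriber_fn fn, void *ctx)
{
	assert(n != NULL && fn != NULL);
	if (n->count >= HP_MAX_SUBSCRIBERS)
		return -1;
	n->subs[n->count].fn = fn;
	n->subs[n->count].ctx = ctx;
	n->count++;
	return 0;
}

/* Deliver the event to every subscriber; return how many objected. */
int hp_notify(const struct hp_notifier *n, enum hp_event ev)
{
	int objections = 0;

	assert(n != NULL);
	for (size_t i = 0; i < n->count; i++) {
		if (n->subs[i].fn(ev, n->subs[i].ctx) != 0)
			objections++;
	}
	return objections;
}

/* Example subscriber: counts delivered events in an int passed via ctx. */
int hp_example_count(enum hp_event ev, void *ctx)
{
	(void)ev;
	(*(int *)ctx)++;
	return 0;
}

/* Example subscriber: always objects, e.g. a latency-critical workload. */
int hp_example_object(enum hp_event ev, void *ctx)
{
	(void)ev;
	(void)ctx;
	return 1;
}
```

A caller could treat a nonzero result from hp_notify() as a reason to fail the request before any device is changed, or ignore it and use the event purely as a notification.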

Thanks,
-Toshi




* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-13 14:42                       ` Toshi Kani
@ 2012-12-13 15:15                         ` Jiang Liu
  2012-12-15  1:19                           ` Toshi Kani
  0 siblings, 1 reply; 92+ messages in thread
From: Jiang Liu @ 2012-12-13 15:15 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Jiang Liu, Rafael J. Wysocki, Hanjun Guo, Vasilis Liaskovitis,
	linux-acpi, isimatu.yasuaki, wency, lenb, gregkh, linux-kernel,
	linux-mm, Tang Chen, Huxinwei

On 12/13/2012 10:42 PM, Toshi Kani wrote:
> On Tue, 2012-12-11 at 22:34 +0800, Jiang Liu wrote:
>> On 12/08/2012 09:08 AM, Toshi Kani wrote:
>>> On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
>>>> On 2012-12-7 10:57, Toshi Kani wrote:
>>>>> On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
>>>>>> On 12/04/2012 08:10 AM, Toshi Kani wrote:
>>>>>>> On Mon, 2012-12-03 at 12:25 +0800, Hanjun Guo wrote:
>>>>>>>> On 2012/11/30 6:27, Toshi Kani wrote:
>>>>>>>>> On Thu, 2012-11-29 at 12:48 +0800, Hanjun Guo wrote:
>  :
>>>>> Yes, the framework should allow such future work.  I also think that the
>>>>> framework itself should be independent from such ACPI issue.  Ideally,
>>>>> it should be able to support non-ACPI platforms.
>>>> The same point here. The ACPI based hotplug framework is designed as:
>>>> 1) an ACPI based hotplug slot driver to handle platform specific logic.
>>>>    Platform may provide platform specific slot drivers to discover, manage
>>>>    hotplug slots. We have provided a default implementation of slot driver
>>>>    according to the ACPI spec.
>>>
>>> The ACPI spec does not define that _EJ0 is required to receive a hot-add
>>> request, i.e. bus/device check.  This is a major issue.  Since Windows
>>> only supports hot-add, I think there are platforms that only support
>>> hot-add today.
>>>
>>>> 2) an ACPI based hotplug manager driver, which is a platform independent
>>>>    driver and manages all hotplug slot created by the slot driver.
>>>
>>> It is surely impressive work, but I think it is a bit overdone.  I
>>> expect hot-pluggable servers come with management console and/or GUI
>>> where a user can manage hardware units and initiate hot-plug operations.
>>> I do not think the kernel needs to step into such area since it tends to
>>> be platform-specific. 
>> One of the major uses of this feature is testing.
>> It will be hard for OSVs and OEMs to verify hotplug functionality if it can
>> only be tested by physical hotplug or through a management console. So to pave the
>> way for hotplug, we need to provide a mechanism for OEMs and OSVs to run
>> automated stress tests of hotplug functionality.
> 
> Yes, but such an OS->FW interface is platform-specific.  Some platforms use
> IPMI for the OS to communicate with the management console.  In this
> case, an OEM-specific command can be used to request a hotplug through
> IPMI.  Some platforms may also support test programs that run on the
> management console for validation.
> 
> For early development testing, Yinghai's SCI emulation patch can be used
> to emulate hotplug events from the OS.  It would be part of the kernel
> debugging features once this patch is accepted. 
Hi Toshi,
	ACPI 5.0 provides a mechanism to normalize the way RAS-related requests
are issued to firmware. I hope ACPI 5.x will define some standardized
methods based on the PCC defined in 5.0. If needed, we may provide
platform-specific methods for them too.
Regards!
Gerry

> 
>  
>>>> We haven't gone far enough to provide an ACPI independent hotplug framework
>>>> because we only have experience with x86 and Itanium, both of which are ACPI based.
>>>> We may try to implement an ACPI independent hotplug framework by pushing all
>>>> ACPI specific logic into the slot driver; I think it's doable. But we need
>>>> suggestions from experts of other architectures, such as SPARC and Power.
>>>> But it seems Power already has some sort of hotplug framework, right?
>>>
>>> I do not know about the Linux hot-plug support on other architectures.
>>> PA-RISC SuperDome also supports Node hot-plug, but it is not supported
>>> by Linux.  Since ARM is getting used by servers, I would not be surprised if
>>> there will be an ARM based server with hot-plug support in the future.
>> It seems ARM is on its way to adopting ACPI, so maybe we could support ARM servers
>> in the future.
> 
> That's good to know.
> 
>  :
>>>>>> So in our framework, we have an option to relay hotplug event from firmware
>>>>>> to userspace, so the userspace has a chance to reject the hotplug operations
>>>>>> if it may cause unacceptable disturbance to userspace services.
>>>>>
>>>>> I think validation from user-space is necessary for deleting I/O
>>>>> devices.  For CPU and memory, the kernel check works fine.
>>>> Agreed. But we may need help from userspace to handle cgroup/cpuset/cpuisol
>>>> etc for cpu and memory hot-removal. Especially for telecom applications, they
>>>> have strong dependency on cgroup/cpuisol to guarantee latency.
>>>
>>> I have not looked at the code, but aren't these cpu attributes managed in
>>> the kernel?
>> Some telecom applications want to run in a deterministic environment, so they
>> depend on cpuisol/cpuset to provide such an environment. If a hotplug event happens,
>> these telecom applications should be notified so they have a chance to redistribute
>> the workload.
> 
> I agree that we need to generate an event that those applications can
> subscribe to, so that they can react quickly to the change.
> 
> Thanks,
> -Toshi
> 
> 



* Re: [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation
  2012-12-13 15:15                         ` Jiang Liu
@ 2012-12-15  1:19                           ` Toshi Kani
  0 siblings, 0 replies; 92+ messages in thread
From: Toshi Kani @ 2012-12-15  1:19 UTC (permalink / raw)
  To: Jiang Liu
  Cc: Jiang Liu, Rafael J. Wysocki, Hanjun Guo, Vasilis Liaskovitis,
	linux-acpi, isimatu.yasuaki, wency, lenb, gregkh, linux-kernel,
	linux-mm, Tang Chen, Huxinwei

On Thu, 2012-12-13 at 23:15 +0800, Jiang Liu wrote:
> On 12/13/2012 10:42 PM, Toshi Kani wrote:
> > On Tue, 2012-12-11 at 22:34 +0800, Jiang Liu wrote:
> >> On 12/08/2012 09:08 AM, Toshi Kani wrote:
> >>> On Fri, 2012-12-07 at 13:57 +0800, Jiang Liu wrote:
> >>>> On 2012-12-7 10:57, Toshi Kani wrote:
> >>>>> On Fri, 2012-12-07 at 00:40 +0800, Jiang Liu wrote:
 :
> >>>
> >>>> 2) an ACPI based hotplug manager driver, which is a platform independent
> >>>>    driver and manages all hotplug slot created by the slot driver.
> >>>
> >>> It is surely impressive work, but I think it is a bit overdone.  I
> >>> expect hot-pluggable servers come with management console and/or GUI
> >>> where a user can manage hardware units and initiate hot-plug operations.
> >>> I do not think the kernel needs to step into such area since it tends to
> >>> be platform-specific. 
> >> One of the major uses of this feature is testing.
> >> It will be hard for OSVs and OEMs to verify hotplug functionality if it can
> >> only be tested by physical hotplug or through a management console. So to pave the
> >> way for hotplug, we need to provide a mechanism for OEMs and OSVs to run
> >> automated stress tests of hotplug functionality.
> > 
> > Yes, but such an OS->FW interface is platform-specific.  Some platforms use
> > IPMI for the OS to communicate with the management console.  In this
> > case, an OEM-specific command can be used to request a hotplug through
> > IPMI.  Some platforms may also support test programs that run on the
> > management console for validation.
> > 
> > For early development testing, Yinghai's SCI emulation patch can be used
> > to emulate hotplug events from the OS.  It would be part of the kernel
> > debugging features once this patch is accepted. 
> Hi Toshi,
> 	ACPI 5.0 provides a mechanism to normalize the way RAS-related requests
> are issued to firmware. I hope ACPI 5.x will define some standardized
> methods based on the PCC defined in 5.0. If needed, we may provide
> platform-specific methods for them too.

Thanks for the pointer!  Yeah, the spec purposely does not define the
command.  When we support PCC, we will need to provide a way for a user
app or OEM module to supply a payload.

Thanks,
-Toshi



end of thread, other threads:[~2012-12-15  1:29 UTC | newest]

Thread overview: 92+ messages
2012-11-23 17:50 [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation Vasilis Liaskovitis
2012-11-23 17:50 ` [RFC PATCH v3 1/3] acpi: Introduce prepare_remove operation in acpi_device_ops Vasilis Liaskovitis
2012-11-27  0:10   ` Toshi Kani
2012-11-27 18:36     ` Vasilis Liaskovitis
2012-11-27 23:18     ` Rafael J. Wysocki
2012-11-23 17:50 ` [RFC PATCH v3 2/3] acpi_memhotplug: Add prepare_remove operation Vasilis Liaskovitis
2012-11-24 16:23   ` Wen Congyang
2012-11-23 17:50 ` [RFC PATCH v3 3/3] acpi_memhotplug: Allow eject to proceed on rebind scenario Vasilis Liaskovitis
2012-11-24 16:20   ` Wen Congyang
2012-11-26  8:36     ` Vasilis Liaskovitis
2012-11-26  9:11       ` Wen Congyang
2012-11-27  0:19         ` Toshi Kani
2012-11-27 18:32           ` Vasilis Liaskovitis
2012-11-27 22:03             ` Toshi Kani
2012-11-27 23:41               ` Rafael J. Wysocki
2012-11-28 16:01                 ` Toshi Kani
2012-11-28 18:40                   ` Rafael J. Wysocki
2012-11-28 21:02                     ` Toshi Kani
2012-11-28 21:40                       ` Rafael J. Wysocki
2012-11-28 21:40                         ` Toshi Kani
2012-11-28 22:01                           ` Rafael J. Wysocki
2012-11-28 22:04                             ` Toshi Kani
2012-11-28 22:21                               ` Rafael J. Wysocki
2012-11-28 22:16                                 ` Toshi Kani
2012-11-28 22:39                                   ` Rafael J. Wysocki
2012-11-28 22:46                                     ` Greg KH
2012-11-28 23:05                                       ` Rafael J. Wysocki
2012-11-28 23:10                                         ` Greg KH
2012-11-28 23:31                                           ` Rafael J. Wysocki
2012-11-28 23:49                       ` Rafael J. Wysocki
2012-11-29  1:02                         ` Toshi Kani
2012-11-29  1:15                           ` Toshi Kani
2012-11-29 10:03                             ` Rafael J. Wysocki
2012-11-29 11:30                               ` Vasilis Liaskovitis
2012-11-29 16:57                                 ` Rafael J. Wysocki
2012-11-29 17:56                                 ` Toshi Kani
2012-11-29 20:25                                   ` Rafael J. Wysocki
2012-11-29 20:38                                     ` Toshi Kani
2012-11-29 21:23                                       ` Rafael J. Wysocki
2012-11-29 21:46                                         ` Toshi Kani
2012-11-29 22:11                                           ` Rafael J. Wysocki
2012-11-29 23:17                                             ` Toshi Kani
2012-11-30  0:13                                               ` Rafael J. Wysocki
2012-11-30  1:09                                                 ` Toshi Kani
2012-11-29 16:43                               ` Toshi Kani
2012-11-29 11:04                             ` Vasilis Liaskovitis
2012-11-29 17:44                               ` Toshi Kani
2012-12-06  9:30                                 ` Vasilis Liaskovitis
2012-12-06 12:50                                   ` Rafael J. Wysocki
2012-12-06 15:41                                     ` Toshi Kani
2012-12-06 20:32                                       ` Rafael J. Wysocki
2012-11-28 11:05 ` [RFC PATCH v3 0/3] acpi: Introduce prepare_remove device operation Hanjun Guo
2012-11-28 18:41   ` Toshi Kani
2012-11-29  4:48     ` Hanjun Guo
2012-11-29 22:27       ` Toshi Kani
2012-12-03  4:25         ` Hanjun Guo
2012-12-04  0:10           ` Toshi Kani
2012-12-04  9:16             ` Hanjun Guo
2012-12-04 23:23               ` Toshi Kani
2012-12-05 12:10                 ` Hanjun Guo
2012-12-05 22:31                   ` Toshi Kani
2012-12-06 16:47                 ` Jiang Liu
2012-12-07  2:25                   ` Toshi Kani
2012-12-06 16:40             ` Jiang Liu
2012-12-06 20:30               ` Rafael J. Wysocki
2012-12-07  2:57               ` Toshi Kani
2012-12-07  5:57                 ` Jiang Liu
2012-12-08  1:08                   ` Toshi Kani
2012-12-11 14:34                     ` Jiang Liu
2012-12-13 14:42                       ` Toshi Kani
2012-12-13 15:15                         ` Jiang Liu
2012-12-15  1:19                           ` Toshi Kani
2012-11-29 10:15     ` Rafael J. Wysocki
2012-11-29 11:36       ` Vasilis Liaskovitis
2012-12-06 16:59         ` Jiang Liu
2012-11-29 17:03       ` Toshi Kani
2012-11-29 20:30         ` Rafael J. Wysocki
2012-11-29 20:39           ` Toshi Kani
2012-11-29 20:56             ` Toshi Kani
2012-11-29 21:25               ` Rafael J. Wysocki
2012-12-06 17:10                 ` Jiang Liu
2012-12-06 17:07           ` Jiang Liu
2012-12-06 17:01         ` Jiang Liu
2012-12-06 16:56       ` Jiang Liu
2012-12-06 16:00     ` Jiang Liu
2012-12-06 16:03       ` Toshi Kani
2012-12-06 16:25         ` Jiang Liu
2012-12-06 16:31           ` Toshi Kani
2012-12-06 16:52             ` Jiang Liu
2012-12-06 17:09               ` Toshi Kani
2012-12-06 17:30                 ` Jiang Liu
2012-12-06 17:28                   ` Toshi Kani
