linux-kernel.vger.kernel.org archive mirror
* [RFC PATCH v2 00/12] System device hot-plug framework
@ 2013-01-10 23:40 Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Toshi Kani
                   ` (12 more replies)
  0 siblings, 13 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

This patchset is a prototype of proposed system device hot-plug framework
for design review.  Unlike other hot-plug environments, such as USB and
PCI, there is no common framework for system device hot-plug [1].
Therefore, this patchset is designed to provide a common framework for
hot-plugging and online/offline operations of system devices, such as CPU,
Memory and Node.  While this patchset only supports ACPI-based hot-plug
operations, the framework itself is designed to be platform-neutral and
can support other FW architectures as necessary.

This patchset is based on Linus's tree (3.8-rc3).

I have seen a few stability issues with 3.8-rc3 in my testing and will
look into their solutions.

[1] System device hot-plug frameworks for ppc and s390 are implemented
    for specific platforms and products.


Background: System Device Initialization
========================================
System devices, such as CPU and memory, must be initialized early in the
boot sequence because they are essential components that provide
low-level services, e.g. scheduling, memory allocation and interrupts,
which are the foundations of the kernel services.  start_kernel() and
kernel_init() manage the boot-up sequence to initialize system devices
and low-level services in a pre-defined order as shown below.

  start_kernel()
    boot_cpu_init()          // init cpu0
    setup_arch()
      efi_init()             // init EFI memory map
      initmem_init()         // init NUMA
      x86_init.paging.pagetable_init() // init page table
      acpi_boot_init()       // parse ACPI MADT table
        :
  kernel_init()
    kernel_init_freeable()
      smp_init()             // init other CPUs
        :
      do_basic_setup()
        driver_init()
          cpu_dev_init()     // build system/cpu tree
          memory_dev_init()  // build system/memory tree
        do_initcalls()
          acpi_init()        // build ACPI device tree

Note that drivers are initialized at the end of the boot sequence as they
depend on the kernel services from system devices.  Hence, while system
devices may be exposed to sysfs with their pseudo drivers, their
initialization may not be fully integrated into the driver structures.  

Overview of the System Device Hot-plug Framework
================================================
Similar to the boot-up sequence, the system device hot-plug framework
provides a sequencer that calls all registered handlers in a pre-defined
order for hot-add and hot-delete of system devices.  It allows any module
that initializes system devices in the boot-up sequence to participate in
the hot-plug operations as well.  At a high level, there are two types of
handlers: 1) FW-dependent (e.g. ACPI) handlers that enumerate or eject
system devices, and 2) system device (e.g. CPU, Memory) management
handlers that online or offline the enumerated system devices.
Online/offline operations are a subset of hot-add/delete operations.  The
ordering of the handlers is symmetric between hot-add (online) and
hot-delete (offline) operations.

        hot-add    online
           |    ^    :    ^
  HW Enum/ |    |    :    :
    Eject  |    |    :    :
           |    |    :    :
  Online/  |    |    |    |
  Offline  |    |    |    |
           V    |    V    |
             hot-del   offline

A handler may not call other handlers directly in order to exceed its
role.  Therefore, the role of the handlers in their modules remains
consistent with their role in the boot-up sequence.  For instance, the
ACPI module may not perform online or offline of system devices.

System Device Hot-plug Operation
================================

Serialized Startup
------------------
The framework provides an interface (shp_submit_req) to request a hot-plug
operation.  All requests are queued to and run on a single work queue.
The framework ensures that only a single hot-plug or online/offline
operation runs at a time.  A single request may, however, target
multiple devices.  This keeps the execution context of the handlers
consistent with the boot-up sequence and enables code sharing.

Phased Execution
----------------
The framework processes hot-plug and online/offline operations in the
following three phases.  Modules can register their handlers for each
phase.  The framework also initiates a roll-back operation if any handler
fails in the validate or execute phase.

1) Validate Phase - Handlers validate whether they support a given request
without making any changes to the target device(s).  They check any known
restrictions and/or prerequisite conditions of their modules, and fail
an unsupported request before making any changes.  For instance, the
memory module may check if a hot-remove request is targeted to movable
ranges.

2) Execute Phase - Handlers make the requested changes only within the
scope that can be rolled back in case of a failure.  Execute handlers must
implement their roll-back procedures.

3) Commit Phase - Handlers make the final change that cannot be rolled back.
For instance, the ACPI module invokes _EJ0 for a hot-remove operation.
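
The three phases and the roll-back rule can be illustrated with a small
user-space model.  This is only a sketch under stated assumptions, not the
framework code: struct handler, run_request() and the trace helpers are
hypothetical names, and the handler list is assumed to be already sorted by
phase.  Only the handler signature (request pointer plus rollback flag)
mirrors shp_func.

```c
#include <stddef.h>

enum phase { PH_VALIDATE, PH_EXECUTE, PH_COMMIT };

/* Hypothetical handler entry; fn mirrors the shp_func signature. */
struct handler {
	enum phase phase;
	int (*fn)(void *req, int rollback);
};

/*
 * Call handlers in list order.  A failure before the commit phase
 * triggers roll-back: the handlers that already ran are called again in
 * reverse order with rollback=1, and the failed handler itself is not
 * called again.  A commit failure is ignored, as a commit cannot be
 * undone.  Returns 0, or the error that triggered the roll-back.
 */
static int run_request(const struct handler *h, size_t n, void *req)
{
	size_t i;
	int ret;

	for (i = 0; i < n; i++) {
		ret = h[i].fn(req, 0);
		if (!ret || h[i].phase == PH_COMMIT)
			continue;
		while (i-- > 0)
			h[i].fn(req, 1);
		return ret;
	}
	return 0;
}

/* Example handlers that record each call in a trace string. */
static char trace[16];
static size_t trace_len;

static int record(char c, int ret)
{
	if (trace_len < sizeof(trace) - 1)
		trace[trace_len++] = c;
	return ret;
}

static int validate_ok(void *req, int rollback)
{
	(void)req;
	return record(rollback ? 'v' : 'V', 0);
}

static int execute_ok(void *req, int rollback)
{
	(void)req;
	return record(rollback ? 'e' : 'E', 0);
}

static int execute_fail(void *req, int rollback)
{
	(void)req;
	(void)rollback;
	return record('X', -1);
}

static int commit_ok(void *req, int rollback)
{
	(void)req;
	(void)rollback;
	return record('C', 0);
}
```

With the sequence validate_ok, execute_ok, execute_fail, commit_ok, the
commit handler is never reached, and the two handlers that already ran are
called again in reverse order with the rollback flag set.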

System Device Management Modules
================================

CPU Handlers
------------
CPU handlers are provided by the CPU driver in drivers/base/cpu.c, and
perform CPU online/offline procedures when CPU devices are added or
deleted during an operation.

Memory Handlers
---------------
Memory handlers are provided by the memory module in mm/memory_hotplug.c,
and perform memory online/offline procedures when memory devices are
added or deleted during an operation.

FW-dependent Modules
====================

ACPI Bus Handlers
-----------------
ACPI bus handlers are provided by the ACPI core in drivers/acpi/bus.c,
and construct/destruct acpi_device object(s) during a hot-plug operation.

ACPI Resource Handlers
----------------------
ACPI resource handlers are provided by the ACPI core in
drivers/acpi/shp_resource.c, and set device resource information in
a request during a hot-plug operation.  This device resource information
is then consumed by the system device management modules for their
online/offline procedures.

ACPI Drivers
------------
ACPI drivers are called from the ACPI core during a hot-plug operation
through the following interfaces.  ACPI drivers are not called from the
framework directly, and remain internal to the ACPI core.  ACPI drivers
may not initiate online/offline of a device.

.add - Construct device-specific information for a given acpi_device.
Called at boot, hot-add and sysfs bind.

.remove - Destruct device-specific information of a given acpi_device.
Called at hot-remove and sysfs unbind.

.resource - Set device-specific resource information in a given hot-plug
request.  Called at hot-add and hot-remove.

---
v2:
 - Documented that system devices may not be initialized through the driver
   structures.
 - Clarified that the framework is for "system device" hotplug by changing
   file name, prefix and documentation.
 - Removed the use of CONFIG_HOTPLUG.
 - Moved ACPI specific definitions to include/acpi/sys_hotplug.h.
 - Implemented shp_unregister_handler() and added locking.
 - Added module parameters, shp_trace and del_movable_only.

---
Toshi Kani (12):
 Add sys_hotplug.h for system device hotplug framework
 ACPI: Add sys_hotplug.h for system device hotplug framework
 drivers/base: Add system device hotplug framework 
 cpu: Add cpu hotplug handlers
 mm: Add memory hotplug handlers
 ACPI: Add ACPI bus hotplug handlers
 ACPI: Add ACPI resource hotplug handler
 ACPI: Update processor driver for hotplug framework
 ACPI: Update memory driver for hotplug framework
 ACPI: Update container driver for hotplug framework
 cpu: Update sysfs cpu/online for hotplug framework
 ACPI: Update sysfs eject for hotplug framework

---
 drivers/acpi/Makefile           |   1 +
 drivers/acpi/acpi_memhotplug.c  | 271 ++++++++++++----------------------
 drivers/acpi/bus.c              | 134 +++++++++++++++++
 drivers/acpi/container.c        |  95 +++++-------
 drivers/acpi/internal.h         |   1 +
 drivers/acpi/processor_driver.c | 150 +++++++++----------
 drivers/acpi/scan.c             | 122 +++-------------
 drivers/acpi/shp_resource.c     |  86 +++++++++++
 drivers/base/Makefile           |   1 +
 drivers/base/cpu.c              | 147 +++++++++++++++++--
 drivers/base/sys_hotplug.c      | 313 ++++++++++++++++++++++++++++++++++++++++
 include/acpi/acpi_bus.h         |   8 +-
 include/acpi/sys_hotplug.h      |  48 ++++++
 include/linux/sys_hotplug.h     | 181 +++++++++++++++++++++++
 mm/memory_hotplug.c             | 101 +++++++++++++
 15 files changed, 1224 insertions(+), 435 deletions(-)


* [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-11 21:23   ` Rafael J. Wysocki
                     ` (2 more replies)
  2013-01-10 23:40 ` [RFC PATCH v2 02/12] ACPI: " Toshi Kani
                   ` (11 subsequent siblings)
  12 siblings, 3 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Added include/linux/sys_hotplug.h, which defines the system device
hotplug framework interfaces used by the framework itself and
handlers.

The order values define the calling sequence of handlers.  For add
execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
CPU so that threads on new CPUs can start using their local memory.
The ordering of the delete execute is symmetric to the add execute.

struct shp_request defines the information of a hot-plug request.  The
device resource information is managed with a list so that a single
request may target multiple devices.
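
The operation and phase values share the low request bits, which is why the
same add/delete test works on both.  The following sketch shows the
encoding; the constants are copied from this patch, while phase_of() and
is_delete() are hypothetical helpers that are not part of the header.

```c
#include <stdbool.h>

/* Values copied from include/linux/sys_hotplug.h in this patch. */
#define SHP_REQ_ADD		0x000000
#define SHP_REQ_DELETE		0x000001
#define SHP_REQ_MASK		0x0000ff
#define SHP_PH_VALIDATE		0x000000
#define SHP_PH_EXECUTE		0x000100
#define SHP_PH_COMMIT		0x000200
#define SHP_PH_MASK		0x00ff00
#define SHP_OP_HOTPLUG		0x000000
#define SHP_OP_ONLINE		0x010000
#define SHP_OP_MASK		0xff0000

/*
 * Hypothetical helper: combine the request bits of an operation with a
 * phase type to get the matching enum shp_phase value.
 */
static unsigned int phase_of(unsigned int operation, unsigned int ph)
{
	return (operation & SHP_REQ_MASK) | (ph & SHP_PH_MASK);
}

/*
 * Hypothetical helper: true for hot-delete and offline operations, and
 * for any delete phase, since all of them carry SHP_REQ_DELETE.
 */
static bool is_delete(unsigned int value)
{
	return (value & SHP_REQ_MASK) == SHP_REQ_DELETE;
}
```

For example, an offline operation (SHP_OP_ONLINE|SHP_REQ_DELETE) in the
execute phase maps to the same value as SHP_DEL_EXECUTE.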

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 include/linux/sys_hotplug.h |  181 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 181 insertions(+)
 create mode 100644 include/linux/sys_hotplug.h

diff --git a/include/linux/sys_hotplug.h b/include/linux/sys_hotplug.h
new file mode 100644
index 0000000..86674dd
--- /dev/null
+++ b/include/linux/sys_hotplug.h
@@ -0,0 +1,181 @@
+/*
+ * sys_hotplug.h - System device hot-plug framework
+ *
+ * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
+ *	Toshi Kani <toshi.kani@hp.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _LINUX_SYS_HOTPLUG_H
+#define _LINUX_SYS_HOTPLUG_H
+
+#include <linux/list.h>
+#include <linux/device.h>
+
+/*
+ * System device hot-plug operation proceeds in the following order.
+ *   Validate phase -> Execute phase -> Commit phase
+ *
+ * The order values below define the calling sequence of platform
+ * neutral handlers for each phase in ascending order.  The order
+ * values of firmware-specific handlers are defined in sys_hotplug.h
+ * under firmware specific directories.
+ */
+
+/* All order values must be smaller than this value */
+#define SHP_ORDER_MAX				0xffffff
+
+/* Add Validate order values */
+
+/* Add Execute order values */
+#define SHP_MEM_ADD_EXECUTE_ORDER		100
+#define SHP_CPU_ADD_EXECUTE_ORDER		110
+
+/* Add Commit order values */
+
+/* Delete Validate order values */
+#define SHP_CPU_DEL_VALIDATE_ORDER		100
+#define SHP_MEM_DEL_VALIDATE_ORDER		110
+
+/* Delete Execute order values */
+#define SHP_CPU_DEL_EXECUTE_ORDER		10
+#define SHP_MEM_DEL_EXECUTE_ORDER		20
+
+/* Delete Commit order values */
+
+/*
+ * Hot-plug request types
+ */
+#define SHP_REQ_ADD		0x000000
+#define SHP_REQ_DELETE		0x000001
+#define SHP_REQ_MASK		0x0000ff
+
+/*
+ * Hot-plug phase types
+ */
+#define SHP_PH_VALIDATE		0x000000
+#define SHP_PH_EXECUTE		0x000100
+#define SHP_PH_COMMIT		0x000200
+#define SHP_PH_MASK		0x00ff00
+
+/*
+ * Hot-plug operation types
+ */
+#define SHP_OP_HOTPLUG		0x000000
+#define SHP_OP_ONLINE		0x010000
+#define SHP_OP_MASK		0xff0000
+
+/*
+ * Hot-plug phases
+ */
+enum shp_phase {
+	SHP_ADD_VALIDATE	= (SHP_REQ_ADD|SHP_PH_VALIDATE),
+	SHP_ADD_EXECUTE		= (SHP_REQ_ADD|SHP_PH_EXECUTE),
+	SHP_ADD_COMMIT		= (SHP_REQ_ADD|SHP_PH_COMMIT),
+	SHP_DEL_VALIDATE	= (SHP_REQ_DELETE|SHP_PH_VALIDATE),
+	SHP_DEL_EXECUTE		= (SHP_REQ_DELETE|SHP_PH_EXECUTE),
+	SHP_DEL_COMMIT		= (SHP_REQ_DELETE|SHP_PH_COMMIT)
+};
+
+/*
+ * Hot-plug operations
+ */
+enum shp_operation {
+	SHP_HOTPLUG_ADD		= (SHP_OP_HOTPLUG|SHP_REQ_ADD),
+	SHP_HOTPLUG_DEL		= (SHP_OP_HOTPLUG|SHP_REQ_DELETE),
+	SHP_ONLINE_ADD		= (SHP_OP_ONLINE|SHP_REQ_ADD),
+	SHP_ONLINE_DEL		= (SHP_OP_ONLINE|SHP_REQ_DELETE)
+};
+
+/*
+ * Hot-plug device classes
+ */
+enum shp_class {
+	SHP_CLS_INVALID		= 0,
+	SHP_CLS_CPU		= 1,
+	SHP_CLS_MEMORY		= 2,
+	SHP_CLS_HOSTBRIDGE	= 3,
+	SHP_CLS_CONTAINER	= 4,
+};
+
+/*
+ * Hot-plug device information
+ */
+union shp_dev_info {
+	struct shp_cpu {
+		u32		cpu_id;
+	} cpu;
+
+	struct shp_memory {
+		int		node;
+		u64		start_addr;
+		u64		length;
+	} mem;
+
+	struct shp_hostbridge {
+	} hb;
+
+	struct shp_node {
+	} node;
+};
+
+struct shp_device {
+	struct list_head	list;
+	struct device		*device;
+	enum shp_class		class;
+	union shp_dev_info	info;
+};
+
+/*
+ * Hot-plug request
+ */
+struct shp_request {
+	/* common info */
+	enum shp_operation	operation;	/* operation */
+
+	/* hot-plug event info: only valid for hot-plug operations */
+	void			*handle;	/* FW handle */
+	u32			event;		/* FW event */
+
+	/* device resource info */
+	struct list_head	dev_list;	/* shp_device list */
+};
+
+/*
+ * Inline Utility Functions
+ */
+static inline bool shp_is_hotplug_op(enum shp_operation operation)
+{
+	return (operation & SHP_OP_MASK) == SHP_OP_HOTPLUG;
+}
+
+static inline bool shp_is_online_op(enum shp_operation operation)
+{
+	return (operation & SHP_OP_MASK) == SHP_OP_ONLINE;
+}
+
+static inline bool shp_is_add_op(enum shp_operation operation)
+{
+	return (operation & SHP_REQ_MASK) == SHP_REQ_ADD;
+}
+
+static inline bool shp_is_add_phase(enum shp_phase phase)
+{
+	return (phase & SHP_REQ_MASK) == SHP_REQ_ADD;
+}
+
+/*
+ * Externs
+ */
+typedef int (*shp_func)(struct shp_request *req, int rollback);
+extern int shp_register_handler(enum shp_phase phase, shp_func func, u32 order);
+extern int shp_unregister_handler(enum shp_phase phase, shp_func func);
+extern int shp_submit_req(struct shp_request *req);
+extern struct shp_request *shp_alloc_request(enum shp_operation operation);
+extern void shp_add_dev_info(struct shp_request *shp_req,
+		struct shp_device *shp_dev);
+
+#endif	/* _LINUX_SYS_HOTPLUG_H */


* [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-11 21:25   ` Rafael J. Wysocki
  2013-01-10 23:40 ` [RFC PATCH v2 03/12] drivers/base: Add " Toshi Kani
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Added include/acpi/sys_hotplug.h, which is the ACPI-specific system
device hotplug header and defines the order values of ACPI-specific
handlers.
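
Taken together with the platform-neutral values in
<linux/sys_hotplug.h>, the execute-phase order values give the calling
sequences sketched below.  The order values are copied from the two
headers; struct entry, by_order() and calling_sequence() are hypothetical
names used only for this illustration.

```c
#include <stdlib.h>

struct entry {
	const char *name;
	unsigned int order;
};

/* Add execute order values, listed unsorted on purpose. */
static struct entry add_execute[] = {
	{ "CPU",      110 },	/* SHP_CPU_ADD_EXECUTE_ORDER */
	{ "ACPI bus",  10 },	/* SHP_ACPI_BUS_ADD_EXECUTE_ORDER */
	{ "MEM",      100 },	/* SHP_MEM_ADD_EXECUTE_ORDER */
	{ "ACPI res",  20 },	/* SHP_ACPI_RES_ADD_EXECUTE_ORDER */
};

/* Delete execute order values, listed unsorted on purpose. */
static struct entry del_execute[] = {
	{ "ACPI bus", 100 },	/* SHP_ACPI_BUS_DEL_EXECUTE_ORDER */
	{ "MEM",       20 },	/* SHP_MEM_DEL_EXECUTE_ORDER */
	{ "CPU",       10 },	/* SHP_CPU_DEL_EXECUTE_ORDER */
};

static int by_order(const void *a, const void *b)
{
	const struct entry *x = a, *y = b;

	return (x->order > y->order) - (x->order < y->order);
}

/* Sort a handler table into its calling sequence (ascending order). */
static void calling_sequence(struct entry *e, size_t n)
{
	qsort(e, n, sizeof(*e), by_order);
}
```

After sorting, add execute runs ACPI bus, ACPI res, MEM, CPU, and delete
execute runs CPU, MEM, ACPI bus, which is the symmetric ordering described
in patch 01.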

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
 create mode 100644 include/acpi/sys_hotplug.h

diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
new file mode 100644
index 0000000..ad80f61
--- /dev/null
+++ b/include/acpi/sys_hotplug.h
@@ -0,0 +1,48 @@
+/*
+ * sys_hotplug.h - ACPI System device hot-plug framework
+ *
+ * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
+ *	Toshi Kani <toshi.kani@hp.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _ACPI_SYS_HOTPLUG_H
+#define _ACPI_SYS_HOTPLUG_H
+
+#include <linux/list.h>
+#include <linux/device.h>
+#include <linux/sys_hotplug.h>
+
+/*
+ * System device hot-plug operation proceeds in the following order.
+ *   Validate phase -> Execute phase -> Commit phase
+ *
+ * The order values below define the calling sequence of ACPI-specific
+ * handlers for each phase in ascending order.  The order values of
+ * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
+ */
+
+/* Add Validate order values */
+#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
+
+/* Add Execute order values */
+#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
+#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
+
+/* Add Commit order values */
+#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
+
+/* Delete Validate order values */
+#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
+#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
+
+/* Delete Execute order values */
+#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
+
+/* Delete Commit order values */
+#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
+
+#endif	/* _ACPI_SYS_HOTPLUG_H */


* [RFC PATCH v2 03/12] drivers/base: Add system device hotplug framework
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 02/12] ACPI: " Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-30  4:54   ` Greg KH
  2013-01-10 23:40 ` [RFC PATCH v2 04/12] cpu: Add cpu hotplug handlers Toshi Kani
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Added sys_hotplug.c, which is the system device hotplug framework code.

shp_register_handler() allows modules to register their hotplug handlers
with the framework.  shp_submit_req() provides the interface to submit
a hotplug or online/offline request of system devices.  The request is
then put into shp_workqueue.  shp_start_req() calls all registered handlers
in ascending order for each phase.  If any handler fails in the validate
or execute phase, shp_start_req() initiates its rollback procedure.
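
The way a per-phase base keeps validate, execute and commit handlers
grouped in one sorted list can be sketched in user space as follows.  This
is an illustration only: it uses a plain singly linked list and malloc()
instead of list_head and kzalloc(), takes no lock, and struct handler,
order_base() and register_handler() are hypothetical stand-ins for the code
in this patch.

```c
#include <stdlib.h>

#define ORDER_MAX	0xffffffu	/* same value as SHP_ORDER_MAX */

enum phase { VALIDATE, EXECUTE, COMMIT };

struct handler {
	struct handler *next;
	unsigned int key;	/* phase base + order value */
	int id;			/* stands in for the handler function */
};

/* Each phase gets its own key range, as with SHP_*_ORDER_BASE. */
static unsigned int order_base(enum phase p)
{
	switch (p) {
	case VALIDATE:
		return ORDER_MAX + 1;
	case EXECUTE:
		return (ORDER_MAX + 1) << 1;
	case COMMIT:
		return (ORDER_MAX + 1) << 2;
	}
	return 0;
}

/*
 * Insert before the first entry with a larger key, so the list stays in
 * ascending order and equal keys keep their registration order.
 * Returns 0 on success and -1 on a bad order value or allocation failure.
 */
static int register_handler(struct handler **head, enum phase p,
			    unsigned int order, int id)
{
	struct handler *h;

	if (order > ORDER_MAX)
		return -1;

	h = malloc(sizeof(*h));
	if (!h)
		return -1;

	h->key = order_base(p) + order;
	h->id = id;

	while (*head && (*head)->key <= h->key)
		head = &(*head)->next;
	h->next = *head;
	*head = h;

	return 0;
}
```

Because every key in a later phase is larger than any key in an earlier
phase, a single walk of the list visits all validate handlers first, then
all execute handlers, then all commit handlers, regardless of the order in
which the handlers were registered.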

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/base/Makefile      |    1 
 drivers/base/sys_hotplug.c |  313 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 314 insertions(+)
 create mode 100644 drivers/base/sys_hotplug.c

diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 5aa2d70..2e9b2f1 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -21,6 +21,7 @@ endif
 obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor.o
 obj-$(CONFIG_REGMAP)	+= regmap/
 obj-$(CONFIG_SOC_BUS) += soc.o
+obj-y			+= sys_hotplug.o
 
 ccflags-$(CONFIG_DEBUG_DRIVER) := -DDEBUG
 
diff --git a/drivers/base/sys_hotplug.c b/drivers/base/sys_hotplug.c
new file mode 100644
index 0000000..c5f5285
--- /dev/null
+++ b/drivers/base/sys_hotplug.c
@@ -0,0 +1,313 @@
+/*
+ * sys_hotplug.c - System device hot-plug framework
+ *
+ * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
+ *	Toshi Kani <toshi.kani@hp.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/workqueue.h>
+#include <linux/sys_hotplug.h>
+#include <linux/kallsyms.h>
+
+/*
+ * Hot-plug handler list
+ */
+struct shp_handler {
+	struct list_head	shp_list;
+	int			shp_order;
+	shp_func		shp_func;
+};
+
+LIST_HEAD(shp_add_list_head);
+LIST_HEAD(shp_del_list_head);
+
+static DEFINE_MUTEX(shp_hdr_list_lock);
+
+#define SHP_VALIDATE_ORDER_BASE		(SHP_ORDER_MAX+1)
+#define SHP_EXECUTE_ORDER_BASE		((SHP_ORDER_MAX+1) << 1)
+#define SHP_COMMIT_ORDER_BASE		((SHP_ORDER_MAX+1) << 2)
+
+/*
+ * Hot-plug request work queue
+ */
+struct shp_work {
+	struct shp_request	*request;
+	struct work_struct	work;
+};
+
+static struct workqueue_struct *shp_workqueue;
+
+/* trace messages */
+static int shp_trace = 1;
+static char shp_ksym_buf[KSYM_NAME_LEN];
+module_param(shp_trace, int, 0644);
+MODULE_PARM_DESC(shp_trace, "Enable system device hot-plug trace messages");
+
+static char *shp_operation_string(enum shp_operation operation)
+{
+	switch (operation) {
+	case SHP_HOTPLUG_ADD:
+		return "Hot-Add";
+	case SHP_HOTPLUG_DEL:
+		return "Hot-Delete";
+	case SHP_ONLINE_ADD:
+		return "Online";
+	case SHP_ONLINE_DEL:
+		return "Offline";
+	}
+
+	return "n/a";
+}
+
+static u32 shp_get_order_base(enum shp_phase phase)
+{
+	switch (phase) {
+	case SHP_ADD_VALIDATE:
+	case SHP_DEL_VALIDATE:
+		return SHP_VALIDATE_ORDER_BASE;
+	case SHP_ADD_EXECUTE:
+	case SHP_DEL_EXECUTE:
+		return SHP_EXECUTE_ORDER_BASE;
+	case SHP_ADD_COMMIT:
+	case SHP_DEL_COMMIT:
+		return SHP_COMMIT_ORDER_BASE;
+	}
+
+	return 0;
+}
+
+/**
+ * shp_register_handler - register a hot-plug handler to the framework
+ * @phase: hot-plug phase
+ * @func: Hot-plug function
+ * @order: Pre-defined order value
+ */
+int shp_register_handler(enum shp_phase phase, shp_func func, u32 order)
+{
+	struct list_head *head;
+	struct shp_handler *hdr, *cur;
+	u32 order_base;
+	int insert = 0;
+
+	if (!func || order > SHP_ORDER_MAX)
+		return -EINVAL;
+
+	if (shp_is_add_phase(phase))
+		head = &shp_add_list_head;
+	else
+		head = &shp_del_list_head;
+
+	order_base = shp_get_order_base(phase);
+
+	hdr = kzalloc(sizeof(*hdr), GFP_KERNEL);
+	if (!hdr)
+		return -ENOMEM;
+
+	hdr->shp_order = order + order_base;
+	hdr->shp_func = func;
+
+	/*
+	 * Add this handler to the list in ascending order
+	 */
+	mutex_lock(&shp_hdr_list_lock);
+	if (list_empty(head)) {
+		list_add(&hdr->shp_list, head);
+	} else {
+		list_for_each_entry(cur, head, shp_list)
+			if (cur->shp_order > hdr->shp_order) {
+				insert = 1;
+				break;
+			}
+
+		if (insert)
+			__list_add(&hdr->shp_list,
+				cur->shp_list.prev, &cur->shp_list);
+		else
+			list_add_tail(&hdr->shp_list, head);
+	}
+	mutex_unlock(&shp_hdr_list_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(shp_register_handler);
+
+/**
+ * shp_unregister_handler - unregister a hot-plug handler from the framework
+ * @phase: hot-plug phase
+ * @func: Hot-plug function
+ */
+int shp_unregister_handler(enum shp_phase phase, shp_func func)
+{
+	struct list_head *head;
+	struct shp_handler *cur;
+
+	if (!func)
+		return -EINVAL;
+
+	if (shp_is_add_phase(phase))
+		head = &shp_add_list_head;
+	else
+		head = &shp_del_list_head;
+
+	/*
+	 * Delete this handler from the list
+	 */
+	mutex_lock(&shp_hdr_list_lock);
+	list_for_each_entry(cur, head, shp_list)
+		if (cur->shp_func == func) {
+			list_del(&cur->shp_list);
+			kfree(cur);
+			break;
+		}
+	mutex_unlock(&shp_hdr_list_lock);
+
+	return 0;
+}
+EXPORT_SYMBOL(shp_unregister_handler);
+
+static void shp_start_req(struct work_struct *work)
+{
+	struct shp_work *shp_work = container_of(work, struct shp_work, work);
+	struct shp_request *req = shp_work->request;
+	struct shp_handler *hdr;
+	struct shp_device *shp_dev, *tmp;
+	struct list_head *head;
+	int rollback = 0;
+	int ret;
+
+	if (shp_is_add_op(req->operation))
+		head = &shp_add_list_head;
+	else
+		head = &shp_del_list_head;
+
+	if (shp_trace)
+		pr_info("Starting %s Operation\n",
+				shp_operation_string(req->operation));
+
+	/*
+	 * Call hot-plug handlers in the list
+	 */
+	mutex_lock(&shp_hdr_list_lock);
+	list_for_each_entry(hdr, head, shp_list) {
+		if (shp_trace)
+			pr_info("-> %s\n",
+				kallsyms_lookup((unsigned long)hdr->shp_func,
+					NULL, NULL, NULL, shp_ksym_buf));
+
+		ret = hdr->shp_func(req, 0);
+		if (ret) {
+			if (hdr->shp_order < SHP_COMMIT_ORDER_BASE) {
+				if (shp_trace)
+					pr_info("Initiating Rollback\n");
+				rollback = 1;
+				break;
+			} else {
+				pr_err("Commit handler failed: continuing\n");
+				continue;
+			}
+		}
+	}
+
+	/*
+	 * If rollback is requested, call hot-plug handlers in the reversed
+	 * order from the failed handler.  The failed handler is not called
+	 * again.
+	 */
+	if (rollback) {
+		list_for_each_entry_continue_reverse(hdr, head, shp_list) {
+			if (shp_trace)
+				pr_info("RB-> %s\n",
+					kallsyms_lookup(
+					   (unsigned long)hdr->shp_func,
+					   NULL, NULL, NULL, shp_ksym_buf));
+
+			ret = hdr->shp_func(req, 1);
+			if (ret)
+				pr_err("Rollback handler failed: continuing\n");
+		}
+	}
+	mutex_unlock(&shp_hdr_list_lock);
+
+	/* free up the hot-plug request information */
+	list_for_each_entry_safe(shp_dev, tmp, &req->dev_list, list) {
+		list_del(&shp_dev->list);
+		kfree(shp_dev);
+	}
+	kfree(req);
+	kfree(shp_work);
+}
+
+/**
+ * shp_submit_req - submit a hot-plug request
+ * @req: Hot-plug request pointer
+ */
+int shp_submit_req(struct shp_request *req)
+{
+	struct shp_work *shp_work;
+
+	shp_work = kzalloc(sizeof(*shp_work), GFP_KERNEL);
+	if (!shp_work)
+		return -ENOMEM;
+
+	shp_work->request = req;
+	INIT_WORK(&shp_work->work, shp_start_req);
+
+	if (!queue_work(shp_workqueue, &shp_work->work)) {
+		kfree(shp_work);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(shp_submit_req);
+
+/**
+ * shp_alloc_request - allocate a hot-plug request
+ * @operation: Hot-plug operation
+ */
+struct shp_request *shp_alloc_request(enum shp_operation operation)
+{
+	struct shp_request *shp_req;
+
+	shp_req = kzalloc(sizeof(*shp_req), GFP_KERNEL);
+	if (!shp_req)
+		return NULL;
+
+	shp_req->operation = operation;
+	INIT_LIST_HEAD(&shp_req->dev_list);
+
+	return shp_req;
+}
+EXPORT_SYMBOL(shp_alloc_request);
+
+/**
+ * shp_add_dev_info - add shp_device to the hotplug request
+ * @shp_req: hot-plug request pointer
+ * @shp_dev: hot-plug device info pointer
+ */
+void shp_add_dev_info(struct shp_request *shp_req, struct shp_device *shp_dev)
+{
+	list_add_tail(&shp_dev->list, &shp_req->dev_list);
+}
+EXPORT_SYMBOL(shp_add_dev_info);
+
+static int __init shp_init(void)
+{
+	/*
+	 * Allocate shp_workqueue with max_active set to 1.  This serializes
+	 * hot-plug and online/offline operations on the workqueue.
+	 */
+	shp_workqueue = alloc_workqueue("hotplug", 0, 1);
+
+	return 0;
+}
+device_initcall(shp_init);


* [RFC PATCH v2 04/12] cpu: Add cpu hotplug handlers
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
                   ` (2 preceding siblings ...)
  2013-01-10 23:40 ` [RFC PATCH v2 03/12] drivers/base: Add " Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 05/12] mm: Add memory " Toshi Kani
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Added cpu hotplug handlers.  cpu_add_execute() onlines requested
cpus for hot-add & online operations, and cpu_del_execute()
offlines them for hot-delete & offline operations.  They are
also used for rollback.

cpu_del_validate() fails a request if it asks to delete cpu0.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/base/cpu.c |  107 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 107 insertions(+)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 6345294..05534ad 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -13,6 +13,8 @@
 #include <linux/gfp.h>
 #include <linux/slab.h>
 #include <linux/percpu.h>
+#include <linux/list.h>
+#include <linux/sys_hotplug.h>
 
 #include "base.h"
 
@@ -324,10 +326,115 @@ static void __init cpu_dev_register_generic(void)
 #endif
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+static int cpu_del_execute(struct shp_request *req, int rollback);
+
+static int cpu_add_execute(struct shp_request *req, int rollback)
+{
+	struct shp_device *shp_dev;
+	u32 cpu;
+	int ret;
+
+	if (rollback)
+		return cpu_del_execute(req, 0);
+
+	list_for_each_entry(shp_dev, &req->dev_list, list) {
+		if (shp_dev->class != SHP_CLS_CPU)
+			continue;
+
+		cpu = shp_dev->info.cpu.cpu_id;
+
+		if (cpu_online(cpu))
+			continue;
+
+		ret = cpu_up(cpu);
+		if (!ret) {
+			/* REVISIT: need a way to set a cpu dev for hot-plug */
+			if (shp_is_online_op(req->operation))
+				kobject_uevent(&shp_dev->device->kobj,
+							KOBJ_ONLINE);
+		} else {
+			pr_err("cpu: Failed to online cpu %d\n", cpu);
+			/* fall-thru */
+		}
+	}
+
+	return 0;
+}
+
+static int cpu_del_validate(struct shp_request *req, int rollback)
+{
+	struct shp_device *shp_dev;
+
+	if (rollback)
+		return 0;
+
+	list_for_each_entry(shp_dev, &req->dev_list, list) {
+		if (shp_dev->class != SHP_CLS_CPU)
+			continue;
+
+		/*
+		 * cpu 0 cannot be offlined.  This check can be removed when
+		 * cpu 0 offline is supported.
+		 */
+		if (shp_dev->info.cpu.cpu_id == 0)
+			return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int cpu_del_execute(struct shp_request *req, int rollback)
+{
+	struct shp_device *shp_dev;
+	u32 cpu;
+	int ret;
+
+	if (rollback)
+		return cpu_add_execute(req, 0);
+
+	list_for_each_entry(shp_dev, &req->dev_list, list) {
+		if (shp_dev->class != SHP_CLS_CPU)
+			continue;
+
+		cpu = shp_dev->info.cpu.cpu_id;
+
+		if (!cpu_online(cpu))
+			continue;
+
+		ret = cpu_down(cpu);
+		if (!ret) {
+			/* REVISIT: need a way to set a cpu dev for hot-plug */
+			if (shp_is_online_op(req->operation))
+				kobject_uevent(&shp_dev->device->kobj,
+							KOBJ_OFFLINE);
+		} else {
+			pr_err("cpu: Failed to offline cpu %d\n", cpu);
+			return ret;
+		}
+	}
+
+	return 0;
+}
+
+static void __init cpu_shp_init(void)
+{
+	shp_register_handler(SHP_ADD_EXECUTE, cpu_add_execute,
+				SHP_CPU_ADD_EXECUTE_ORDER);
+	shp_register_handler(SHP_DEL_VALIDATE, cpu_del_validate,
+				SHP_CPU_DEL_VALIDATE_ORDER);
+	shp_register_handler(SHP_DEL_EXECUTE, cpu_del_execute,
+				SHP_CPU_DEL_EXECUTE_ORDER);
+}
+#endif	/* CONFIG_HOTPLUG_CPU */
+
 void __init cpu_dev_init(void)
 {
 	if (subsys_system_register(&cpu_subsys, cpu_root_attr_groups))
 		panic("Failed to register CPU subsystem");
 
 	cpu_dev_register_generic();
+#ifdef CONFIG_HOTPLUG_CPU
+	cpu_shp_init();
+#endif
 }


* [RFC PATCH v2 05/12] mm: Add memory hotplug handlers
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
                   ` (3 preceding siblings ...)
  2013-01-10 23:40 ` [RFC PATCH v2 04/12] cpu: Add cpu hotplug handlers Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 06/12] ACPI: Add ACPI bus " Toshi Kani
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Added memory hotplug handlers.  mm_add_execute() onlines requested
memory ranges for hot-add & online operations, and mm_del_execute()
offlines them for hot-delete & offline operations.  Each handler also
serves as the rollback path for the other.

mm_del_validate() fails a hot-delete request if a requested memory
range is non-movable when del_movable_only is set.
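
The pairing of the two execute handlers and the del_movable_only check
can be sketched in user-space C as below.  This is only an illustration
under stated assumptions, not the kernel code: struct mem_range, struct
request and the sketch_*() functions are hypothetical stand-ins for
shp_request and the mm_*() handlers, and whether a range is movable is
assumed to be known up front instead of being queried from the mm core.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for one memory range carried by a request. */
struct mem_range {
	unsigned long long start;
	unsigned long long length;
	bool movable;	/* assumed to be known up front in this sketch */
	bool present;	/* tracks whether the range is currently added */
};

struct request {
	struct mem_range *ranges;
	size_t nr_ranges;
};

static int del_movable_only = 1;	/* mirrors the module parameter */

int sketch_del_execute(struct request *req, int rollback);

/* Forward path adds every range; rollback runs the opposite handler. */
int sketch_add_execute(struct request *req, int rollback)
{
	size_t i;

	if (rollback)
		return sketch_del_execute(req, 0);

	for (i = 0; i < req->nr_ranges; i++)
		req->ranges[i].present = true;
	return 0;
}

/* Forward path removes every range; rollback runs the opposite handler. */
int sketch_del_execute(struct request *req, int rollback)
{
	size_t i;

	if (rollback)
		return sketch_add_execute(req, 0);

	for (i = 0; i < req->nr_ranges; i++)
		req->ranges[i].present = false;
	return 0;
}

/* Reject a hot-delete when any range is non-movable and the knob is set. */
int sketch_del_validate(const struct request *req, int rollback)
{
	size_t i;

	if (rollback || !del_movable_only)
		return 0;

	for (i = 0; i < req->nr_ranges; i++) {
		const struct mem_range *r = &req->ranges[i];

		if (!r->movable) {
			fprintf(stderr, "Memory [%#llx-%#llx] not removable\n",
				r->start, r->start + r->length - 1);
			return -1;
		}
	}
	return 0;
}
```

Routing rollback through the opposite handler keeps a single
implementation of each direction, at the cost that the two handlers
must stay exact inverses of each other.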

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 mm/memory_hotplug.c |  101 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d04ed87..ed3d829 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -29,6 +29,8 @@
 #include <linux/suspend.h>
 #include <linux/mm_inline.h>
 #include <linux/firmware-map.h>
+#include <linux/module.h>
+#include <linux/sys_hotplug.h>
 
 #include <asm/tlbflush.h>
 
@@ -45,6 +47,13 @@ static void generic_online_page(struct page *page);
 
 static online_page_callback_t online_page_callback = generic_online_page;
 
+static int mm_add_execute(struct shp_request *req, int rollback);
+static int mm_del_execute(struct shp_request *req, int rollback);
+
+static int del_movable_only = 0;
+module_param(del_movable_only, int, 0644);
+MODULE_PARM_DESC(del_movable_only, "Restrict hot-remove to movable memory only");
+
 DEFINE_MUTEX(mem_hotplug_mutex);
 
 void lock_memory_hotplug(void)
@@ -1431,3 +1440,95 @@ int remove_memory(u64 start, u64 size)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 EXPORT_SYMBOL_GPL(remove_memory);
+
+static int mm_add_execute(struct shp_request *req, int rollback)
+{
+	struct shp_device *shp_dev;
+	struct shp_memory *shp_mem;
+	int ret;
+
+	if (rollback)
+		return mm_del_execute(req, 0);
+
+	list_for_each_entry(shp_dev, &req->dev_list, list) {
+		if (shp_dev->class != SHP_CLS_MEMORY)
+			continue;
+
+		shp_mem = &shp_dev->info.mem;
+
+		ret = add_memory(shp_mem->node,
+				shp_mem->start_addr, shp_mem->length);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int mm_del_validate(struct shp_request *req, int rollback)
+{
+	struct shp_device *shp_dev;
+	struct shp_memory *shp_mem;
+	unsigned long start_pfn, nr_pages;
+
+	if (rollback || !del_movable_only)
+		return 0;
+
+	list_for_each_entry(shp_dev, &req->dev_list, list) {
+		if (shp_dev->class != SHP_CLS_MEMORY)
+			continue;
+
+		shp_mem = &shp_dev->info.mem;
+		start_pfn = shp_mem->start_addr >> PAGE_SHIFT;
+		nr_pages = PAGE_ALIGN(shp_mem->length) >> PAGE_SHIFT;
+
+		/*
+		 * Check if this memory range is removable.  This check is
+		 * enabled when del_movable_only is set.
+		 */
+		if (!is_mem_section_removable(start_pfn, nr_pages)) {
+			pr_info("Memory [%#010llx-%#010llx] not removable\n",
+				shp_mem->start_addr,
+				shp_mem->start_addr + shp_mem->length-1);
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
+static int mm_del_execute(struct shp_request *req, int rollback)
+{
+	struct shp_device *shp_dev;
+	struct shp_memory *shp_mem;
+	int ret;
+
+	if (rollback)
+		return mm_add_execute(req, 0);
+
+	list_for_each_entry(shp_dev, &req->dev_list, list) {
+		if (shp_dev->class != SHP_CLS_MEMORY)
+			continue;
+
+		shp_mem = &shp_dev->info.mem;
+
+		ret = remove_memory(shp_mem->start_addr, shp_mem->length);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int __init mm_shp_init(void)
+{
+	shp_register_handler(SHP_ADD_EXECUTE, mm_add_execute,
+				SHP_MEM_ADD_EXECUTE_ORDER);
+	shp_register_handler(SHP_DEL_VALIDATE, mm_del_validate,
+				SHP_MEM_DEL_VALIDATE_ORDER);
+	shp_register_handler(SHP_DEL_EXECUTE, mm_del_execute,
+				SHP_MEM_DEL_EXECUTE_ORDER);
+
+	return 0;
+}
+module_init(mm_shp_init);

* [RFC PATCH v2 06/12] ACPI: Add ACPI bus hotplug handlers
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
                   ` (4 preceding siblings ...)
  2013-01-10 23:40 ` [RFC PATCH v2 05/12] mm: Add memory " Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 07/12] ACPI: Add ACPI resource hotplug handler Toshi Kani
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Added ACPI bus hotplug handlers.  acpi_add_execute() calls
acpi_bus_add() to construct new acpi_device objects for a hot-add
operation, and acpi_del_execute() calls acpi_bus_trim() to destruct
them for a hot-delete operation.  Each handler also serves as the
rollback path for the other.

acpi_del_commit() calls _EJ0 to eject a target object for hot-delete.

acpi_validate_ost() calls _OST to inform FW that a hot-plug operation
completed with error in case of rollback.
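
How a validate-phase handler such as acpi_validate_ost() ends up
reporting a failure can be sketched in user-space C as below.  The
rollback policy is an assumption here, since the framework core is not
part of this patch: the sketch assumes handlers run in ascending order
and that, on failure, the handlers that already ran are called again in
reverse order with rollback set.  All names (struct handler,
run_handlers() and the example handlers) are hypothetical.

```c
#include <stddef.h>

struct request {
	int failure_reported;	/* set when firmware would be told of failure */
};

typedef int (*handler_fn)(struct request *req, int rollback);

struct handler {
	handler_fn fn;
	int order;	/* the array is assumed to be sorted by this value */
};

/* Run handlers in ascending order; on failure, unwind the ones that ran. */
int run_handlers(const struct handler *h, size_t n, struct request *req)
{
	size_t i;
	int ret = 0;

	for (i = 0; i < n; i++) {
		ret = h[i].fn(req, 0);
		if (ret)
			break;
	}
	if (!ret)
		return 0;

	/* Assumed policy: revisit earlier handlers in reverse, rollback = 1. */
	while (i-- > 0)
		(void)h[i].fn(req, 1);
	return ret;
}

/* Like acpi_validate_ost(): no-op going forward, reports on rollback. */
int report_failure_on_rollback(struct request *req, int rollback)
{
	if (rollback)
		req->failure_reported = 1;
	return 0;
}

int always_ok(struct request *req, int rollback)
{
	(void)req;
	(void)rollback;
	return 0;
}

int always_fail(struct request *req, int rollback)
{
	(void)req;
	return rollback ? 0 : -1;
}
```

Under that assumption, registering the reporting handler early in the
validate phase makes it the last one visited during an unwind, so the
failure is reported only after every other handler has rolled back.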

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/acpi/bus.c |  133 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 133 insertions(+)

diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 1f0d457..31a1911 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -42,6 +42,7 @@
 #include <acpi/apei.h>
 #include <linux/dmi.h>
 #include <linux/suspend.h>
+#include <acpi/sys_hotplug.h>
 
 #include "internal.h"
 
@@ -52,6 +53,9 @@ struct acpi_device *acpi_root;
 struct proc_dir_entry *acpi_root_dir;
 EXPORT_SYMBOL(acpi_root_dir);
 
+static int acpi_add_execute(struct shp_request *req, int rollback);
+static int acpi_del_execute(struct shp_request *req, int rollback);
+
 #define STRUCT_TO_INT(s)	(*((int*)&s))
 
 
@@ -859,6 +863,134 @@ static void acpi_bus_notify(acpi_handle handle, u32 type, void *data)
 }
 
 /* --------------------------------------------------------------------------
+			Hot-plug Handling
+   -------------------------------------------------------------------------- */
+
+static int acpi_validate_ost(struct shp_request *req, int rollback)
+{
+	/* If hotplug request failed, inform firmware with error */
+	if (rollback && shp_is_hotplug_op(req->operation))
+		(void) acpi_evaluate_hotplug_ost(req->handle, req->event,
+				ACPI_OST_SC_NON_SPECIFIC_FAILURE, NULL);
+
+	return 0;
+}
+
+static int acpi_add_execute(struct shp_request *req, int rollback)
+{
+	acpi_handle handle = (acpi_handle) req->handle;
+	acpi_handle phandle;
+	struct acpi_device *device = NULL;
+	struct acpi_device *pdev;
+	int ret;
+
+	if (rollback)
+		return acpi_del_execute(req, 0);
+
+	/* only handle hot-plug operation */
+	if (!shp_is_hotplug_op(req->operation))
+		return 0;
+
+	if (acpi_get_parent(handle, &phandle))
+		return -ENODEV;
+
+	if (acpi_bus_get_device(phandle, &pdev))
+		return -ENODEV;
+
+	ret = acpi_bus_add(&device, pdev, handle, ACPI_BUS_TYPE_DEVICE);
+
+	return ret;
+}
+
+static int acpi_add_commit(struct shp_request *req, int rollback)
+{
+	/* Inform firmware that the hotplug operation has completed */
+	(void) acpi_evaluate_hotplug_ost(req->handle, req->event,
+					ACPI_OST_SC_SUCCESS, NULL);
+
+	return 0;
+}
+
+static int acpi_del_execute(struct shp_request *req, int rollback)
+{
+	acpi_handle handle = (acpi_handle) req->handle;
+	struct acpi_device *device;
+
+	if (rollback)
+		return acpi_add_execute(req, 0);
+
+	/* only handle hot-plug operation */
+	if (!shp_is_hotplug_op(req->operation))
+		return 0;
+
+	if (acpi_bus_get_device(handle, &device)) {
+		acpi_handle_err(handle, "Failed to obtain device\n");
+		return -EINVAL;
+	}
+
+	if (acpi_bus_trim(device, 1)) {
+		dev_err(&device->dev, "Removing device failed\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int acpi_del_commit(struct shp_request *req, int rollback)
+{
+	acpi_handle handle = (acpi_handle) req->handle;
+	acpi_handle temp;
+	struct acpi_object_list arg_list;
+	union acpi_object arg;
+	acpi_status status;
+
+	/* only handle hot-plug operation */
+	if (!shp_is_hotplug_op(req->operation))
+		return 0;
+
+	/* power off device */
+	status = acpi_evaluate_object(handle, "_PS3", NULL, NULL);
+	if (ACPI_FAILURE(status) && status != AE_NOT_FOUND)
+		acpi_handle_warn(handle, "Power-off device failed\n");
+
+	if (ACPI_SUCCESS(acpi_get_handle(handle, "_LCK", &temp))) {
+		arg_list.count = 1;
+		arg_list.pointer = &arg;
+		arg.type = ACPI_TYPE_INTEGER;
+		arg.integer.value = 0;
+		acpi_evaluate_object(handle, "_LCK", &arg_list, NULL);
+	}
+
+	arg_list.count = 1;
+	arg_list.pointer = &arg;
+	arg.type = ACPI_TYPE_INTEGER;
+	arg.integer.value = 1;
+
+	status = acpi_evaluate_object(handle, "_EJ0", &arg_list, NULL);
+	if (ACPI_FAILURE(status) && (status != AE_NOT_FOUND))
+		acpi_handle_warn(handle, "Eject device failed\n");
+
+	return 0;
+}
+
+static void __init acpi_shp_init(void)
+{
+	shp_register_handler(SHP_ADD_VALIDATE, acpi_validate_ost,
+				SHP_ACPI_BUS_ADD_VALIDATE_ORDER);
+	shp_register_handler(SHP_ADD_EXECUTE, acpi_add_execute,
+				SHP_ACPI_BUS_ADD_EXECUTE_ORDER);
+	shp_register_handler(SHP_ADD_COMMIT, acpi_add_commit,
+				SHP_ACPI_BUS_ADD_COMMIT_ORDER);
+
+	shp_register_handler(SHP_DEL_VALIDATE, acpi_validate_ost,
+				SHP_ACPI_BUS_DEL_VALIDATE_ORDER);
+	shp_register_handler(SHP_DEL_EXECUTE, acpi_del_execute,
+				SHP_ACPI_BUS_DEL_EXECUTE_ORDER);
+	shp_register_handler(SHP_DEL_COMMIT, acpi_del_commit,
+				SHP_ACPI_BUS_DEL_COMMIT_ORDER);
+}
+
+/* --------------------------------------------------------------------------
                              Initialization/Cleanup
    -------------------------------------------------------------------------- */
 
@@ -1103,6 +1235,7 @@ static int __init acpi_init(void)
 	acpi_debugfs_init();
 	acpi_sleep_proc_init();
 	acpi_wakeup_device_init();
+	acpi_shp_init();
 	return 0;
 }
 

* [RFC PATCH v2 07/12] ACPI: Add ACPI resource hotplug handler
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
                   ` (5 preceding siblings ...)
  2013-01-10 23:40 ` [RFC PATCH v2 06/12] ACPI: Add ACPI bus " Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 08/12] ACPI: Update processor driver for hotplug framework Toshi Kani
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Added ACPI resource handler for hotplug operations.  The handler,
acpi_set_shp_resources(), sets device resource information in a hotplug
request, which is consumed by the CPU and memory handlers.
To collect the device resource information, acpi_scan_shp_devices()
walks the acpi_device tree from a target device, and
acpi_set_shp_device() calls .resource of each bound ACPI driver.

For hot-add, acpi_set_shp_resources() is called right after the ACPI bus
handler so that it can walk through new acpi_device objects.  For
hot-delete, it is called at the beginning of the validate phase so that
other validate handlers can use the device resource information for
their validations.
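
The tree walk can be sketched in user-space C as below.  This is a
simplified illustration, not the ACPI code: struct node, struct request,
record_id() and scan_devices() are hypothetical, children are kept in a
fixed array instead of a linked list, and a resource is reduced to an
integer id.

```c
#include <stddef.h>

#define MAX_CHILDREN	4
#define MAX_RESOURCES	16

struct request {
	int ids[MAX_RESOURCES];	/* resource ids collected so far */
	size_t count;
};

struct node;
typedef int (*resource_fn)(const struct node *n, struct request *req);

struct node {
	int id;
	resource_fn resource;	/* NULL when the driver has no callback */
	const struct node *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Example callback: record this node's id in the request. */
int record_id(const struct node *n, struct request *req)
{
	if (req->count >= MAX_RESOURCES)
		return -1;
	req->ids[req->count++] = n->id;
	return 0;
}

/* Visit the target first, then each subtree, depth-first. */
void scan_devices(const struct node *n, struct request *req)
{
	size_t i;

	/* As in the patch, a failing callback stops descent into this subtree. */
	if (n->resource && n->resource(n, req))
		return;

	for (i = 0; i < n->nr_children; i++)
		scan_devices(n->children[i], req);
}
```

A node without a callback (a container, for example) contributes
nothing itself but is still descended into, so the CPUs and memory
devices below it are collected.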

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/acpi/Makefile       |    1 +
 drivers/acpi/bus.c          |    1 +
 drivers/acpi/internal.h     |    1 +
 drivers/acpi/shp_resource.c |   86 +++++++++++++++++++++++++++++++++++++++++++
 include/acpi/acpi_bus.h     |    4 ++
 5 files changed, 93 insertions(+)
 create mode 100644 drivers/acpi/shp_resource.c

diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 2a4502b..205be23 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -34,6 +34,7 @@ acpi-$(CONFIG_ACPI_SLEEP)	+= proc.o
 acpi-y				+= bus.o glue.o
 acpi-y				+= scan.o
 acpi-y				+= resource.o
+acpi-y				+= shp_resource.o
 acpi-y				+= processor_core.o
 acpi-y				+= ec.o
 acpi-$(CONFIG_ACPI_DOCK)	+= dock.o
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index 31a1911..69b5edb 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -1236,6 +1236,7 @@ static int __init acpi_init(void)
 	acpi_sleep_proc_init();
 	acpi_wakeup_device_init();
 	acpi_shp_init();
+	acpi_shp_res_init();
 	return 0;
 }
 
diff --git a/drivers/acpi/internal.h b/drivers/acpi/internal.h
index 3c407cd..51aa740 100644
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -26,6 +26,7 @@
 int init_acpi_device_notify(void);
 int acpi_scan_init(void);
 int acpi_sysfs_init(void);
+void acpi_shp_res_init(void);
 
 #ifdef CONFIG_DEBUG_FS
 extern struct dentry *acpi_debugfs_dir;
diff --git a/drivers/acpi/shp_resource.c b/drivers/acpi/shp_resource.c
new file mode 100644
index 0000000..51ab968
--- /dev/null
+++ b/drivers/acpi/shp_resource.c
@@ -0,0 +1,86 @@
+/*
+ * shp_resource.c - Setup system device hot-plug resource information
+ *
+ * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
+ *	Toshi Kani <toshi.kani@hp.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/device.h>
+#include <linux/list.h>
+#include <linux/acpi.h>
+#include <acpi/sys_hotplug.h>
+
+#include "internal.h"
+
+static int
+acpi_set_shp_device(struct acpi_device *device, struct shp_request *req)
+{
+	int ret;
+
+	if (!device->driver) {
+		dev_dbg(&device->dev, "driver not bound\n");
+		return 0;
+	}
+
+	if (!device->driver->ops.resource)
+		return 0;
+
+	ret = device->driver->ops.resource(device, req);
+	if (ret) {
+		dev_err(&device->dev, "ops.resource failed (%d)\n", ret);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int
+acpi_scan_shp_devices(struct acpi_device *device, struct shp_request *req)
+{
+	struct acpi_device *child = NULL;
+
+	if (acpi_set_shp_device(device, req))
+		return 0;
+
+	list_for_each_entry(child, &device->children, node)
+		acpi_scan_shp_devices(child, req);
+
+	return 0;
+}
+
+static int acpi_set_shp_resources(struct shp_request *req, int rollback)
+{
+	acpi_handle handle = (acpi_handle) req->handle;
+	struct acpi_device *device = NULL;
+
+	if (rollback)
+		return 0;
+
+	/* only handle hot-plug operation */
+	if (!shp_is_hotplug_op(req->operation))
+		return 0;
+
+	if (acpi_bus_get_device(handle, &device)) {
+		acpi_handle_err(handle, "acpi_bus_get_device failed\n");
+		return -EINVAL;
+	}
+
+	acpi_scan_shp_devices(device, req);
+
+	return 0;
+}
+
+void __init acpi_shp_res_init(void)
+{
+	shp_register_handler(SHP_ADD_EXECUTE, acpi_set_shp_resources,
+				SHP_ACPI_RES_ADD_EXECUTE_ORDER);
+	shp_register_handler(SHP_DEL_VALIDATE, acpi_set_shp_resources,
+				SHP_ACPI_RES_DEL_VALIDATE_ORDER);
+}
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 7ced5dc..6bf002e 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -27,6 +27,7 @@
 #define __ACPI_BUS_H__
 
 #include <linux/device.h>
+#include <acpi/sys_hotplug.h>
 
 #include <acpi/acpi.h>
 
@@ -94,6 +95,8 @@ typedef int (*acpi_op_start) (struct acpi_device * device);
 typedef int (*acpi_op_bind) (struct acpi_device * device);
 typedef int (*acpi_op_unbind) (struct acpi_device * device);
 typedef void (*acpi_op_notify) (struct acpi_device * device, u32 event);
+typedef int (*acpi_op_resource) (struct acpi_device *device,
+			struct shp_request *shp_req);
 
 struct acpi_bus_ops {
 	u32 acpi_op_add:1;
@@ -107,6 +110,7 @@ struct acpi_device_ops {
 	acpi_op_bind bind;
 	acpi_op_unbind unbind;
 	acpi_op_notify notify;
+	acpi_op_resource resource;
 };
 
 #define ACPI_DRIVER_ALL_NOTIFY_EVENTS	0x1	/* system AND device events */

* [RFC PATCH v2 08/12] ACPI: Update processor driver for hotplug framework
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
                   ` (6 preceding siblings ...)
  2013-01-10 23:40 ` [RFC PATCH v2 07/12] ACPI: Add ACPI resource hotplug handler Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 09/12] ACPI: Update memory " Toshi Kani
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Added acpi_processor_resource() for the .resource() interface,
which sets CPU information in a hotplug request.

Changed acpi_processor_hotplug_notify() to request a hotplug
operation by calling shp_submit_req().  It no longer initiates
hot-add or hot-delete operation by calling acpi_bus_add() or
acpi_bus_hot_remove_device() directly.

acpi_processor_handle_eject() is changed not to call cpu_down()
since .add() / .remove() no longer online / offline a device.
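
The decision made by the notify handler before it submits a request can
be sketched in user-space C as below.  The notification values match
the ACPI definitions (bus check 0x00, device check 0x01, eject request
0x03); everything else (enum op, struct dev_state, classify_event()) is
hypothetical, and the separate driver-data check on eject is folded
into the single "bound" flag.

```c
#include <stdint.h>

/* ACPI notification values handled here */
#define NOTIFY_BUS_CHECK	0x00
#define NOTIFY_DEVICE_CHECK	0x01
#define NOTIFY_EJECT_REQUEST	0x03

enum op { OP_NONE, OP_HOTPLUG_ADD, OP_HOTPLUG_DEL };

struct dev_state {
	int present;	/* firmware reports the device as present */
	int bound;	/* an acpi_device-like object already exists */
};

/*
 * Return the operation to submit, or OP_NONE.  *report_error is set when
 * the event was a hotplug event that had to be refused.
 */
enum op classify_event(uint32_t event, const struct dev_state *st,
		       int *report_error)
{
	*report_error = 0;

	switch (event) {
	case NOTIFY_BUS_CHECK:
	case NOTIFY_DEVICE_CHECK:
		/* hot-add needs a present device that is not added yet */
		if (!st->present || st->bound)
			break;
		return OP_HOTPLUG_ADD;
	case NOTIFY_EJECT_REQUEST:
		/* hot-delete needs a device that was added before */
		if (!st->bound)
			break;
		return OP_HOTPLUG_DEL;
	default:
		/* not a hotplug event; nothing to submit or report */
		return OP_NONE;
	}

	*report_error = 1;
	return OP_NONE;
}
```

Only a refused hotplug event is reported from the notify handler
itself; once a request has been submitted, reporting the outcome is
left to the framework handlers.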

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/acpi/processor_driver.c |  150 ++++++++++++++++++---------------------
 1 file changed, 70 insertions(+), 80 deletions(-)

diff --git a/drivers/acpi/processor_driver.c b/drivers/acpi/processor_driver.c
index e83311b..f630c2c 100644
--- a/drivers/acpi/processor_driver.c
+++ b/drivers/acpi/processor_driver.c
@@ -57,6 +57,7 @@
 #include <acpi/acpi_bus.h>
 #include <acpi/acpi_drivers.h>
 #include <acpi/processor.h>
+#include <acpi/sys_hotplug.h>
 
 #define PREFIX "ACPI: "
 
@@ -83,6 +84,8 @@ MODULE_LICENSE("GPL");
 static int acpi_processor_add(struct acpi_device *device);
 static int acpi_processor_remove(struct acpi_device *device, int type);
 static void acpi_processor_notify(struct acpi_device *device, u32 event);
+static int acpi_processor_resource(struct acpi_device *device,
+		struct shp_request *shp_req);
 static acpi_status acpi_processor_hotadd_init(struct acpi_processor *pr);
 static int acpi_processor_handle_eject(struct acpi_processor *pr);
 static int acpi_processor_start(struct acpi_processor *pr);
@@ -105,6 +108,7 @@ static struct acpi_driver acpi_processor_driver = {
 		.add = acpi_processor_add,
 		.remove = acpi_processor_remove,
 		.notify = acpi_processor_notify,
+		.resource = acpi_processor_resource,
 		},
 	.drv.pm = &acpi_processor_pm,
 };
@@ -649,6 +653,33 @@ free:
 	return 0;
 }
 
+static int
+acpi_processor_resource(struct acpi_device *device, struct shp_request *shp_req)
+{
+	struct acpi_processor *pr;
+	struct shp_device *shp_dev;
+
+	pr = acpi_driver_data(device);
+	if (!pr) {
+		dev_err(&device->dev, "Driver data missing\n");
+		return -EINVAL;
+	}
+
+	shp_dev = kzalloc(sizeof(*shp_dev), GFP_KERNEL);
+	if (!shp_dev) {
+		dev_err(&device->dev, "Failed to allocate shp_dev\n");
+		return -ENOMEM;
+	}
+
+	shp_dev->device = &device->dev;
+	shp_dev->class = SHP_CLS_CPU;
+	shp_dev->info.cpu.cpu_id = pr->id;
+
+	shp_add_dev_info(shp_req, shp_dev);
+
+	return 0;
+}
+
 #ifdef CONFIG_ACPI_HOTPLUG_CPU
 /****************************************************************************
  * 	Acpi processor hotplug support 				       	    *
@@ -677,97 +708,68 @@ static int is_processor_present(acpi_handle handle)
 	return 0;
 }
 
-static
-int acpi_processor_device_add(acpi_handle handle, struct acpi_device **device)
-{
-	acpi_handle phandle;
-	struct acpi_device *pdev;
-
-
-	if (acpi_get_parent(handle, &phandle)) {
-		return -ENODEV;
-	}
-
-	if (acpi_bus_get_device(phandle, &pdev)) {
-		return -ENODEV;
-	}
-
-	if (acpi_bus_add(device, pdev, handle, ACPI_BUS_TYPE_PROCESSOR)) {
-		return -ENODEV;
-	}
-
-	return 0;
-}
-
 static void acpi_processor_hotplug_notify(acpi_handle handle,
 					  u32 event, void *data)
 {
 	struct acpi_device *device = NULL;
-	struct acpi_eject_event *ej_event = NULL;
+	struct shp_request *shp_req;
+	enum shp_operation shp_op;
 	u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE; /* default */
-	int result;
 
 	switch (event) {
 	case ACPI_NOTIFY_BUS_CHECK:
 	case ACPI_NOTIFY_DEVICE_CHECK:
-		ACPI_DEBUG_PRINT((ACPI_DB_INFO,
-		"Processor driver received %s event\n",
-		       (event == ACPI_NOTIFY_BUS_CHECK) ?
-		       "ACPI_NOTIFY_BUS_CHECK" : "ACPI_NOTIFY_DEVICE_CHECK"));
-
-		if (!is_processor_present(handle))
-			break;
-
-		if (!acpi_bus_get_device(handle, &device))
-			break;
+		if (!is_processor_present(handle)) {
+			acpi_handle_err(handle, "Device not enabled\n");
+			goto err;
+		}
 
-		result = acpi_processor_device_add(handle, &device);
-		if (result) {
-			acpi_handle_err(handle, "Unable to add the device\n");
-			break;
+		if (!acpi_bus_get_device(handle, &device)) {
+			acpi_handle_err(handle, "Device added already\n");
+			goto err;
 		}
 
-		ost_code = ACPI_OST_SC_SUCCESS;
+		shp_op = SHP_HOTPLUG_ADD;
 		break;
 
 	case ACPI_NOTIFY_EJECT_REQUEST:
-		ACPI_DEBUG_PRINT((ACPI_DB_INFO,
-				  "received ACPI_NOTIFY_EJECT_REQUEST\n"));
-
 		if (acpi_bus_get_device(handle, &device)) {
-			acpi_handle_err(handle,
-				"Device don't exist, dropping EJECT\n");
-			break;
+			acpi_handle_err(handle, "Device not added yet\n");
+			goto err;
 		}
 		if (!acpi_driver_data(device)) {
-			acpi_handle_err(handle,
-				"Driver data is NULL, dropping EJECT\n");
-			break;
+			acpi_handle_err(handle, "Driver data missing\n");
+			goto err;
 		}
 
-		ej_event = kmalloc(sizeof(*ej_event), GFP_KERNEL);
-		if (!ej_event) {
-			acpi_handle_err(handle, "No memory, dropping EJECT\n");
-			break;
-		}
-
-		ej_event->handle = handle;
-		ej_event->event = ACPI_NOTIFY_EJECT_REQUEST;
-		acpi_os_hotplug_execute(acpi_bus_hot_remove_device,
-					(void *)ej_event);
-
-		/* eject is performed asynchronously */
-		return;
+		shp_op = SHP_HOTPLUG_DEL;
+		break;
 
 	default:
 		ACPI_DEBUG_PRINT((ACPI_DB_INFO,
 				  "Unsupported event [0x%x]\n", event));
-
-		/* non-hotplug event; possibly handled by other handler */
 		return;
 	}
 
-	/* Inform firmware that the hotplug operation has completed */
+	shp_req = shp_alloc_request(shp_op);
+	if (!shp_req) {
+		acpi_handle_err(handle, "No memory to request hotplug\n");
+		goto err;
+	}
+
+	shp_req->handle = (void *)handle;
+	shp_req->event = event;
+
+	if (shp_submit_req(shp_req)) {
+		acpi_handle_err(handle, "Failed to request hotplug\n");
+		kfree(shp_req);
+		goto err;
+	}
+
+	return;
+
+err:
+	/* Inform firmware that the hotplug operation completed w/ error */
 	(void) acpi_evaluate_hotplug_ost(handle, event, ost_code, NULL);
 	return;
 }
@@ -865,25 +867,13 @@ static acpi_status acpi_processor_hotadd_init(struct acpi_processor *pr)
 
 static int acpi_processor_handle_eject(struct acpi_processor *pr)
 {
-	if (cpu_online(pr->id))
-		cpu_down(pr->id);
-
-	get_online_cpus();
-	/*
-	 * The cpu might become online again at this point. So we check whether
-	 * the cpu has been onlined or not. If the cpu became online, it means
-	 * that someone wants to use the cpu. So acpi_processor_handle_eject()
-	 * returns -EAGAIN.
-	 */
-	if (unlikely(cpu_online(pr->id))) {
-		put_online_cpus();
-		pr_warn("Failed to remove CPU %d, because other task "
-			"brought the CPU back online\n", pr->id);
-		return -EAGAIN;
+	if (cpu_online(pr->id)) {
+		pr_err("ACPI: cpu %d not off-lined\n", pr->id);
+		return -EINVAL;
 	}
+
 	arch_unregister_cpu(pr->id);
 	acpi_unmap_lsapic(pr->id);
-	put_online_cpus();
 	return (0);
 }
 #else

* [RFC PATCH v2 09/12] ACPI: Update memory driver for hotplug framework
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
                   ` (7 preceding siblings ...)
  2013-01-10 23:40 ` [RFC PATCH v2 08/12] ACPI: Update processor driver for hotplug framework Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 10/12] ACPI: Update container " Toshi Kani
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Changed acpi_memory_device_notify() to request a hotplug operation
by calling shp_submit_req().  It no longer initiates hot-add or
hot-delete operation by calling add_memory() or remove_memory()
directly.  Removed the enabled and failed flags from acpi_memory_info
since they are no longer used.

Changed acpi_memory_device_add() to not call add_memory() to online
a memory device.  Similarly, changed acpi_memory_device_remove()
to not call remove_memory() to offline a memory device.

Added acpi_memory_device_resource() to set memory information in a
hotplug request.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/acpi/acpi_memhotplug.c |  271 +++++++++++++---------------------------
 1 file changed, 89 insertions(+), 182 deletions(-)

diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index b679bf8..67868f5 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -33,6 +33,7 @@
 #include <linux/slab.h>
 #include <linux/acpi.h>
 #include <acpi/acpi_drivers.h>
+#include <acpi/sys_hotplug.h>
 
 #define ACPI_MEMORY_DEVICE_CLASS		"memory"
 #define ACPI_MEMORY_DEVICE_HID			"PNP0C80"
@@ -55,6 +56,8 @@ MODULE_LICENSE("GPL");
 
 static int acpi_memory_device_add(struct acpi_device *device);
 static int acpi_memory_device_remove(struct acpi_device *device, int type);
+static int acpi_memory_device_resource(struct acpi_device *device,
+		struct shp_request *shp_req);
 
 static const struct acpi_device_id memory_device_ids[] = {
 	{ACPI_MEMORY_DEVICE_HID, 0},
@@ -69,6 +72,7 @@ static struct acpi_driver acpi_memory_device_driver = {
 	.ops = {
 		.add = acpi_memory_device_add,
 		.remove = acpi_memory_device_remove,
+		.resource = acpi_memory_device_resource,
 		},
 };
 
@@ -153,59 +157,12 @@ acpi_memory_get_device_resources(struct acpi_memory_device *mem_device)
 	return 0;
 }
 
-static int
-acpi_memory_get_device(acpi_handle handle,
-		       struct acpi_memory_device **mem_device)
-{
-	acpi_status status;
-	acpi_handle phandle;
-	struct acpi_device *device = NULL;
-	struct acpi_device *pdevice = NULL;
-	int result;
-
-
-	if (!acpi_bus_get_device(handle, &device) && device)
-		goto end;
-
-	status = acpi_get_parent(handle, &phandle);
-	if (ACPI_FAILURE(status)) {
-		ACPI_EXCEPTION((AE_INFO, status, "Cannot find acpi parent"));
-		return -EINVAL;
-	}
-
-	/* Get the parent device */
-	result = acpi_bus_get_device(phandle, &pdevice);
-	if (result) {
-		acpi_handle_warn(phandle, "Cannot get acpi bus device\n");
-		return -EINVAL;
-	}
-
-	/*
-	 * Now add the notified device.  This creates the acpi_device
-	 * and invokes .add function
-	 */
-	result = acpi_bus_add(&device, pdevice, handle, ACPI_BUS_TYPE_DEVICE);
-	if (result) {
-		acpi_handle_warn(handle, "Cannot add acpi bus\n");
-		return -EINVAL;
-	}
-
-      end:
-	*mem_device = acpi_driver_data(device);
-	if (!(*mem_device)) {
-		dev_err(&device->dev, "driver data not found\n");
-		return -ENODEV;
-	}
-
-	return 0;
-}
-
-static int acpi_memory_check_device(struct acpi_memory_device *mem_device)
+static int acpi_memory_check_device(acpi_handle handle)
 {
 	unsigned long long current_status;
 
 	/* Get device present/absent information from the _STA */
-	if (ACPI_FAILURE(acpi_evaluate_integer(mem_device->device->handle, "_STA",
+	if (ACPI_FAILURE(acpi_evaluate_integer(handle, "_STA",
 					       NULL, &current_status)))
 		return -ENODEV;
 	/*
@@ -220,148 +177,46 @@ static int acpi_memory_check_device(struct acpi_memory_device *mem_device)
 	return 0;
 }
 
-static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
-{
-	int result, num_enabled = 0;
-	struct acpi_memory_info *info;
-	int node;
-
-	node = acpi_get_node(mem_device->device->handle);
-	/*
-	 * Tell the VM there is more memory here...
-	 * Note: Assume that this function returns zero on success
-	 * We don't have memory-hot-add rollback function,now.
-	 * (i.e. memory-hot-remove function)
-	 */
-	list_for_each_entry(info, &mem_device->res_list, list) {
-		if (info->enabled) { /* just sanity check...*/
-			num_enabled++;
-			continue;
-		}
-		/*
-		 * If the memory block size is zero, please ignore it.
-		 * Don't try to do the following memory hotplug flowchart.
-		 */
-		if (!info->length)
-			continue;
-		if (node < 0)
-			node = memory_add_physaddr_to_nid(info->start_addr);
-
-		result = add_memory(node, info->start_addr, info->length);
-
-		/*
-		 * If the memory block has been used by the kernel, add_memory()
-		 * returns -EEXIST. If add_memory() returns the other error, it
-		 * means that this memory block is not used by the kernel.
-		 */
-		if (result && result != -EEXIST) {
-			info->failed = 1;
-			continue;
-		}
-
-		if (!result)
-			info->enabled = 1;
-		/*
-		 * Add num_enable even if add_memory() returns -EEXIST, so the
-		 * device is bound to this driver.
-		 */
-		num_enabled++;
-	}
-	if (!num_enabled) {
-		dev_err(&mem_device->device->dev, "add_memory failed\n");
-		mem_device->state = MEMORY_INVALID_STATE;
-		return -EINVAL;
-	}
-	/*
-	 * Sometimes the memory device will contain several memory blocks.
-	 * When one memory block is hot-added to the system memory, it will
-	 * be regarded as a success.
-	 * Otherwise if the last memory block can't be hot-added to the system
-	 * memory, it will be failure and the memory device can't be bound with
-	 * driver.
-	 */
-	return 0;
-}
-
-static int acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
-{
-	int result = 0;
-	struct acpi_memory_info *info, *n;
-
-	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
-		if (info->failed)
-			/* The kernel does not use this memory block */
-			continue;
-
-		if (!info->enabled)
-			/*
-			 * The kernel uses this memory block, but it may be not
-			 * managed by us.
-			 */
-			return -EBUSY;
-
-		result = remove_memory(info->start_addr, info->length);
-		if (result)
-			return result;
-
-		list_del(&info->list);
-		kfree(info);
-	}
-
-	return result;
-}
-
 static void acpi_memory_device_notify(acpi_handle handle, u32 event, void *data)
 {
 	struct acpi_memory_device *mem_device;
 	struct acpi_device *device;
-	struct acpi_eject_event *ej_event = NULL;
+	struct shp_request *shp_req;
+	enum shp_operation shp_op;
 	u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE; /* default */
 
 	switch (event) {
 	case ACPI_NOTIFY_BUS_CHECK:
-		ACPI_DEBUG_PRINT((ACPI_DB_INFO,
-				  "\nReceived BUS CHECK notification for device\n"));
 		/* Fall Through */
 	case ACPI_NOTIFY_DEVICE_CHECK:
-		if (event == ACPI_NOTIFY_DEVICE_CHECK)
-			ACPI_DEBUG_PRINT((ACPI_DB_INFO,
-					  "\nReceived DEVICE CHECK notification for device\n"));
-		if (acpi_memory_get_device(handle, &mem_device)) {
-			acpi_handle_err(handle, "Cannot find driver data\n");
-			break;
+		if (acpi_memory_check_device(handle)) {
+			acpi_handle_err(handle, "Device not enabled\n");
+			goto err;
+		}
+
+		if (!acpi_bus_get_device(handle, &device)) {
+			acpi_handle_err(handle, "Device added already\n");
+			goto err;
 		}
 
-		ost_code = ACPI_OST_SC_SUCCESS;
+		shp_op = SHP_HOTPLUG_ADD;
 		break;
 
 	case ACPI_NOTIFY_EJECT_REQUEST:
-		ACPI_DEBUG_PRINT((ACPI_DB_INFO,
-				  "\nReceived EJECT REQUEST notification for device\n"));
-
 		if (acpi_bus_get_device(handle, &device)) {
 			acpi_handle_err(handle, "Device doesn't exist\n");
-			break;
+			goto err;
 		}
+
 		mem_device = acpi_driver_data(device);
 		if (!mem_device) {
 			acpi_handle_err(handle, "Driver Data is NULL\n");
-			break;
+			goto err;
 		}
 
-		ej_event = kmalloc(sizeof(*ej_event), GFP_KERNEL);
-		if (!ej_event) {
-			pr_err(PREFIX "No memory, dropping EJECT\n");
-			break;
-		}
-
-		ej_event->handle = handle;
-		ej_event->event = ACPI_NOTIFY_EJECT_REQUEST;
-		acpi_os_hotplug_execute(acpi_bus_hot_remove_device,
-					(void *)ej_event);
+		shp_op = SHP_HOTPLUG_DEL;
+		break;
 
-		/* eject is performed asynchronously */
-		return;
 	default:
 		ACPI_DEBUG_PRINT((ACPI_DB_INFO,
 				  "Unsupported event [0x%x]\n", event));
@@ -370,7 +225,25 @@ static void acpi_memory_device_notify(acpi_handle handle, u32 event, void *data)
 		return;
 	}
 
-	/* Inform firmware that the hotplug operation has completed */
+	shp_req = shp_alloc_request(shp_op);
+	if (!shp_req) {
+		acpi_handle_err(handle, "No memory to request hotplug\n");
+		goto err;
+	}
+
+	shp_req->handle = (void *)handle;
+	shp_req->event = event;
+
+	if (shp_submit_req(shp_req)) {
+		acpi_handle_err(handle, "Failed to request hotplug\n");
+		kfree(shp_req);
+		goto err;
+	}
+
+	return;
+
+err:
+	/* Inform firmware that the hotplug operation completed w/ error */
 	(void) acpi_evaluate_hotplug_ost(handle, event, ost_code, NULL);
 	return;
 }
@@ -414,38 +287,72 @@ static int acpi_memory_device_add(struct acpi_device *device)
 	mem_device->state = MEMORY_POWER_ON_STATE;
 
 	pr_debug("%s\n", acpi_device_name(device));
-
-	if (!acpi_memory_check_device(mem_device)) {
-		/* call add_memory func */
-		result = acpi_memory_enable_device(mem_device);
-		if (result) {
-			dev_err(&device->dev,
-				"Error in acpi_memory_enable_device\n");
-			acpi_memory_device_free(mem_device);
-		}
-	}
 	return result;
 }
 
 static int acpi_memory_device_remove(struct acpi_device *device, int type)
 {
 	struct acpi_memory_device *mem_device = NULL;
-	int result;
+	struct acpi_memory_info *info, *n;
 
 	if (!device || !acpi_driver_data(device))
 		return -EINVAL;
 
 	mem_device = acpi_driver_data(device);
 
-	result = acpi_memory_remove_memory(mem_device);
-	if (result)
-		return result;
+	/* remove the memory_info list of this mem_device */
+	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
+		list_del(&info->list);
+		kfree(info);
+	}
 
 	acpi_memory_device_free(mem_device);
 
 	return 0;
 }
 
+static int acpi_memory_device_resource(struct acpi_device *device,
+				struct shp_request *shp_req)
+{
+	struct acpi_memory_device *mem_device = NULL;
+	struct acpi_memory_info *info, *n;
+	struct shp_device *shp_dev;
+	int node;
+
+	mem_device = acpi_driver_data(device);
+	if (!mem_device) {
+		dev_err(&device->dev, "Invalid device\n");
+		return -EINVAL;
+	}
+
+	node = acpi_get_node(mem_device->device->handle);
+
+	/*
+	 * Set resource info of the device
+	 */
+	list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
+
+		if (!info->length)
+			continue;
+
+		shp_dev = kzalloc(sizeof(*shp_dev), GFP_KERNEL);
+		if (!shp_dev) {
+			dev_err(&device->dev, "Failed to allocate shp_dev\n");
+			return -ENOMEM;
+		}
+
+		shp_dev->device = &device->dev;
+		shp_dev->class = SHP_CLS_MEMORY;
+		shp_dev->info.mem.node = node;
+		shp_dev->info.mem.start_addr = info->start_addr;
+		shp_dev->info.mem.length = info->length;
+
+		shp_add_dev_info(shp_req, shp_dev);
+	}
+
+	return 0;
+}
+
 /*
  * Helper function to check for memory device
  */

* [RFC PATCH v2 10/12] ACPI: Update container driver for hotplug framework
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
                   ` (8 preceding siblings ...)
  2013-01-10 23:40 ` [RFC PATCH v2 09/12] ACPI: Update memory " Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 11/12] cpu: Update sysfs cpu/online " Toshi Kani
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Changed container_notify_cb() to request a hotplug operation by
calling shp_submit_req().  It no longer initiates hot-add by calling
acpi_bus_add().  Also, it no longer sets device->flags.eject_pending
or generates a KOBJ_OFFLINE event for hot-delete.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/acpi/container.c |   95 ++++++++++++++++++----------------------------
 1 file changed, 37 insertions(+), 58 deletions(-)

diff --git a/drivers/acpi/container.c b/drivers/acpi/container.c
index 811910b..89af4fc 100644
--- a/drivers/acpi/container.c
+++ b/drivers/acpi/container.c
@@ -35,6 +35,7 @@
 #include <acpi/acpi_bus.h>
 #include <acpi/acpi_drivers.h>
 #include <acpi/container.h>
+#include <acpi/sys_hotplug.h>
 
 #define PREFIX "ACPI: "
 
@@ -135,77 +136,37 @@ static int acpi_container_remove(struct acpi_device *device, int type)
 	return status;
 }
 
-static int container_device_add(struct acpi_device **device, acpi_handle handle)
-{
-	acpi_handle phandle;
-	struct acpi_device *pdev;
-	int result;
-
-
-	if (acpi_get_parent(handle, &phandle)) {
-		return -ENODEV;
-	}
-
-	if (acpi_bus_get_device(phandle, &pdev)) {
-		return -ENODEV;
-	}
-
-	if (acpi_bus_add(device, pdev, handle, ACPI_BUS_TYPE_DEVICE)) {
-		return -ENODEV;
-	}
-
-	result = acpi_bus_start(*device);
-
-	return result;
-}
-
-static void container_notify_cb(acpi_handle handle, u32 type, void *context)
+static void container_notify_cb(acpi_handle handle, u32 event, void *context)
 {
 	struct acpi_device *device = NULL;
-	int result;
-	int present;
-	acpi_status status;
+	struct shp_request *shp_req;
+	enum shp_operation shp_op;
 	u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE; /* default */
 
-	switch (type) {
+	switch (event) {
 	case ACPI_NOTIFY_BUS_CHECK:
 		/* Fall through */
 	case ACPI_NOTIFY_DEVICE_CHECK:
-		pr_debug("Container driver received %s event\n",
-		       (type == ACPI_NOTIFY_BUS_CHECK) ?
-		       "ACPI_NOTIFY_BUS_CHECK" : "ACPI_NOTIFY_DEVICE_CHECK");
-
-		present = is_device_present(handle);
-		status = acpi_bus_get_device(handle, &device);
-		if (!present) {
-			if (ACPI_SUCCESS(status)) {
-				/* device exist and this is a remove request */
-				device->flags.eject_pending = 1;
-				kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE);
-				return;
-			}
-			break;
+		if (!is_device_present(handle)) {
+			acpi_handle_err(handle, "Device not enabled\n");
+			goto err;
 		}
 
-		if (!ACPI_FAILURE(status) || device)
-			break;
-
-		result = container_device_add(&device, handle);
-		if (result) {
-			acpi_handle_warn(handle, "Failed to add container\n");
-			break;
+		if (!acpi_bus_get_device(handle, &device)) {
+			acpi_handle_err(handle, "Device added already\n");
+			goto err;
 		}
 
-		kobject_uevent(&device->dev.kobj, KOBJ_ONLINE);
-		ost_code = ACPI_OST_SC_SUCCESS;
+		shp_op = SHP_HOTPLUG_ADD;
 		break;
 
 	case ACPI_NOTIFY_EJECT_REQUEST:
-		if (!acpi_bus_get_device(handle, &device) && device) {
-			device->flags.eject_pending = 1;
-			kobject_uevent(&device->dev.kobj, KOBJ_OFFLINE);
-			return;
+		if (acpi_bus_get_device(handle, &device)) {
+			acpi_handle_err(handle, "Device not added yet\n");
+			goto err;
 		}
+
+		shp_op = SHP_HOTPLUG_DEL;
 		break;
 
 	default:
@@ -213,8 +174,26 @@ static void container_notify_cb(acpi_handle handle, u32 type, void *context)
 		return;
 	}
 
-	/* Inform firmware that the hotplug operation has completed */
-	(void) acpi_evaluate_hotplug_ost(handle, type, ost_code, NULL);
+	shp_req = shp_alloc_request(shp_op);
+	if (!shp_req) {
+		acpi_handle_err(handle, "No memory to request hotplug\n");
+		goto err;
+	}
+
+	shp_req->handle = (void *)handle;
+	shp_req->event = event;
+
+	if (shp_submit_req(shp_req)) {
+		acpi_handle_err(handle, "Failed to request hotplug\n");
+		kfree(shp_req);
+		goto err;
+	}
+
+	return;
+
+err:
+	/* Inform firmware that the hotplug operation completed w/ error */
+	(void) acpi_evaluate_hotplug_ost(handle, event, ost_code, NULL);
 	return;
 }
 

* [RFC PATCH v2 11/12] cpu: Update sysfs cpu/online for hotplug framework
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
                   ` (9 preceding siblings ...)
  2013-01-10 23:40 ` [RFC PATCH v2 10/12] ACPI: Update container " Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-10 23:40 ` [RFC PATCH v2 12/12] ACPI: Update sysfs eject " Toshi Kani
  2013-01-17  0:50 ` [RFC PATCH v2 00/12] System device hot-plug framework Rafael J. Wysocki
  12 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Changed store_online() to request a cpu online or offline
operation by calling shp_submit_req().  It attaches the target cpu
device information to the request with shp_add_dev_info().

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/base/cpu.c |   40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/drivers/base/cpu.c b/drivers/base/cpu.c
index 05534ad..cd1cbdc 100644
--- a/drivers/base/cpu.c
+++ b/drivers/base/cpu.c
@@ -41,27 +41,43 @@ static ssize_t __ref store_online(struct device *dev,
 				  const char *buf, size_t count)
 {
 	struct cpu *cpu = container_of(dev, struct cpu, dev);
-	ssize_t ret;
+	struct shp_request *shp_req;
+	struct shp_device *shp_dev;
+	enum shp_operation operation;
+	ssize_t ret = count;
 
-	cpu_hotplug_driver_lock();
 	switch (buf[0]) {
 	case '0':
-		ret = cpu_down(cpu->dev.id);
-		if (!ret)
-			kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
+		operation = SHP_ONLINE_DEL;
 		break;
 	case '1':
-		ret = cpu_up(cpu->dev.id);
-		if (!ret)
-			kobject_uevent(&dev->kobj, KOBJ_ONLINE);
+		operation = SHP_ONLINE_ADD;
 		break;
 	default:
-		ret = -EINVAL;
+		return -EINVAL;
+	}
+
+	shp_req = shp_alloc_request(operation);
+	if (!shp_req)
+		return -ENOMEM;
+
+	shp_dev = kzalloc(sizeof(*shp_dev), GFP_KERNEL);
+	if (!shp_dev) {
+		kfree(shp_req);
+		return -ENOMEM;
+	}
+
+	shp_dev->device = dev;
+	shp_dev->class = SHP_CLS_CPU;
+	shp_dev->info.cpu.cpu_id = cpu->dev.id;
+	shp_add_dev_info(shp_req, shp_dev);
+
+	if (shp_submit_req(shp_req)) {
+		kfree(shp_dev);
+		kfree(shp_req);
+		return -EINVAL;
 	}
-	cpu_hotplug_driver_unlock();
 
-	if (ret >= 0)
-		ret = count;
 	return ret;
 }
 static DEVICE_ATTR(online, 0644, show_online, store_online);

* [RFC PATCH v2 12/12] ACPI: Update sysfs eject for hotplug framework
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
                   ` (10 preceding siblings ...)
  2013-01-10 23:40 ` [RFC PATCH v2 11/12] cpu: Update sysfs cpu/online " Toshi Kani
@ 2013-01-10 23:40 ` Toshi Kani
  2013-01-17  0:50 ` [RFC PATCH v2 00/12] System device hot-plug framework Rafael J. Wysocki
  12 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-10 23:40 UTC (permalink / raw)
  To: rjw, lenb, gregkh, akpm
  Cc: linux-acpi, linux-kernel, linux-mm, linuxppc-dev, linux-s390,
	bhelgaas, isimatu.yasuaki, jiang.liu, wency, guohanjun, yinghai,
	srivatsa.bhat, Toshi Kani

Changed acpi_eject_store() to request a hot-delete operation by
calling shp_submit_req().  It no longer initiates a hot-delete
operation by calling acpi_bus_hot_remove_device().

Deleted acpi_bus_hot_remove_device() since it no longer has any
caller and should not be called for hot-delete.

Deleted eject_pending bit from acpi_device_flags since the ACPI
container driver no longer sets it for hot-delete, and sysfs
eject no longer checks it in acpi_bus_hot_remove_device().

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
---
 drivers/acpi/scan.c     |  122 ++++++++---------------------------------------
 include/acpi/acpi_bus.h |    4 --
 2 files changed, 23 insertions(+), 103 deletions(-)

diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
index c88be6c..5e47b49 100644
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -13,6 +13,7 @@
 #include <linux/nls.h>
 
 #include <acpi/acpi_drivers.h>
+#include <acpi/sys_hotplug.h>
 
 #include "internal.h"
 
@@ -105,85 +106,6 @@ acpi_device_modalias_show(struct device *dev, struct device_attribute *attr, cha
 }
 static DEVICE_ATTR(modalias, 0444, acpi_device_modalias_show, NULL);
 
-/**
- * acpi_bus_hot_remove_device: hot-remove a device and its children
- * @context: struct acpi_eject_event pointer (freed in this func)
- *
- * Hot-remove a device and its children. This function frees up the
- * memory space passed by arg context, so that the caller may call
- * this function asynchronously through acpi_os_hotplug_execute().
- */
-void acpi_bus_hot_remove_device(void *context)
-{
-	struct acpi_eject_event *ej_event = (struct acpi_eject_event *) context;
-	struct acpi_device *device;
-	acpi_handle handle = ej_event->handle;
-	acpi_handle temp;
-	struct acpi_object_list arg_list;
-	union acpi_object arg;
-	acpi_status status = AE_OK;
-	u32 ost_code = ACPI_OST_SC_NON_SPECIFIC_FAILURE; /* default */
-
-	if (acpi_bus_get_device(handle, &device))
-		goto err_out;
-
-	if (!device)
-		goto err_out;
-
-	ACPI_DEBUG_PRINT((ACPI_DB_INFO,
-		"Hot-removing device %s...\n", dev_name(&device->dev)));
-
-	if (acpi_bus_trim(device, 1)) {
-		printk(KERN_ERR PREFIX
-				"Removing device failed\n");
-		goto err_out;
-	}
-
-	/* device has been freed */
-	device = NULL;
-
-	/* power off device */
-	status = acpi_evaluate_object(handle, "_PS3", NULL, NULL);
-	if (ACPI_FAILURE(status) && status != AE_NOT_FOUND)
-		printk(KERN_WARNING PREFIX
-				"Power-off device failed\n");
-
-	if (ACPI_SUCCESS(acpi_get_handle(handle, "_LCK", &temp))) {
-		arg_list.count = 1;
-		arg_list.pointer = &arg;
-		arg.type = ACPI_TYPE_INTEGER;
-		arg.integer.value = 0;
-		acpi_evaluate_object(handle, "_LCK", &arg_list, NULL);
-	}
-
-	arg_list.count = 1;
-	arg_list.pointer = &arg;
-	arg.type = ACPI_TYPE_INTEGER;
-	arg.integer.value = 1;
-
-	/*
-	 * TBD: _EJD support.
-	 */
-	status = acpi_evaluate_object(handle, "_EJ0", &arg_list, NULL);
-	if (ACPI_FAILURE(status)) {
-		if (status != AE_NOT_FOUND)
-			printk(KERN_WARNING PREFIX
-					"Eject device failed\n");
-		goto err_out;
-	}
-
-	kfree(context);
-	return;
-
-err_out:
-	/* Inform firmware the hot-remove operation has completed w/ error */
-	(void) acpi_evaluate_hotplug_ost(handle,
-				ej_event->event, ost_code, NULL);
-	kfree(context);
-	return;
-}
-EXPORT_SYMBOL(acpi_bus_hot_remove_device);
-
 static ssize_t
 acpi_eject_store(struct device *d, struct device_attribute *attr,
 		const char *buf, size_t count)
@@ -192,44 +114,44 @@ acpi_eject_store(struct device *d, struct device_attribute *attr,
 	acpi_status status;
 	acpi_object_type type = 0;
 	struct acpi_device *acpi_device = to_acpi_device(d);
-	struct acpi_eject_event *ej_event;
+	struct shp_request *shp_req;
 
 	if ((!count) || (buf[0] != '1')) {
 		return -EINVAL;
 	}
 #ifndef FORCE_EJECT
 	if (acpi_device->driver == NULL) {
-		ret = -ENODEV;
-		goto err;
+		return -ENODEV;
 	}
 #endif
 	status = acpi_get_type(acpi_device->handle, &type);
 	if (ACPI_FAILURE(status) || (!acpi_device->flags.ejectable)) {
-		ret = -ENODEV;
-		goto err;
+		return -ENODEV;
 	}
 
-	ej_event = kmalloc(sizeof(*ej_event), GFP_KERNEL);
-	if (!ej_event) {
-		ret = -ENOMEM;
+	shp_req = shp_alloc_request(SHP_HOTPLUG_DEL);
+	if (!shp_req)
+		return -ENOMEM;
+
+	shp_req->handle = (void *) acpi_device->handle;
+
+	/* event originated from user */
+	shp_req->event = ACPI_OST_EC_OSPM_EJECT;
+	(void) acpi_evaluate_hotplug_ost(shp_req->handle,
+			shp_req->event, ACPI_OST_SC_EJECT_IN_PROGRESS, NULL);
+
+	if (shp_submit_req(shp_req)) {
+		kfree(shp_req);
 		goto err;
 	}
 
-	ej_event->handle = acpi_device->handle;
-	if (acpi_device->flags.eject_pending) {
-		/* event originated from ACPI eject notification */
-		ej_event->event = ACPI_NOTIFY_EJECT_REQUEST;
-		acpi_device->flags.eject_pending = 0;
-	} else {
-		/* event originated from user */
-		ej_event->event = ACPI_OST_EC_OSPM_EJECT;
-		(void) acpi_evaluate_hotplug_ost(ej_event->handle,
-			ej_event->event, ACPI_OST_SC_EJECT_IN_PROGRESS, NULL);
-	}
+	return ret;
 
-	acpi_os_hotplug_execute(acpi_bus_hot_remove_device, (void *)ej_event);
 err:
-	return ret;
+	/* Inform firmware that the hotplug operation completed w/ error */
+	(void) acpi_evaluate_hotplug_ost(acpi_device->handle,
+		ACPI_OST_EC_OSPM_EJECT, ACPI_OST_SC_NON_SPECIFIC_FAILURE, NULL);
+	return -EINVAL;
 }
 
 static DEVICE_ATTR(eject, 0200, NULL, acpi_eject_store);
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index 6bf002e..ccbfef3 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -151,8 +151,7 @@ struct acpi_device_flags {
 	u32 suprise_removal_ok:1;
 	u32 power_manageable:1;
 	u32 performance_manageable:1;
-	u32 eject_pending:1;
-	u32 reserved:24;
+	u32 reserved:25;
 };
 
 /* File System */
@@ -362,7 +361,6 @@ int acpi_bus_register_driver(struct acpi_driver *driver);
 void acpi_bus_unregister_driver(struct acpi_driver *driver);
 int acpi_bus_add(struct acpi_device **child, struct acpi_device *parent,
 		 acpi_handle handle, int type);
-void acpi_bus_hot_remove_device(void *context);
 int acpi_bus_trim(struct acpi_device *start, int rmdevice);
 int acpi_bus_start(struct acpi_device *device);
 acpi_status acpi_bus_get_ejd(acpi_handle handle, acpi_handle * ejd);

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-10 23:40 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Toshi Kani
@ 2013-01-11 21:23   ` Rafael J. Wysocki
  2013-01-14 15:33     ` Toshi Kani
  2013-01-30  4:53   ` Greg KH
  2013-01-30  4:58   ` Greg KH
  2 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-01-11 21:23 UTC (permalink / raw)
  To: Toshi Kani
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Thursday, January 10, 2013 04:40:19 PM Toshi Kani wrote:
> Added include/linux/sys_hotplug.h, which defines the system device
> hotplug framework interfaces used by the framework itself and
> handlers.
> 
> The order values define the calling sequence of handlers.  For add
> execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
> CPU so that threads on new CPUs can start using their local memory.
> The ordering of the delete execute is symmetric to the add execute.
> 
> struct shp_request defines a hot-plug request information.  The
> device resource information is managed with a list so that a single
> request may target to multiple devices.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  include/linux/sys_hotplug.h |  181 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 181 insertions(+)
>  create mode 100644 include/linux/sys_hotplug.h
> 
> diff --git a/include/linux/sys_hotplug.h b/include/linux/sys_hotplug.h
> new file mode 100644
> index 0000000..86674dd
> --- /dev/null
> +++ b/include/linux/sys_hotplug.h
> @@ -0,0 +1,181 @@
> +/*
> + * sys_hotplug.h - System device hot-plug framework
> + *
> + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> + *	Toshi Kani <toshi.kani@hp.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef _LINUX_SYS_HOTPLUG_H
> +#define _LINUX_SYS_HOTPLUG_H
> +
> +#include <linux/list.h>
> +#include <linux/device.h>
> +
> +/*
> + * System device hot-plug operation proceeds in the following order.
> + *   Validate phase -> Execute phase -> Commit phase
> + *
> + * The order values below define the calling sequence of platform
> + * neutral handlers for each phase in ascending order.  The order
> + * values of firmware-specific handlers are defined in sys_hotplug.h
> + * under firmware specific directories.
> + */
> +
> +/* All order values must be smaller than this value */
> +#define SHP_ORDER_MAX				0xffffff
> +
> +/* Add Validate order values */
> +
> +/* Add Execute order values */
> +#define SHP_MEM_ADD_EXECUTE_ORDER		100
> +#define SHP_CPU_ADD_EXECUTE_ORDER		110
> +
> +/* Add Commit order values */
> +
> +/* Delete Validate order values */
> +#define SHP_CPU_DEL_VALIDATE_ORDER		100
> +#define SHP_MEM_DEL_VALIDATE_ORDER		110
> +
> +/* Delete Execute order values */
> +#define SHP_CPU_DEL_EXECUTE_ORDER		10
> +#define SHP_MEM_DEL_EXECUTE_ORDER		20
> +
> +/* Delete Commit order values */
> +
> +/*
> + * Hot-plug request types
> + */
> +#define SHP_REQ_ADD		0x000000
> +#define SHP_REQ_DELETE		0x000001
> +#define SHP_REQ_MASK		0x0000ff
> +
> +/*
> + * Hot-plug phase types
> + */
> +#define SHP_PH_VALIDATE		0x000000
> +#define SHP_PH_EXECUTE		0x000100
> +#define SHP_PH_COMMIT		0x000200
> +#define SHP_PH_MASK		0x00ff00
> +
> +/*
> + * Hot-plug operation types
> + */
> +#define SHP_OP_HOTPLUG		0x000000
> +#define SHP_OP_ONLINE		0x010000
> +#define SHP_OP_MASK		0xff0000
> +
> +/*
> + * Hot-plug phases
> + */
> +enum shp_phase {
> +	SHP_ADD_VALIDATE	= (SHP_REQ_ADD|SHP_PH_VALIDATE),
> +	SHP_ADD_EXECUTE		= (SHP_REQ_ADD|SHP_PH_EXECUTE),
> +	SHP_ADD_COMMIT		= (SHP_REQ_ADD|SHP_PH_COMMIT),
> +	SHP_DEL_VALIDATE	= (SHP_REQ_DELETE|SHP_PH_VALIDATE),
> +	SHP_DEL_EXECUTE		= (SHP_REQ_DELETE|SHP_PH_EXECUTE),
> +	SHP_DEL_COMMIT		= (SHP_REQ_DELETE|SHP_PH_COMMIT)
> +};
> +
> +/*
> + * Hot-plug operations
> + */
> +enum shp_operation {
> +	SHP_HOTPLUG_ADD		= (SHP_OP_HOTPLUG|SHP_REQ_ADD),
> +	SHP_HOTPLUG_DEL		= (SHP_OP_HOTPLUG|SHP_REQ_DELETE),
> +	SHP_ONLINE_ADD		= (SHP_OP_ONLINE|SHP_REQ_ADD),
> +	SHP_ONLINE_DEL		= (SHP_OP_ONLINE|SHP_REQ_DELETE)
> +};
> +
> +/*
> + * Hot-plug device classes
> + */
> +enum shp_class {
> +	SHP_CLS_INVALID		= 0,
> +	SHP_CLS_CPU		= 1,
> +	SHP_CLS_MEMORY		= 2,
> +	SHP_CLS_HOSTBRIDGE	= 3,
> +	SHP_CLS_CONTAINER	= 4,
> +};
> +
> +/*
> + * Hot-plug device information
> + */
> +union shp_dev_info {
> +	struct shp_cpu {
> +		u32		cpu_id;
> +	} cpu;
> +
> +	struct shp_memory {
> +		int		node;
> +		u64		start_addr;
> +		u64		length;
> +	} mem;
> +
> +	struct shp_hostbridge {
> +	} hb;
> +
> +	struct shp_node {
> +	} node;
> +};
> +
> +struct shp_device {
> +	struct list_head	list;
> +	struct device		*device;
> +	enum shp_class		class;
> +	union shp_dev_info	info;
> +};
> +
> +/*
> + * Hot-plug request
> + */
> +struct shp_request {
> +	/* common info */
> +	enum shp_operation	operation;	/* operation */
> +
> +	/* hot-plug event info: only valid for hot-plug operations */
> +	void			*handle;	/* FW handle */

What's the role of handle here?


> +	u32			event;		/* FW event */
> +
> +	/* device resource info */
> +	struct list_head	dev_list;	/* shp_device list */
> +};
> +
> +/*
> + * Inline Utility Functions
> + */
> +static inline bool shp_is_hotplug_op(enum shp_operation operation)
> +{
> +	return (operation & SHP_OP_MASK) == SHP_OP_HOTPLUG;
> +}
> +
> +static inline bool shp_is_online_op(enum shp_operation operation)
> +{
> +	return (operation & SHP_OP_MASK) == SHP_OP_ONLINE;
> +}
> +
> +static inline bool shp_is_add_op(enum shp_operation operation)
> +{
> +	return (operation & SHP_REQ_MASK) == SHP_REQ_ADD;
> +}
> +
> +static inline bool shp_is_add_phase(enum shp_phase phase)
> +{
> +	return (phase & SHP_REQ_MASK) == SHP_REQ_ADD;
> +}
> +
> +/*
> + * Externs
> + */
> +typedef int (*shp_func)(struct shp_request *req, int rollback);
> +extern int shp_register_handler(enum shp_phase phase, shp_func func, u32 order);
> +extern int shp_unregister_handler(enum shp_phase phase, shp_func func);
> +extern int shp_submit_req(struct shp_request *req);
> +extern struct shp_request *shp_alloc_request(enum shp_operation operation);
> +extern void shp_add_dev_info(struct shp_request *shp_req,
> +		struct shp_device *shp_dev);
> +
> +#endif	/* _LINUX_SYS_HOTPLUG_H */
> --
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-10 23:40 ` [RFC PATCH v2 02/12] ACPI: " Toshi Kani
@ 2013-01-11 21:25   ` Rafael J. Wysocki
  2013-01-14 15:53     ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-01-11 21:25 UTC (permalink / raw)
  To: Toshi Kani
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> device hotplug header and defines the order values of ACPI-specific
> handlers.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>  create mode 100644 include/acpi/sys_hotplug.h
> 
> diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> new file mode 100644
> index 0000000..ad80f61
> --- /dev/null
> +++ b/include/acpi/sys_hotplug.h
> @@ -0,0 +1,48 @@
> +/*
> + * sys_hotplug.h - ACPI System device hot-plug framework
> + *
> + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> + *	Toshi Kani <toshi.kani@hp.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef _ACPI_SYS_HOTPLUG_H
> +#define _ACPI_SYS_HOTPLUG_H
> +
> +#include <linux/list.h>
> +#include <linux/device.h>
> +#include <linux/sys_hotplug.h>
> +
> +/*
> + * System device hot-plug operation proceeds in the following order.
> + *   Validate phase -> Execute phase -> Commit phase
> + *
> + * The order values below define the calling sequence of ACPI-specific
> + * handlers for each phase in ascending order.  The order value of
> + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> + */
> +
> +/* Add Validate order values */
> +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> +
> +/* Add Execute order values */
> +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> +
> +/* Add Commit order values */
> +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> +
> +/* Delete Validate order values */
> +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> +
> +/* Delete Execute order values */
> +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> +
> +/* Delete Commit order values */
> +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> +
> +#endif	/* _ACPI_SYS_HOTPLUG_H */
> --

Why did you use the particular values above?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-11 21:23   ` Rafael J. Wysocki
@ 2013-01-14 15:33     ` Toshi Kani
  2013-01-14 18:48       ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-01-14 15:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Fri, 2013-01-11 at 22:23 +0100, Rafael J. Wysocki wrote:
> On Thursday, January 10, 2013 04:40:19 PM Toshi Kani wrote:
> > Added include/linux/sys_hotplug.h, which defines the system device
> > hotplug framework interfaces used by the framework itself and
> > handlers.
> > 
> > The order values define the calling sequence of handlers.  For add
> > execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
> > CPU so that threads on new CPUs can start using their local memory.
> > The ordering of the delete execute is symmetric to the add execute.
> > 
> > struct shp_request defines a hot-plug request information.  The
> > device resource information is managed with a list so that a single
> > request may target to multiple devices.
> > 
 :
> > +
> > +struct shp_device {
> > +	struct list_head	list;
> > +	struct device		*device;
> > +	enum shp_class		class;
> > +	union shp_dev_info	info;
> > +};
> > +
> > +/*
> > + * Hot-plug request
> > + */
> > +struct shp_request {
> > +	/* common info */
> > +	enum shp_operation	operation;	/* operation */
> > +
> > +	/* hot-plug event info: only valid for hot-plug operations */
> > +	void			*handle;	/* FW handle */
> 
> What's the role of handle here?

On ACPI-based platforms, the handle holds the ACPI handle that was
notified when the hot-plug request was made.  The ACPI bus handlers,
acpi_add_execute() / acpi_del_execute(), then scan / trim ACPI devices
starting from that handle.

Thanks,
-Toshi
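
The opaque-handle pattern described in this reply can be illustrated in user space. This is only a sketch under stated assumptions: struct demo_fw_node, struct demo_request and the two demo_fw_*_execute() functions are hypothetical stand-ins, and they do not reproduce acpi_add_execute() / acpi_del_execute(), whose bodies are not shown in this part of the thread. The point is that common code stores the handle as a void pointer and only the firmware-specific handler casts it back.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical firmware object; plays the role of an ACPI namespace node. */
struct demo_fw_node {
	const char *path;
	int scanned;	/* 1 once devices below this node are attached */
};

/* Trimmed-down request holding only the fields discussed here. */
struct demo_request {
	void *handle;		/* FW handle, opaque to common code */
	unsigned int event;	/* FW event */
};

/* FW-specific add handler: scan devices starting from the handle. */
static int demo_fw_add_execute(struct demo_request *req)
{
	struct demo_fw_node *node;

	assert(req != NULL);
	if (!req->handle)
		return -1;

	node = req->handle;	/* only FW-specific code knows the real type */
	node->scanned = 1;
	return 0;
}

/* FW-specific delete handler: trim devices starting from the handle. */
static int demo_fw_del_execute(struct demo_request *req)
{
	struct demo_fw_node *node;

	assert(req != NULL);
	if (!req->handle)
		return -1;

	node = req->handle;
	node->scanned = 0;
	return 0;
}
```

Keeping the field as a void pointer is what lets the common request structure stay free of ACPI types, at the cost of losing type checking at the point of the cast.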


* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-11 21:25   ` Rafael J. Wysocki
@ 2013-01-14 15:53     ` Toshi Kani
  2013-01-14 18:47       ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-01-14 15:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Fri, 2013-01-11 at 22:25 +0100, Rafael J. Wysocki wrote:
> On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> > Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> > device hotplug header and defines the order values of ACPI-specific
> > handlers.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 48 insertions(+)
> >  create mode 100644 include/acpi/sys_hotplug.h
> > 
> > diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> > new file mode 100644
> > index 0000000..ad80f61
> > --- /dev/null
> > +++ b/include/acpi/sys_hotplug.h
> > @@ -0,0 +1,48 @@
> > +/*
> > + * sys_hotplug.h - ACPI System device hot-plug framework
> > + *
> > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > + *	Toshi Kani <toshi.kani@hp.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#ifndef _ACPI_SYS_HOTPLUG_H
> > +#define _ACPI_SYS_HOTPLUG_H
> > +
> > +#include <linux/list.h>
> > +#include <linux/device.h>
> > +#include <linux/sys_hotplug.h>
> > +
> > +/*
> > + * System device hot-plug operation proceeds in the following order.
> > + *   Validate phase -> Execute phase -> Commit phase
> > + *
> > + * The order values below define the calling sequence of ACPI-specific
> > + * handlers for each phase in ascending order.  The order value of
> > + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> > + */
> > +
> > +/* Add Validate order values */
> > +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> > +
> > +/* Add Execute order values */
> > +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> > +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> > +
> > +/* Add Commit order values */
> > +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> > +
> > +/* Delete Validate order values */
> > +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> > +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> > +
> > +/* Delete Execute order values */
> > +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> > +
> > +/* Delete Commit order values */
> > +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> > +
> > +#endif	/* _ACPI_SYS_HOTPLUG_H */
> > --
> 
> Why did you use the particular values above?

The ordering values above are used to define the relative order among
handlers.  For instance, the 100 for SHP_ACPI_BUS_DEL_EXECUTE_ORDER can
potentially be 21 since it is still larger than 20 for
SHP_MEM_DEL_EXECUTE_ORDER defined in linux/sys_hotplug.h.  I picked 100
so that more platform-neutral handlers can be added between 20 and
100 in the future.
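
To illustrate the point above, here is a minimal sketch (not code from
the patchset) showing that only the relative magnitude of the order
values matters when handlers are called in ascending order.  The table
type struct shp_handler_entry and the helpers shp_sort_handlers() and
shp_position() are hypothetical names made up for this example.  The
values 20 and 100 come from the discussion; the CPU value of 10 is an
assumption used only to have a third entry.

```c
#include <stdlib.h>
#include <string.h>

/* Values taken from the discussion above. */
#define SHP_MEM_DEL_EXECUTE_ORDER	20
#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER	100
/* Assumption: not stated in this mail, chosen only for illustration. */
#define SHP_CPU_DEL_EXECUTE_ORDER	10

/* Hypothetical handler table entry; not the framework's real type. */
struct shp_handler_entry {
	const char *name;
	int order;
};

/* qsort() comparator: ascending by order value. */
static int shp_cmp_order(const void *a, const void *b)
{
	const struct shp_handler_entry *x = a, *y = b;

	return (x->order > y->order) - (x->order < y->order);
}

/* Sort handlers into their calling sequence (ascending order value). */
static void shp_sort_handlers(struct shp_handler_entry *tbl, size_t n)
{
	qsort(tbl, n, sizeof(*tbl), shp_cmp_order);
}

/* Return the index of the named handler in the table, or -1. */
static int shp_position(const struct shp_handler_entry *tbl, size_t n,
			const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(tbl[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

With these values the ACPI bus handler sorts last whether its order is
100 or 21, since both are larger than 20.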

Thanks,
-Toshi




* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-14 18:47       ` Rafael J. Wysocki
@ 2013-01-14 18:42         ` Toshi Kani
  2013-01-14 19:07           ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-01-14 18:42 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-01-14 at 19:47 +0100, Rafael J. Wysocki wrote:
> On Monday, January 14, 2013 08:53:53 AM Toshi Kani wrote:
> > On Fri, 2013-01-11 at 22:25 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> > > > Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> > > > device hotplug header and defines the order values of ACPI-specific
> > > > handlers.
> > > > 
> > > > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > > > ---
> > > >  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 48 insertions(+)
> > > >  create mode 100644 include/acpi/sys_hotplug.h
> > > > 
> > > > diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> > > > new file mode 100644
> > > > index 0000000..ad80f61
> > > > --- /dev/null
> > > > +++ b/include/acpi/sys_hotplug.h
> > > > @@ -0,0 +1,48 @@
> > > > +/*
> > > > + * sys_hotplug.h - ACPI System device hot-plug framework
> > > > + *
> > > > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > > > + *	Toshi Kani <toshi.kani@hp.com>
> > > > + *
> > > > + * This program is free software; you can redistribute it and/or modify
> > > > + * it under the terms of the GNU General Public License version 2 as
> > > > + * published by the Free Software Foundation.
> > > > + */
> > > > +
> > > > +#ifndef _ACPI_SYS_HOTPLUG_H
> > > > +#define _ACPI_SYS_HOTPLUG_H
> > > > +
> > > > +#include <linux/list.h>
> > > > +#include <linux/device.h>
> > > > +#include <linux/sys_hotplug.h>
> > > > +
> > > > +/*
> > > > + * System device hot-plug operation proceeds in the following order.
> > > > + *   Validate phase -> Execute phase -> Commit phase
> > > > + *
> > > > + * The order values below define the calling sequence of ACPI-specific
> > > > + * handlers for each phase in ascending order.  The order value of
> > > > + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> > > > + */
> > > > +
> > > > +/* Add Validate order values */
> > > > +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> > > > +
> > > > +/* Add Execute order values */
> > > > +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> > > > +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> > > > +
> > > > +/* Add Commit order values */
> > > > +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> > > > +
> > > > +/* Delete Validate order values */
> > > > +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> > > > +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> > > > +
> > > > +/* Delete Execute order values */
> > > > +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> > > > +
> > > > +/* Delete Commit order values */
> > > > +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> > > > +
> > > > +#endif	/* _ACPI_SYS_HOTPLUG_H */
> > > > --
> > > 
> > > Why did you use the particular values above?
> > 
> > The ordering values above are used to define the relative order among
> > handlers.  For instance, the 100 for SHP_ACPI_BUS_DEL_EXECUTE_ORDER can
> > potentially be 21 since it is still larger than 20 for
> > SHP_MEM_DEL_EXECUTE_ORDER defined in linux/sys_hotplug.h.  I picked 100
> > so that more platform-neutral handlers can be added in between 20 and
> > 100 in future.
> 
> I thought so, but I don't think it's a good idea to add gaps like this.

OK, I will use an equal gap of 10 for all values.  So, the 100 in the
above example will be changed to 30.   

Thanks,
-Toshi




* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-14 15:53     ` Toshi Kani
@ 2013-01-14 18:47       ` Rafael J. Wysocki
  2013-01-14 18:42         ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-01-14 18:47 UTC (permalink / raw)
  To: Toshi Kani
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, January 14, 2013 08:53:53 AM Toshi Kani wrote:
> On Fri, 2013-01-11 at 22:25 +0100, Rafael J. Wysocki wrote:
> > On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> > > Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> > > device hotplug header and defines the order values of ACPI-specific
> > > handlers.
> > > 
> > > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > > ---
> > >  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 48 insertions(+)
> > >  create mode 100644 include/acpi/sys_hotplug.h
> > > 
> > > diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> > > new file mode 100644
> > > index 0000000..ad80f61
> > > --- /dev/null
> > > +++ b/include/acpi/sys_hotplug.h
> > > @@ -0,0 +1,48 @@
> > > +/*
> > > + * sys_hotplug.h - ACPI System device hot-plug framework
> > > + *
> > > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > > + *	Toshi Kani <toshi.kani@hp.com>
> > > + *
> > > + * This program is free software; you can redistribute it and/or modify
> > > + * it under the terms of the GNU General Public License version 2 as
> > > + * published by the Free Software Foundation.
> > > + */
> > > +
> > > +#ifndef _ACPI_SYS_HOTPLUG_H
> > > +#define _ACPI_SYS_HOTPLUG_H
> > > +
> > > +#include <linux/list.h>
> > > +#include <linux/device.h>
> > > +#include <linux/sys_hotplug.h>
> > > +
> > > +/*
> > > + * System device hot-plug operation proceeds in the following order.
> > > + *   Validate phase -> Execute phase -> Commit phase
> > > + *
> > > + * The order values below define the calling sequence of ACPI-specific
> > > + * handlers for each phase in ascending order.  The order value of
> > > + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> > > + */
> > > +
> > > +/* Add Validate order values */
> > > +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> > > +
> > > +/* Add Execute order values */
> > > +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> > > +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> > > +
> > > +/* Add Commit order values */
> > > +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> > > +
> > > +/* Delete Validate order values */
> > > +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> > > +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> > > +
> > > +/* Delete Execute order values */
> > > +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> > > +
> > > +/* Delete Commit order values */
> > > +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> > > +
> > > +#endif	/* _ACPI_SYS_HOTPLUG_H */
> > > --
> > 
> > Why did you use the particular values above?
> 
> The ordering values above are used to define the relative order among
> handlers.  For instance, the 100 for SHP_ACPI_BUS_DEL_EXECUTE_ORDER can
> potentially be 21 since it is still larger than 20 for
> SHP_MEM_DEL_EXECUTE_ORDER defined in linux/sys_hotplug.h.  I picked 100
> so that more platform-neutral handlers can be added in between 20 and
> 100 in future.

I thought so, but I don't think it's a good idea to add gaps like this.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-14 15:33     ` Toshi Kani
@ 2013-01-14 18:48       ` Rafael J. Wysocki
  2013-01-14 19:02         ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-01-14 18:48 UTC (permalink / raw)
  To: Toshi Kani
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, January 14, 2013 08:33:48 AM Toshi Kani wrote:
> On Fri, 2013-01-11 at 22:23 +0100, Rafael J. Wysocki wrote:
> > On Thursday, January 10, 2013 04:40:19 PM Toshi Kani wrote:
> > > Added include/linux/sys_hotplug.h, which defines the system device
> > > hotplug framework interfaces used by the framework itself and
> > > handlers.
> > > 
> > > The order values define the calling sequence of handlers.  For add
> > > execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
> > > CPU so that threads on new CPUs can start using their local memory.
> > > The ordering of the delete execute is symmetric to the add execute.
> > > 
> > > struct shp_request defines a hot-plug request information.  The
> > > device resource information is managed with a list so that a single
> > > request may target to multiple devices.
> > > 
>  :
> > > +
> > > +struct shp_device {
> > > +	struct list_head	list;
> > > +	struct device		*device;
> > > +	enum shp_class		class;
> > > +	union shp_dev_info	info;
> > > +};
> > > +
> > > +/*
> > > + * Hot-plug request
> > > + */
> > > +struct shp_request {
> > > +	/* common info */
> > > +	enum shp_operation	operation;	/* operation */
> > > +
> > > +	/* hot-plug event info: only valid for hot-plug operations */
> > > +	void			*handle;	/* FW handle */
> > 
> > What's the role of handle here?
> 
> On ACPI-based platforms, the handle keeps a notified ACPI handle when a
> hot-plug request is made.  ACPI bus handlers, acpi_add_execute() /
> acpi_del_execute(), then scans / trims ACPI devices from the handle.

OK, so this is ACPI-specific and should be described as such.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-14 18:48       ` Rafael J. Wysocki
@ 2013-01-14 19:02         ` Toshi Kani
  2013-01-30  4:48           ` Greg KH
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-01-14 19:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-01-14 at 19:48 +0100, Rafael J. Wysocki wrote:
> On Monday, January 14, 2013 08:33:48 AM Toshi Kani wrote:
> > On Fri, 2013-01-11 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > On Thursday, January 10, 2013 04:40:19 PM Toshi Kani wrote:
> > > > Added include/linux/sys_hotplug.h, which defines the system device
> > > > hotplug framework interfaces used by the framework itself and
> > > > handlers.
> > > > 
> > > > The order values define the calling sequence of handlers.  For add
> > > > execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
> > > > CPU so that threads on new CPUs can start using their local memory.
> > > > The ordering of the delete execute is symmetric to the add execute.
> > > > 
> > > > struct shp_request defines a hot-plug request information.  The
> > > > device resource information is managed with a list so that a single
> > > > request may target to multiple devices.
> > > > 
> >  :
> > > > +
> > > > +struct shp_device {
> > > > +	struct list_head	list;
> > > > +	struct device		*device;
> > > > +	enum shp_class		class;
> > > > +	union shp_dev_info	info;
> > > > +};
> > > > +
> > > > +/*
> > > > + * Hot-plug request
> > > > + */
> > > > +struct shp_request {
> > > > +	/* common info */
> > > > +	enum shp_operation	operation;	/* operation */
> > > > +
> > > > +	/* hot-plug event info: only valid for hot-plug operations */
> > > > +	void			*handle;	/* FW handle */
> > > 
> > > What's the role of handle here?
> > 
> > On ACPI-based platforms, the handle keeps a notified ACPI handle when a
> > hot-plug request is made.  ACPI bus handlers, acpi_add_execute() /
> > acpi_del_execute(), then scans / trims ACPI devices from the handle.
> 
> OK, so this is ACPI-specific and should be described as such.

The other FW interface I know of is parisc, which has a mod_index
(module index) to identify a unique object, just as an ACPI handle does.
The handle can keep the mod_index as an opaque value as well.  But as
you said, I do not know if the handle works for all other FWs.  So, I
will add a description stating that the hot-plug event info is modeled
after ACPI and may need to be revisited when supporting other FWs.
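
As a rough sketch of what "opaque value" means here (not code from the
patchset): a void * field can carry either a pointer-like FW handle or
an integer index packed into the pointer.  struct shp_event_info and the
three helpers below are hypothetical names for this example only, and
the integer round trip through uintptr_t is implementation-defined in C,
although it behaves as expected on common platforms.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the request's event info. */
struct shp_event_info {
	void *handle;	/* opaque FW handle */
};

/* ACPI-like case: the FW object is already referenced by a pointer. */
static void shp_set_ptr_handle(struct shp_event_info *info, void *fw_obj)
{
	info->handle = fw_obj;
}

/* Index-like case (e.g. a module index): pack the integer into the pointer. */
static void shp_set_index_handle(struct shp_event_info *info,
				 unsigned long idx)
{
	info->handle = (void *)(uintptr_t)idx;
}

/* Recover the integer index stored by shp_set_index_handle(). */
static unsigned long shp_get_index_handle(const struct shp_event_info *info)
{
	return (unsigned long)(uintptr_t)info->handle;
}
```

Only the FW-specific handler that stored the value needs to know which
interpretation applies; the common code would just pass it along.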

Thanks,
-Toshi



* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-14 18:42         ` Toshi Kani
@ 2013-01-14 19:07           ` Rafael J. Wysocki
  2013-01-14 19:21             ` Toshi Kani
  2013-01-14 19:21             ` Greg KH
  0 siblings, 2 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-01-14 19:07 UTC (permalink / raw)
  To: Toshi Kani
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, January 14, 2013 11:42:09 AM Toshi Kani wrote:
> On Mon, 2013-01-14 at 19:47 +0100, Rafael J. Wysocki wrote:
> > On Monday, January 14, 2013 08:53:53 AM Toshi Kani wrote:
> > > On Fri, 2013-01-11 at 22:25 +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> > > > > Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> > > > > device hotplug header and defines the order values of ACPI-specific
> > > > > handlers.
> > > > > 
> > > > > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > > > > ---
> > > > >  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > > > >  1 file changed, 48 insertions(+)
> > > > >  create mode 100644 include/acpi/sys_hotplug.h
> > > > > 
> > > > > diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> > > > > new file mode 100644
> > > > > index 0000000..ad80f61
> > > > > --- /dev/null
> > > > > +++ b/include/acpi/sys_hotplug.h
> > > > > @@ -0,0 +1,48 @@
> > > > > +/*
> > > > > + * sys_hotplug.h - ACPI System device hot-plug framework
> > > > > + *
> > > > > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > > > > + *	Toshi Kani <toshi.kani@hp.com>
> > > > > + *
> > > > > + * This program is free software; you can redistribute it and/or modify
> > > > > + * it under the terms of the GNU General Public License version 2 as
> > > > > + * published by the Free Software Foundation.
> > > > > + */
> > > > > +
> > > > > +#ifndef _ACPI_SYS_HOTPLUG_H
> > > > > +#define _ACPI_SYS_HOTPLUG_H
> > > > > +
> > > > > +#include <linux/list.h>
> > > > > +#include <linux/device.h>
> > > > > +#include <linux/sys_hotplug.h>
> > > > > +
> > > > > +/*
> > > > > + * System device hot-plug operation proceeds in the following order.
> > > > > + *   Validate phase -> Execute phase -> Commit phase
> > > > > + *
> > > > > + * The order values below define the calling sequence of ACPI-specific
> > > > > + * handlers for each phase in ascending order.  The order value of
> > > > > + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> > > > > + */
> > > > > +
> > > > > +/* Add Validate order values */
> > > > > +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> > > > > +
> > > > > +/* Add Execute order values */
> > > > > +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> > > > > +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> > > > > +
> > > > > +/* Add Commit order values */
> > > > > +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> > > > > +
> > > > > +/* Delete Validate order values */
> > > > > +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> > > > > +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> > > > > +
> > > > > +/* Delete Execute order values */
> > > > > +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> > > > > +
> > > > > +/* Delete Commit order values */
> > > > > +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> > > > > +
> > > > > +#endif	/* _ACPI_SYS_HOTPLUG_H */
> > > > > --
> > > > 
> > > > Why did you use the particular values above?
> > > 
> > > The ordering values above are used to define the relative order among
> > > handlers.  For instance, the 100 for SHP_ACPI_BUS_DEL_EXECUTE_ORDER can
> > > potentially be 21 since it is still larger than 20 for
> > > SHP_MEM_DEL_EXECUTE_ORDER defined in linux/sys_hotplug.h.  I picked 100
> > > so that more platform-neutral handlers can be added in between 20 and
> > > 100 in future.
> > 
> > I thought so, but I don't think it's a good idea to add gaps like this.
> 
> OK, I will use an equal gap of 10 for all values.  So, the 100 in the
> above example will be changed to 30.  

I wonder why you want to have those gaps at all.

Anyway, this is just a small detail and it doesn't mean I don't have more
comments.  I just need some more time to get the big picture idea of how this
is supposed to work and perhaps Greg will have some remarks too.

Thanks,
Rafael



-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-14 19:07           ` Rafael J. Wysocki
@ 2013-01-14 19:21             ` Toshi Kani
  2013-01-30  4:51               ` Greg KH
  2013-01-14 19:21             ` Greg KH
  1 sibling, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-01-14 19:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-01-14 at 20:07 +0100, Rafael J. Wysocki wrote:
> On Monday, January 14, 2013 11:42:09 AM Toshi Kani wrote:
> > On Mon, 2013-01-14 at 19:47 +0100, Rafael J. Wysocki wrote:
> > > On Monday, January 14, 2013 08:53:53 AM Toshi Kani wrote:
> > > > On Fri, 2013-01-11 at 22:25 +0100, Rafael J. Wysocki wrote:
> > > > > On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> > > > > > Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> > > > > > device hotplug header and defines the order values of ACPI-specific
> > > > > > handlers.
> > > > > > 
> > > > > > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > > > > > ---
> > > > > >  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  1 file changed, 48 insertions(+)
> > > > > >  create mode 100644 include/acpi/sys_hotplug.h
> > > > > > 
> > > > > > diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> > > > > > new file mode 100644
> > > > > > index 0000000..ad80f61
> > > > > > --- /dev/null
> > > > > > +++ b/include/acpi/sys_hotplug.h
> > > > > > @@ -0,0 +1,48 @@
> > > > > > +/*
> > > > > > + * sys_hotplug.h - ACPI System device hot-plug framework
> > > > > > + *
> > > > > > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > > > > > + *	Toshi Kani <toshi.kani@hp.com>
> > > > > > + *
> > > > > > + * This program is free software; you can redistribute it and/or modify
> > > > > > + * it under the terms of the GNU General Public License version 2 as
> > > > > > + * published by the Free Software Foundation.
> > > > > > + */
> > > > > > +
> > > > > > +#ifndef _ACPI_SYS_HOTPLUG_H
> > > > > > +#define _ACPI_SYS_HOTPLUG_H
> > > > > > +
> > > > > > +#include <linux/list.h>
> > > > > > +#include <linux/device.h>
> > > > > > +#include <linux/sys_hotplug.h>
> > > > > > +
> > > > > > +/*
> > > > > > + * System device hot-plug operation proceeds in the following order.
> > > > > > + *   Validate phase -> Execute phase -> Commit phase
> > > > > > + *
> > > > > > + * The order values below define the calling sequence of ACPI-specific
> > > > > > + * handlers for each phase in ascending order.  The order value of
> > > > > > + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> > > > > > + */
> > > > > > +
> > > > > > +/* Add Validate order values */
> > > > > > +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> > > > > > +
> > > > > > +/* Add Execute order values */
> > > > > > +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> > > > > > +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> > > > > > +
> > > > > > +/* Add Commit order values */
> > > > > > +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> > > > > > +
> > > > > > +/* Delete Validate order values */
> > > > > > +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> > > > > > +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> > > > > > +
> > > > > > +/* Delete Execute order values */
> > > > > > +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> > > > > > +
> > > > > > +/* Delete Commit order values */
> > > > > > +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> > > > > > +
> > > > > > +#endif	/* _ACPI_SYS_HOTPLUG_H */
> > > > > > --
> > > > > 
> > > > > Why did you use the particular values above?
> > > > 
> > > > The ordering values above are used to define the relative order among
> > > > handlers.  For instance, the 100 for SHP_ACPI_BUS_DEL_EXECUTE_ORDER can
> > > > potentially be 21 since it is still larger than 20 for
> > > > SHP_MEM_DEL_EXECUTE_ORDER defined in linux/sys_hotplug.h.  I picked 100
> > > > so that more platform-neutral handlers can be added in between 20 and
> > > > 100 in future.
> > > 
> > > I thought so, but I don't think it's a good idea to add gaps like this.
> > 
> > OK, I will use an equal gap of 10 for all values.  So, the 100 in the
> > above example will be changed to 30.  
> 
> I wonder why you want to have those gaps at all.

Oh, I see.  I think some gap is helpful since it allows a new handler to
come between without recompiling other modules.  For instance, OEM
vendors may want to add their own handlers with loadable modules after
the kernel is distributed.
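
A minimal sketch of that scenario, assuming handlers are kept on a list
sorted by order value at registration time (struct shp_handler and
shp_register() are hypothetical names here, not the interfaces in this
patchset).  With a gap between 20 and 30, a handler registered later
with a hypothetical order of 25 lands between the two existing ones
without their values being changed.

```c
#include <stddef.h>

/* Hypothetical handler node; not the framework's real type. */
struct shp_handler {
	struct shp_handler *next;
	const char *name;
	int order;
};

/* Insert h so that the list stays sorted by ascending order value. */
static void shp_register(struct shp_handler **head, struct shp_handler *h)
{
	struct shp_handler **pos = head;

	/* Skip entries that must be called before (or tie with) h. */
	while (*pos && (*pos)->order <= h->order)
		pos = &(*pos)->next;
	h->next = *pos;
	*pos = h;
}
```

If the existing values were consecutive (say 20 and 21), the new handler
could only go in between after renumbering and rebuilding the users of
those values.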

> Anyway, this is just a small detail and it doesn't mean I don't have more
> comments.  I just need some more time to get the big picture idea of how this
> is supposed to work and perhaps Greg will have some remarks too.

Yes, I am well aware of that. :-)  Please let me know if you have any
questions.  I'd be happy to explain any details.

Thanks a lot for reviewing!
-Toshi




* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-14 19:07           ` Rafael J. Wysocki
  2013-01-14 19:21             ` Toshi Kani
@ 2013-01-14 19:21             ` Greg KH
  2013-01-14 19:29               ` Toshi Kani
  1 sibling, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-01-14 19:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, Jan 14, 2013 at 08:07:35PM +0100, Rafael J. Wysocki wrote:
> On Monday, January 14, 2013 11:42:09 AM Toshi Kani wrote:
> > On Mon, 2013-01-14 at 19:47 +0100, Rafael J. Wysocki wrote:
> > > On Monday, January 14, 2013 08:53:53 AM Toshi Kani wrote:
> > > > On Fri, 2013-01-11 at 22:25 +0100, Rafael J. Wysocki wrote:
> > > > > On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> > > > > > Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> > > > > > device hotplug header and defines the order values of ACPI-specific
> > > > > > handlers.
> > > > > > 
> > > > > > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > > > > > ---
> > > > > >  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > > > > >  1 file changed, 48 insertions(+)
> > > > > >  create mode 100644 include/acpi/sys_hotplug.h
> > > > > > 
> > > > > > diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> > > > > > new file mode 100644
> > > > > > index 0000000..ad80f61
> > > > > > --- /dev/null
> > > > > > +++ b/include/acpi/sys_hotplug.h
> > > > > > @@ -0,0 +1,48 @@
> > > > > > +/*
> > > > > > + * sys_hotplug.h - ACPI System device hot-plug framework
> > > > > > + *
> > > > > > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > > > > > + *	Toshi Kani <toshi.kani@hp.com>
> > > > > > + *
> > > > > > + * This program is free software; you can redistribute it and/or modify
> > > > > > + * it under the terms of the GNU General Public License version 2 as
> > > > > > + * published by the Free Software Foundation.
> > > > > > + */
> > > > > > +
> > > > > > +#ifndef _ACPI_SYS_HOTPLUG_H
> > > > > > +#define _ACPI_SYS_HOTPLUG_H
> > > > > > +
> > > > > > +#include <linux/list.h>
> > > > > > +#include <linux/device.h>
> > > > > > +#include <linux/sys_hotplug.h>
> > > > > > +
> > > > > > +/*
> > > > > > + * System device hot-plug operation proceeds in the following order.
> > > > > > + *   Validate phase -> Execute phase -> Commit phase
> > > > > > + *
> > > > > > + * The order values below define the calling sequence of ACPI-specific
> > > > > > + * handlers for each phase in ascending order.  The order value of
> > > > > > + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> > > > > > + */
> > > > > > +
> > > > > > +/* Add Validate order values */
> > > > > > +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> > > > > > +
> > > > > > +/* Add Execute order values */
> > > > > > +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> > > > > > +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> > > > > > +
> > > > > > +/* Add Commit order values */
> > > > > > +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> > > > > > +
> > > > > > +/* Delete Validate order values */
> > > > > > +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> > > > > > +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> > > > > > +
> > > > > > +/* Delete Execute order values */
> > > > > > +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> > > > > > +
> > > > > > +/* Delete Commit order values */
> > > > > > +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> > > > > > +
> > > > > > +#endif	/* _ACPI_SYS_HOTPLUG_H */
> > > > > > --
> > > > > 
> > > > > Why did you use the particular values above?
> > > > 
> > > > The ordering values above are used to define the relative order among
> > > > handlers.  For instance, the 100 for SHP_ACPI_BUS_DEL_EXECUTE_ORDER can
> > > > potentially be 21 since it is still larger than 20 for
> > > > SHP_MEM_DEL_EXECUTE_ORDER defined in linux/sys_hotplug.h.  I picked 100
> > > > so that more platform-neutral handlers can be added in between 20 and
> > > > 100 in future.
> > > 
> > > I thought so, but I don't think it's a good idea to add gaps like this.
> > 
> > OK, I will use an equal gap of 10 for all values.  So, the 100 in the
> > above example will be changed to 30.  
> 
> I wonder why you want to have those gaps at all.
> 
> Anyway, this is just a small detail and it doesn't mean I don't have more
> comments.  I just need some more time to get the big picture idea of how this
> is supposed to work and perhaps Greg will have some remarks too.

Yes, give me a few days to catch up on other patches before I get the
chance to review these.

greg k-h


* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-14 19:21             ` Greg KH
@ 2013-01-14 19:29               ` Toshi Kani
  0 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-14 19:29 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-01-14 at 11:21 -0800, Greg KH wrote:
> On Mon, Jan 14, 2013 at 08:07:35PM +0100, Rafael J. Wysocki wrote:
> > On Monday, January 14, 2013 11:42:09 AM Toshi Kani wrote:
> > > On Mon, 2013-01-14 at 19:47 +0100, Rafael J. Wysocki wrote:
> > > > On Monday, January 14, 2013 08:53:53 AM Toshi Kani wrote:
> > > > > On Fri, 2013-01-11 at 22:25 +0100, Rafael J. Wysocki wrote:
> > > > > > On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> > > > > > > Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> > > > > > > device hotplug header and defines the order values of ACPI-specific
> > > > > > > handlers.
> > > > > > > 
> > > > > > > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > > > > > > ---
> > > > > > >  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > > > > > >  1 file changed, 48 insertions(+)
> > > > > > >  create mode 100644 include/acpi/sys_hotplug.h
> > > > > > > 
> > > > > > > diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> > > > > > > new file mode 100644
> > > > > > > index 0000000..ad80f61
> > > > > > > --- /dev/null
> > > > > > > +++ b/include/acpi/sys_hotplug.h
> > > > > > > @@ -0,0 +1,48 @@
> > > > > > > +/*
> > > > > > > + * sys_hotplug.h - ACPI System device hot-plug framework
> > > > > > > + *
> > > > > > > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > > > > > > + *	Toshi Kani <toshi.kani@hp.com>
> > > > > > > + *
> > > > > > > + * This program is free software; you can redistribute it and/or modify
> > > > > > > + * it under the terms of the GNU General Public License version 2 as
> > > > > > > + * published by the Free Software Foundation.
> > > > > > > + */
> > > > > > > +
> > > > > > > +#ifndef _ACPI_SYS_HOTPLUG_H
> > > > > > > +#define _ACPI_SYS_HOTPLUG_H
> > > > > > > +
> > > > > > > +#include <linux/list.h>
> > > > > > > +#include <linux/device.h>
> > > > > > > +#include <linux/sys_hotplug.h>
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * System device hot-plug operation proceeds in the following order.
> > > > > > > + *   Validate phase -> Execute phase -> Commit phase
> > > > > > > + *
> > > > > > > + * The order values below define the calling sequence of ACPI-specific
> > > > > > > + * handlers for each phase in ascending order.  The order value of
> > > > > > > + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> > > > > > > + */
> > > > > > > +
> > > > > > > +/* Add Validate order values */
> > > > > > > +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> > > > > > > +
> > > > > > > +/* Add Execute order values */
> > > > > > > +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> > > > > > > +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> > > > > > > +
> > > > > > > +/* Add Commit order values */
> > > > > > > +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> > > > > > > +
> > > > > > > +/* Delete Validate order values */
> > > > > > > +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> > > > > > > +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> > > > > > > +
> > > > > > > +/* Delete Execute order values */
> > > > > > > +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> > > > > > > +
> > > > > > > +/* Delete Commit order values */
> > > > > > > +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> > > > > > > +
> > > > > > > +#endif	/* _ACPI_SYS_HOTPLUG_H */
> > > > > > > --
> > > > > > 
> > > > > > Why did you use the particular values above?
> > > > > 
> > > > > The ordering values above are used to define the relative order among
> > > > > handlers.  For instance, the 100 for SHP_ACPI_BUS_DEL_EXECUTE_ORDER can
> > > > > potentially be 21 since it is still larger than 20 for
> > > > > SHP_MEM_DEL_EXECUTE_ORDER defined in linux/sys_hotplug.h.  I picked 100
> > > > > so that more platform-neutral handlers can be added in between 20 and
> > > > > 100 in future.
> > > > 
> > > > I thought so, but I don't think it's a good idea to add gaps like this.
> > > 
> > > OK, I will use an equal gap of 10 for all values.  So, the 100 in the
> > > above example will be changed to 30.  
> > 
> > I wonder why you want to have those gaps at all.
> > 
> > Anyway, this is just a small detail and it doesn't mean I don't have more
> > comments.  I just need some more time to get the big picture idea of how this
> > is supposed to work and perhaps Greg will have some remarks too.
> 
> Yes, give me a few days to catch up on other patches before I get the
> chance to review these.

That's great!  Thanks Greg!
-Toshi



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 00/12] System device hot-plug framework
  2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
                   ` (11 preceding siblings ...)
  2013-01-10 23:40 ` [RFC PATCH v2 12/12] ACPI: Update sysfs eject " Toshi Kani
@ 2013-01-17  0:50 ` Rafael J. Wysocki
  2013-01-17 17:59   ` Toshi Kani
  12 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-01-17  0:50 UTC (permalink / raw)
  To: Toshi Kani
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Thursday, January 10, 2013 04:40:18 PM Toshi Kani wrote:
> This patchset is a prototype of proposed system device hot-plug framework
> for design review.  Unlike other hot-plug environments, such as USB and
> PCI, there is no common framework for system device hot-plug [1].
> Therefore, this patchset is designed to provide a common framework for
> hot-plugging and online/offline operations of system devices, such as CPU,
> Memory and Node.  While this patchset only supports ACPI-based hot-plug
> operations, the framework itself is designed to be platform-neutral and
> can support other FW architectures as necessary.
> 
> This patchset is based on Linus's tree (3.8-rc3).
> 
> I have seen a few stability issues with 3.8-rc3 in my testing and will
> look into their solutions.
> 
> [1] System device hot-plug frameworks for ppc and s390 are implemented
>     for specific platforms and products.
> 
> 
> Background: System Device Initialization
> ========================================
> System devices, such as CPU and memory, must be initialized during early
> boot sequence as they are the essential components to provide low-level
> services, ex. scheduling, memory allocation and interrupts, which are
> the foundations of the kernel services.  start_kernel() and kernel_init()
> manage the boot-up sequence to initialize system devices and low-level
> services in pre-defined order as shown below. 
> 
>   start_kernel()
>     boot_cpu_init()          // init cpu0
>     setup_arch()
>       efi_init()             // init EFI memory map
>       initmem_init()         // init NUMA
>       x86_init.paging.pagetable_init() // init page table
>       acpi_boot_init()       // parse ACPI MADT table
>         :
>   kernel_init()
>     kernel_init_freeable()
>       smp_init()             // init other CPUs
>         :
>       do_basic_setup()
>         driver_init()
>           cpu_dev_init()     // build system/cpu tree
>           memory_dev_init()  // build system/memory tree
>         do_initcalls()
>           acpi_init()        // build ACPI device tree
> 
> Note that drivers are initialized at the end of the boot sequence as they
> depend on the kernel services from system devices.  Hence, while system
> devices may be exposed to sysfs with their pseudo drivers, their
> initialization may not be fully integrated into the driver structures.  
> 
> Overview of the System Device Hot-plug Framework
> ================================================
> Similar to the boot-up sequence, the system device hot-plug framework
> provides a sequencer that calls all registered handlers in pre-defined
> order for hot-add and hot-delete of system devices.  It allows any modules
> initializing system devices in the boot-up sequence to participate in
> the hot-plug operations as well.  At a high level, there are two types of
> handlers, 1) FW-dependent (ex. ACPI) handlers that enumerate or eject
> system devices, and 2) system device (ex. CPU, Memory) management handlers
> that online or offline the enumerated system devices.  Online/offline
> operations are sub-set of hot-add/delete operations.  The ordering of the
> handlers is symmetric between hot-add (online) and hot-delete (offline)
> operations.
> 
>         hot-add    online
>            |    ^    :    ^
>   HW Enum/ |    |    :    :
>     Eject  |    |    :    :
>            |    |    :    :
>   Online/  |    |    |    |
>   Offline  |    |    |    |
>            V    |    V    |
>              hot-del   offline
> 
> The handlers may not call other handlers directly to exceed their role.
> Therefore, the role of the handlers in their modules remains consistent
> with their role at the boot-up sequence.  For instance, the ACPI module
> may not perform online or offline of system devices.
> 
> System Device Hot-plug Operation
> ================================
> 
> Serialized Startup
> ------------------
> The framework provides an interface (hp_submit_req) to request a hot-plug
> operation.  All requests are queued to and run on a single work queue.
> The framework assures that there is only a single hot-plug or online/
> offline operation running at a time.  A single request may however target
> multiple devices.  This makes the execution context of handlers
> consistent with the boot-up sequence and enables code sharing.
> 
> Phased Execution
> ----------------
> The framework performs hot-plug and online/offline operations in the
> following three phases.  The modules can register their handlers to each
> phase.  The framework also initiates a roll-back operation if any handler
> failed in the validate or execute phase.
> 
> 1) Validate Phase - Handlers validate if they support a given request
> without making any changes to target device(s).  They check any known
> restrictions and/or prerequisite conditions to their modules, and fail
> an unsupported request before making any changes.  For instance, the
> memory module may check if a hot-remove request is targeted to movable
> ranges.
> 
> 2) Execute Phase - Handlers make requested change within the scope that
> its roll-back is possible in case of a failure.  Execute handlers must
> implement their roll-back procedures.
> 
> 3) Commit Phase - Handlers make the final change that cannot be rolled-back.
> For instance, the ACPI module invokes _EJ0 for a hot-remove operation.
> 
> System Device Management Modules
> ================================
> 
> CPU Handlers
> ------------
> CPU handlers are provided by the CPU driver in drivers/base/cpu.c, and
> perform CPU online/offline procedures when CPU device(s) is added or
> deleted during an operation.
> 
> Memory Handlers
> ---------------
> Memory handlers are provided by the memory module in mm/memory_hotplug.c,
> and perform Memory online/offline procedure when memory device(s) is
> added or deleted during an operation.
> 
> FW-dependent Modules
> ====================
> 
> ACPI Bus Handlers
> -----------------
> ACPI bus handlers are provided by the ACPI core in drivers/acpi/bus.c,
> and construct/destruct acpi_device object(s) during a hot-plug operation.
> 
> ACPI Resource Handlers
> ----------------------
> ACPI resource handlers are provided by the ACPI core in
> drivers/acpi/hp_resource.c, and set device resource information to
> a request during a hot-plug operation.  This device resource information
> is then consumed by the system device management modules for their
> online/offline procedure.
> 
> ACPI Drivers
> ------------
> ACPI drivers are called from the ACPI core during a hot-plug operation
> through the following interfaces.  ACPI drivers are not called from the
> framework directly, and remain internal to the ACPI core.  ACPI drivers
> may not initiate online/offline of a device.
> 
> .add - Construct device-specific information to a given acpi_device.
> Called at boot, hot-add and sysfs bind.
> 
> .remove - Destruct device-specific information to a given acpi_device.
> Called at hot-remove and sysfs unbind.
> 
> .resource - Set device-specific resource information to a given hot-plug
> request.  Called at hot-add and hot-remove.

At this point I'd like to clearly understand how the code is supposed to work.

From what I can say at the moment it all boils down to having two (ordered)
lists of notifiers (shp_add_list, shp_del_list) that can be added to or removed
from with shp_register_handler() and shp_unregister_handler(), respectively
(BTW, the abbreviation "hdr" makes me think about a "header" rather than a
"handler", but maybe that's just me :-)), and a workqueue for requests (why do
we need a separate workqueue for that?).

Whoever needs to carry out a hotplug operation is supposed to prepare a request
and then put it into the workqueue with shp_submit_request().  The framework
will then execute all of the notifier callbacks from the appropriate notifier
list (depending on whether the operation is a hot-add or a hot-remove).  If any
of those callbacks returns an error code and it is not too late (the order of
the failing notifier is not too high), the already executed notifier callbacks
will be run again with the "rollback" argument set to 1 (why not to use bool?)
to indicate that they are supposed to bring things back to the initial state.
Error codes returned in that stage only cause messages to be printed.
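
As a reading aid, the sequence described above can be sketched as a small
userspace model.  This is not the patch's code: handler_fn, struct trace,
run_handlers() and the commit_start index are hypothetical names, and
running the rollback calls newest-first is an assumption, since the thread
does not state the rollback order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handler type: returns 0 on success, nonzero on failure. */
typedef int (*handler_fn)(void *req, bool rollback);

/* Counters the sample handlers update so the sequence can be observed. */
struct trace {
	int calls;
	int rollbacks;
};

int ok_handler(void *req, bool rollback)
{
	struct trace *t = req;

	if (rollback)
		t->rollbacks++;
	else
		t->calls++;
	return 0;
}

int failing_handler(void *req, bool rollback)
{
	(void)req;
	(void)rollback;
	return -1;
}

/*
 * Call h[0..n) in order.  Handlers at index >= commit_start are "too late"
 * to undo.  If an earlier handler fails, re-run the handlers that already
 * succeeded, newest first (an assumption), with rollback = true.  Rollback
 * errors are ignored in this model.
 */
int run_handlers(const handler_fn *h, size_t n, size_t commit_start, void *req)
{
	assert(commit_start <= n);

	for (size_t i = 0; i < n; i++) {
		int err = h[i](req, false);

		if (!err)
			continue;
		if (i < commit_start)
			while (i-- > 0)
				(void)h[i](req, true);
		return err;
	}
	return 0;
}
```

With two successful handlers followed by a failing one before commit_start,
both earlier handlers see one rollback call each; a failure at or after
commit_start returns the error without any rollback.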

Is the description above correct?

If so, it looks like subsystems are supposed to register notifiers (handlers)
for hotplug/hot-remove operations of the devices they handle.  They are
supposed to use predefined order values to indicate what kinds of devices
those are.  Then, hopefully, if they do everything correctly, and the
initiator of a hotplug/hot-remove operation prepares the request correctly,
the callbacks will be executed in the right order, they will find their
devices in the list attached to the request object and they will do what's
necessary with them.

Am I still on the right track?

If that's the case, I have a few questions.

(1) Why is this limited to system devices?

(2) What's the guarantee that the ordering of hot-removal (for example) of CPU
    cores with respect to memory and host bridges will always be the same?
    What if the CPU cores themselves need to be hot-removed in a specific
    order?

(3) What's the guarantee that the ordering of shp_add_list and shp_del_list
    will be in agreement with the ordering of the device hierarchy?

(4) Why do you think that the ordering of hot-plug operations needs to be
    independent of the device hierarchy ordering?

(5) Why do you think it's a good idea to require every callback routine to
    browse the entire list of devices attached to the request object?  Wouldn't
    it be more convenient if they were called only for the types of devices
    they have declared to handle?  [That would reduce some code duplication,
    for example.]

(6) Why is it convenient to use order values (priorities) of notifiers to
    indicate both the ordering with respect to the other notifiers and the
    "level" (e.g. whether or not rollback is possible) at the same time?  Those
    things appear to be conceptually distinct.

(7) Why do callbacks used for "add" operations still need to check if the
    operation type is "add" (cpu_add_execute() does that for example)?

(8) What problems *exactly* is this supposed to address?  Can you give a few
    examples, please?

I guess I'll have more questions going forward.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 00/12] System device hot-plug framework
  2013-01-17  0:50 ` [RFC PATCH v2 00/12] System device hot-plug framework Rafael J. Wysocki
@ 2013-01-17 17:59   ` Toshi Kani
  0 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-17 17:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: lenb, gregkh, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Thu, 2013-01-17 at 01:50 +0100, Rafael J. Wysocki wrote:
> On Thursday, January 10, 2013 04:40:18 PM Toshi Kani wrote:
> > This patchset is a prototype of proposed system device hot-plug framework
> > for design review.  Unlike other hot-plug environments, such as USB and
> > PCI, there is no common framework for system device hot-plug [1].
> > Therefore, this patchset is designed to provide a common framework for
> > hot-plugging and online/offline operations of system devices, such as CPU,
> > Memory and Node.  While this patchset only supports ACPI-based hot-plug
> > operations, the framework itself is designed to be platform-neutral and
> > can support other FW architectures as necessary.
 :
> At this point I'd like to clearly understand how the code is supposed to work.

Thanks for reviewing!

> From what I can say at the moment it all boils down to having two (ordered)
> lists of notifiers (shp_add_list, shp_del_list) that can be added to or removed
> from with shp_register_handler() and shp_unregister_handler(), respectively

Yes.

> (BTW, the abbreviation "hdr" makes me think about a "header" rather than a
> "handler", but maybe that's just me :-)), 

Well, it makes me think that way as well. :)  How about "hdlr"?

> and a workqueue for requests (why do
> we need a separate workqueue for that?).

This workqueue needs to be platform-neutral with max_active set to 1, and
preferably dedicated to hotplug operations.  kacpi_hotplug_wq is
close, but is ACPI-specific.  So, I decided to create a new workqueue
for this framework.

> Whoever needs to carry out a hotplug operation is supposed to prepare a request
> and then put it into the workqueue with shp_submit_request().  The framework
> will then execute all of the notifier callbacks from the appropriate notifier
> list (depending on whether the operation is a hot-add or a hot-remove).  If any
> of those callbacks returns an error code and it is not too late (the order of
> the failing notifier is not too high), the already executed notifier callbacks
> will be run again with the "rollback" argument set to 1 (why not to use bool?)

Agreed.  I will change the rollback to bool.

> to indicate that they are supposed to bring things back to the initial state.
> Error codes returned in that stage only cause messages to be printed.
>
> Is the description above correct?

Yes.  It's a very good summary!

> If so, it looks like subsystems are supposed to register notifiers (handlers)
> for hotplug/hot-remove operations of the devices they handle.  They are
> supposed to use predefined order values to indicate what kinds of devices
> those are.  Then, hopefully, if they do everything correctly, and the
> initiator of a hotplug/hot-remove operation prepares the request correctly,
> the callbacks will be executed in the right order, they will find their
> devices in the list attached to the request object and they will do what's
> necessary with them.
> 
> Am I still on the right track?

Yes.

> If that's the case, I have a few questions.

Well, there are more than a few :), but they all are excellent
questions!

> (1) Why is this limited to system devices?

It could be extended to other devices, but is specifically designed for
system devices as follows.  So, I think it is best to keep it that
way.

a) Work with multiple subsystems without bus dependency.  Other hot-plug
frameworks are designed and implemented for a particular bus and a
subsystem.  Therefore, they work best for their targeted environment as
well.

b) Sequence with pre-defined order.  This allows hot-add operation and
the boot sequence to be consistent.  Other non-system devices are
initialized within a subsystem, and do not depend on the boot-up
sequence.

> (2) What's the guarantee that the ordering of hot-removal (for example) of CPU
>     cores with respect to memory and host bridges will always be the same?
>     What if the CPU cores themselves need to be hot-removed in a specific
>     order?

When devices are added in the order of A->B->C, their dependency model
is:
 - B may depend on A (but A may not depend on B)
 - C may depend on A and B (but A and B may not depend on C)

Therefore, they can be deleted in the order of C->B->A.

The boot sequence defines the order for add.  So, it is important to
make sure that we hot-add devices in the same order as the boot
sequence.  Of course, if there is an issue in the order, we need to fix
it.  But the point is that the add order should be consistent between
the boot sequence and hot-add.

In your example, the boot sequence adds them in the order of
memory->CPU->host bridge.  I think this makes sense because cpu may need
its local memory, and host bridge may need its local memory and local
cpu for interrupt.  So, hot-add needs to do the same for node hot-add,
and hot-delete should be able to delete them in the reversed order per
their dependency model.
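
The dependency argument above can be sketched in a few lines: the delete
order is simply the add order walked backwards.  The enum and function
names below are hypothetical; only the memory -> CPU -> host bridge add
order comes from the example.

```c
#include <stddef.h>

/* Device classes from the example in the thread. */
enum dev_class { DEV_MEMORY, DEV_CPU, DEV_HOST_BRIDGE, DEV_CLASS_COUNT };

/* Add order: each class may depend only on the classes before it. */
static const enum dev_class add_order[DEV_CLASS_COUNT] = {
	DEV_MEMORY, DEV_CPU, DEV_HOST_BRIDGE,
};

/* Fill out[] with the delete order, which is the add order reversed. */
void delete_order(enum dev_class out[DEV_CLASS_COUNT])
{
	for (size_t i = 0; i < DEV_CLASS_COUNT; i++)
		out[i] = add_order[DEV_CLASS_COUNT - 1 - i];
}
```

Because nothing earlier in the add order depends on something later,
removing the last-added class first never pulls a dependency out from
under a device that is still in use.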

> (3) What's the guarantee that the ordering of shp_add_list and shp_del_list
>     will be in agreement with the ordering of the device hierarchy?

Only the ACPI bus handlers (i.e. ACPI core) perform hierarchy based
initialization / deletion.  This is the case with the boot-up sequence
as well.

For hot-add, the ACPI core enumerates all devices based on the device
hierarchy and builds their device tree with "enabled" devices.  The
hierarchy defines the scope of devices to be added.  Then, all enabled
system devices are directly accessible without any restriction, so their
online initialization (cpu and mm handlers) does not have to be based on
their hierarchy.  It is done by the predefined order.

Similarly, for hot-delete, the ACPI core trims all devices based on the
device hierarchy after all devices are off-lined with predefined order.

> (4) Why do you think that the ordering of hot-plug operations needs to be
>     independent of the device hierarchy ordering?

The ordering of the boot sequence and hot-add need to be consistent, and
the boot sequence may not be ordered by the hierarchy.  Furthermore, the
hierarchy does not necessarily dictate the proper order of device
initialization.  For instance, memory devices and processor devices may
be described as siblings under the same parent (ex. node, socket), but
there is no guarantee that memory devices are listed before processor
devices in order to initialize memory before cpu (or in order to delete
cpu before memory).

> (5) Why do you think it's a good idea to require every callback routine to
>     browse the entire list of devices attached to the request object?  Wouldn't
>     it be more convenient if they were called only for the types of devices
>     they have declared to handle?  [That would reduce some code duplication,
>     for example.]

This version aims for simplicity and yes, there is room for
optimization.  One way to do so is to have a separate device list for
each type in shp_request.  This way, for instance, the cpu handlers only
check for the cpu list, and do nothing for memory hot-plug since the cpu
list is empty.  I will make this change if it makes sense.

struct shp_request {
        /* common info */
		:

        /* device resource info */
        struct list_head        cpu_dev_list;   /* cpu device list */
        struct list_head        mem_dev_list;   /* memory device list */
		:
};
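
A userspace sketch of how a handler might use such per-type lists follows.
It is a model under stated assumptions, not the patch's code: plain
singly-linked lists stand in for struct list_head, and struct dev,
struct request and cpu_handler() are hypothetical names.

```c
#include <stddef.h>

/* Minimal stand-in for a device entry on a per-type list. */
struct dev {
	const char *name;
	struct dev *next;
};

/* Request with one list per device type, as proposed above. */
struct request {
	struct dev *cpu_devs;	/* cpu device list */
	struct dev *mem_devs;	/* memory device list */
};

/*
 * Hypothetical CPU handler: walks only the cpu list and returns how many
 * devices it handled.  For a memory-only request the list is empty and
 * the handler does nothing.
 */
size_t cpu_handler(const struct request *req)
{
	size_t handled = 0;

	for (const struct dev *d = req->cpu_devs; d; d = d->next)
		handled++;	/* a real handler would online/offline d here */
	return handled;
}
```

The handler no longer needs to test a class field on every entry, which is
the code duplication the question points at.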

> (6) Why is it convenient to use order values (priorities) of notifiers to
>     indicate both the ordering with respect to the other notifiers and the
>     "level" (e.g. whether or not rollback is possible) at the same time?  Those
>     things appear to be conceptually distinct.

It allows a single set of add (shp_add_list_head) and delete (
shp_del_list_head) lists to list all levels of the handlers.  Otherwise,
it will need to have a separate list for validate, execute and commit.
This makes shp_start_req() simpler as it can call all handlers from a
single list.

Note that this list handling is abstracted within sys_hotplug.c, and is
not visible from the handlers.  struct shp_handler, order base values
(ex. SHP_EXECUTE_ORDER_BASE), and the lists are all locally defined in
sys_hotplug.c.  Therefore, the list handling can be updated without
impacting the handlers.
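
One way to picture a single list carrying both the phase and the order is
a key of the form "phase base + order value".  The sketch below is an
assumption for illustration: PHASE_SPAN, list_key() and the helper names
are hypothetical, and the actual base values in sys_hotplug.c are not
shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>

enum phase { PHASE_VALIDATE, PHASE_EXECUTE, PHASE_COMMIT };

/* Assumed width of each phase's range of order values. */
#define PHASE_SPAN 1000

/* Combine phase and per-phase order into one sortable key. */
int list_key(enum phase p, int order)
{
	assert(order >= 0 && order < PHASE_SPAN);
	return (int)p * PHASE_SPAN + order;
}

/* Recover the phase from a key. */
enum phase key_phase(int key)
{
	return (enum phase)(key / PHASE_SPAN);
}

/* Rollback is possible only before the commit phase. */
bool key_can_rollback(int key)
{
	return key_phase(key) != PHASE_COMMIT;
}
```

Sorting by such a key puts every validate handler before every execute
handler, and the rollback decision can be derived from the key alone.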

> (7) Why callbacks used for "add" operations still need to check if the
>     operation type is "add" (cpu_add_execute() does that for example)?

Such a check should not be needed.  Are you referring to the check with
shp_is_online_op() in cpu_add_execute()?  shp_is_online_op() returns
true for online/offline operations, and false for hot-add/delete
operations.  This check is a workaround for an inherited issue from the
original code.  KOBJ_ONLINE needs to be sent to a cpu dev
under /sys/devices/system/cpu.  However, in case of hot-add/delete
operations, we only have a device for an ACPI cpu dev (LNXCPU)
under /sys/bus/acpi/devices.  Hence, we cannot send KOBJ_ONLINE to the
cpu dev.  Similarly, acpi_processor_handle_eject() in the original code
cannot send KOBJ_OFFLINE to its cpu dev when it calls cpu_down().

From what I see in udev's behavior, though, this does not seem to
cause any problem.  For hot-add/delete, it still sends
KOBJ_ADD/KOBJ_REMOVE to a cpu dev, and udev reacts to this event.

> (8) What problems *exactly* this is supposed to address?  Can you give a few
>     examples, please?

Here are a few examples of the problems that this framework will
address.

1. Race conditions.  The current locking scheme is fine grained.  While
it protects some critical sections, it does not protect against multiple
operations running simultaneously and competing with each other.  For
instance, the following case can happen.
 1) A node hot-delete operation runs, and offlines all CPUs in the node.
 2) A separate cpu online operation comes in, and onlines a CPU in the
node.
 3) The node hot-delete operation ejects the node, and the system
crashes since one of the CPUs is online.

The framework provides end-to-end protection to an operation, and
prevents such a case from happening.  We may also remove the current fine
grained locking for simpler and better code maintainability.
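
The effect of serializing whole operations can be sketched with a tiny
userspace model.  It is only an illustration: the queue, the two operation
types and the rule that a cpu online request is rejected once its node is
gone are all assumptions, not the framework's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NCPUS 2
#define QLEN  8

enum op { OP_NODE_DELETE, OP_CPU_ONLINE };

struct node {
	bool present;
	bool cpu_online[NCPUS];
};

struct queue {
	enum op ops[QLEN];
	int results[QLEN];	/* 0 on success, -1 if rejected */
	size_t count;
};

void submit(struct queue *q, enum op op)
{
	assert(q->count < QLEN);
	q->ops[q->count++] = op;
}

/* Run one operation from start to finish; nothing else can interleave. */
static int run_one(struct node *n, enum op op)
{
	if (!n->present)
		return -1;	/* validate: the node must still exist */

	if (op == OP_NODE_DELETE) {
		for (size_t i = 0; i < NCPUS; i++)
			n->cpu_online[i] = false;	/* offline all CPUs */
		n->present = false;			/* then eject */
	} else {
		n->cpu_online[0] = true;		/* online one CPU */
	}
	return 0;
}

/* Single worker: requests run strictly one after another, in FIFO order. */
void drain(struct queue *q, struct node *n)
{
	for (size_t i = 0; i < q->count; i++)
		q->results[i] = run_one(n, q->ops[i]);
	q->count = 0;
}
```

In this model a cpu online request submitted after a node hot-delete can
only run once the delete has finished, so it is rejected instead of
bringing a CPU online between the offline and eject steps.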

2. ACPI .add/.remove overload.  Currently, ACPI drivers use .add/.remove
to online/offline a device during hot-plug operations.  The .add/.remove
ops are defined as attach/detach of the driver to a device, not
online/offline of the device.  Therefore, .add/.remove may not fail.
This has caused a major issue in memory hot-delete that it still ejects
a target memory even if its memory offlining failed.

The framework allows .add/.remove ops to do as they are defined, and
handles failure cases properly.

3. Inconsistency with the boot path.  In the boot-up sequence, system
devices are initialized in pre-defined order, and ACPI bus walk is done
as one of the last steps after most system devices are actually
initialized.  The current hotplug scheme requires all system device
initialization to proceed in ACPI bus walk, which requires an
inconsistent role model for hotplug operations compared with the boot-up
sequence.  Furthermore, it may not properly order system device
initialization among multiple device types (i.e. Memory -> CPU) for node
hotplug, unlike the boot-up sequence.

The framework keeps the role model consistent with the boot sequence as
well as the ordering of the initialization.

> I guess I'll have more questions going forward.

Great!

Thanks a lot!
-Toshi



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-14 19:02         ` Toshi Kani
@ 2013-01-30  4:48           ` Greg KH
  2013-01-31  1:15             ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-01-30  4:48 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Mon, Jan 14, 2013 at 12:02:04PM -0700, Toshi Kani wrote:
> On Mon, 2013-01-14 at 19:48 +0100, Rafael J. Wysocki wrote:
> > On Monday, January 14, 2013 08:33:48 AM Toshi Kani wrote:
> > > On Fri, 2013-01-11 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > > On Thursday, January 10, 2013 04:40:19 PM Toshi Kani wrote:
> > > > > Added include/linux/sys_hotplug.h, which defines the system device
> > > > > hotplug framework interfaces used by the framework itself and
> > > > > handlers.
> > > > > 
> > > > > The order values define the calling sequence of handlers.  For add
> > > > > execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
> > > > > CPU so that threads on new CPUs can start using their local memory.
> > > > > The ordering of the delete execute is symmetric to the add execute.
> > > > > 
> > > > > struct shp_request defines a hot-plug request information.  The
> > > > > device resource information is managed with a list so that a single
> > > > > request may target to multiple devices.
> > > > > 
> > >  :
> > > > > +
> > > > > +struct shp_device {
> > > > > +	struct list_head	list;
> > > > > +	struct device		*device;
> > > > > +	enum shp_class		class;
> > > > > +	union shp_dev_info	info;
> > > > > +};
> > > > > +
> > > > > +/*
> > > > > + * Hot-plug request
> > > > > + */
> > > > > +struct shp_request {
> > > > > +	/* common info */
> > > > > +	enum shp_operation	operation;	/* operation */
> > > > > +
> > > > > +	/* hot-plug event info: only valid for hot-plug operations */
> > > > > +	void			*handle;	/* FW handle */
> > > > 
> > > > What's the role of handle here?
> > > 
> > > On ACPI-based platforms, the handle keeps a notified ACPI handle when a
> > > hot-plug request is made.  ACPI bus handlers, acpi_add_execute() /
> > > acpi_del_execute(), then scans / trims ACPI devices from the handle.
> > 
> > OK, so this is ACPI-specific and should be described as such.
> 
> Other FW interface I know is parisc, which has mod_index (module index)
> to identify a unique object, just like what ACPI handle does.  The
> handle can keep the mod_index as an opaque value as well.  But as you
> said, I do not know if the handle works for all other FWs.  So, I will
> add descriptions, such that the hot-plug event info is modeled after
> ACPI and may need to be revisited when supporting other FW.

Please make it a "real" pointer, and not a void *, those shouldn't be
used at all if possible.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-14 19:21             ` Toshi Kani
@ 2013-01-30  4:51               ` Greg KH
  2013-01-31  1:38                 ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-01-30  4:51 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Mon, Jan 14, 2013 at 12:21:30PM -0700, Toshi Kani wrote:
> On Mon, 2013-01-14 at 20:07 +0100, Rafael J. Wysocki wrote:
> > On Monday, January 14, 2013 11:42:09 AM Toshi Kani wrote:
> > > On Mon, 2013-01-14 at 19:47 +0100, Rafael J. Wysocki wrote:
> > > > On Monday, January 14, 2013 08:53:53 AM Toshi Kani wrote:
> > > > > On Fri, 2013-01-11 at 22:25 +0100, Rafael J. Wysocki wrote:
> > > > > > On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> > > > > > > Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> > > > > > > device hotplug header and defines the order values of ACPI-specific
> > > > > > > handlers.
> > > > > > > 
> > > > > > > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > > > > > > ---
> > > > > > >  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > > > > > >  1 file changed, 48 insertions(+)
> > > > > > >  create mode 100644 include/acpi/sys_hotplug.h
> > > > > > > 
> > > > > > > diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> > > > > > > new file mode 100644
> > > > > > > index 0000000..ad80f61
> > > > > > > --- /dev/null
> > > > > > > +++ b/include/acpi/sys_hotplug.h
> > > > > > > @@ -0,0 +1,48 @@
> > > > > > > +/*
> > > > > > > + * sys_hotplug.h - ACPI System device hot-plug framework
> > > > > > > + *
> > > > > > > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > > > > > > + *	Toshi Kani <toshi.kani@hp.com>
> > > > > > > + *
> > > > > > > + * This program is free software; you can redistribute it and/or modify
> > > > > > > + * it under the terms of the GNU General Public License version 2 as
> > > > > > > + * published by the Free Software Foundation.
> > > > > > > + */
> > > > > > > +
> > > > > > > +#ifndef _ACPI_SYS_HOTPLUG_H
> > > > > > > +#define _ACPI_SYS_HOTPLUG_H
> > > > > > > +
> > > > > > > +#include <linux/list.h>
> > > > > > > +#include <linux/device.h>
> > > > > > > +#include <linux/sys_hotplug.h>
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * System device hot-plug operation proceeds in the following order.
> > > > > > > + *   Validate phase -> Execute phase -> Commit phase
> > > > > > > + *
> > > > > > > + * The order values below define the calling sequence of ACPI-specific
> > > > > > > + * handlers for each phase in ascending order.  The order value of
> > > > > > > + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> > > > > > > + */
> > > > > > > +
> > > > > > > +/* Add Validate order values */
> > > > > > > +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> > > > > > > +
> > > > > > > +/* Add Execute order values */
> > > > > > > +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> > > > > > > +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> > > > > > > +
> > > > > > > +/* Add Commit order values */
> > > > > > > +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> > > > > > > +
> > > > > > > +/* Delete Validate order values */
> > > > > > > +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> > > > > > > +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> > > > > > > +
> > > > > > > +/* Delete Execute order values */
> > > > > > > +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> > > > > > > +
> > > > > > > +/* Delete Commit order values */
> > > > > > > +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> > > > > > > +
> > > > > > > +#endif	/* _ACPI_SYS_HOTPLUG_H */
> > > > > > > --
> > > > > > 
> > > > > > Why did you use the particular values above?
> > > > > 
> > > > > The ordering values above are used to define the relative order among
> > > > > handlers.  For instance, the 100 for SHP_ACPI_BUS_DEL_EXECUTE_ORDER can
> > > > > potentially be 21 since it is still larger than 20 for
> > > > > SHP_MEM_DEL_EXECUTE_ORDER defined in linux/sys_hotplug.h.  I picked 100
> > > > > so that more platform-neutral handlers can be added in between 20 and
> > > > > 100 in future.
> > > > 
> > > > I thought so, but I don't think it's a good idea to add gaps like this.
> > > 
> > > OK, I will use an equal gap of 10 for all values.  So, the 100 in the
> > > above example will be changed to 30.  
> > 
> > I wonder why you want to have those gaps at all.
> 
> Oh, I see.  I think some gap is helpful since it allows a new handler to
> come between without recompiling other modules.  For instance, OEM
> vendors may want to add their own handlers with loadable modules after
> the kernel is distributed.

No, we don't support such a model, sorry, just make it a sequence of
numbers and go from there.  If a vendor wants to modify the kernel to
add new values, they can rebuild the core code as well.

I really don't like the whole idea of values in the first place, can't
we just do things in the correct order in the code, and not be driven by
random magic values?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-10 23:40 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Toshi Kani
  2013-01-11 21:23   ` Rafael J. Wysocki
@ 2013-01-30  4:53   ` Greg KH
  2013-01-31  1:46     ` Toshi Kani
  2013-01-30  4:58   ` Greg KH
  2 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-01-30  4:53 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
> Added include/linux/sys_hotplug.h, which defines the system device
> hotplug framework interfaces used by the framework itself and
> handlers.
> 
> The order values define the calling sequence of handlers.  For add
> execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
> CPU so that threads on new CPUs can start using their local memory.
> The ordering of the delete execute is symmetric to the add execute.
> 
> struct shp_request defines a hot-plug request information.  The
> device resource information is managed with a list so that a single
> request may target to multiple devices.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  include/linux/sys_hotplug.h |  181 +++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 181 insertions(+)
>  create mode 100644 include/linux/sys_hotplug.h
> 
> diff --git a/include/linux/sys_hotplug.h b/include/linux/sys_hotplug.h
> new file mode 100644
> index 0000000..86674dd
> --- /dev/null
> +++ b/include/linux/sys_hotplug.h
> @@ -0,0 +1,181 @@
> +/*
> + * sys_hotplug.h - System device hot-plug framework
> + *
> + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> + *	Toshi Kani <toshi.kani@hp.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + */
> +
> +#ifndef _LINUX_SYS_HOTPLUG_H
> +#define _LINUX_SYS_HOTPLUG_H
> +
> +#include <linux/list.h>
> +#include <linux/device.h>
> +
> +/*
> + * System device hot-plug operation proceeds in the following order.
> + *   Validate phase -> Execute phase -> Commit phase
> + *
> + * The order values below define the calling sequence of platform
> + * neutral handlers for each phase in ascending order.  The order
> + * values of firmware-specific handlers are defined in sys_hotplug.h
> + * under firmware specific directories.
> + */
> +
> +/* All order values must be smaller than this value */
> +#define SHP_ORDER_MAX				0xffffff
> +
> +/* Add Validate order values */
> +
> +/* Add Execute order values */
> +#define SHP_MEM_ADD_EXECUTE_ORDER		100
> +#define SHP_CPU_ADD_EXECUTE_ORDER		110
> +
> +/* Add Commit order values */
> +
> +/* Delete Validate order values */
> +#define SHP_CPU_DEL_VALIDATE_ORDER		100
> +#define SHP_MEM_DEL_VALIDATE_ORDER		110
> +
> +/* Delete Execute order values */
> +#define SHP_CPU_DEL_EXECUTE_ORDER		10
> +#define SHP_MEM_DEL_EXECUTE_ORDER		20
> +
> +/* Delete Commit order values */
> +

Empty value?

Anyway, as I said before, don't use "values", just call things directly
in the order you need to.

This isn't like other operating systems, we don't need to be so
"flexible", we can modify the core code as much as we want and need to
if future things come along :)
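
A minimal userspace sketch of what "call things directly" could look
like for the add path, with ACPI, then memory, then CPU, and a rollback
in reverse order on failure.  Nothing here is from the patchset:
do_step(), undo_step(), sysdev_hot_add(), the STEP_* ids and the trace
buffer are hypothetical stand-ins for the real operations.

```c
/* Hypothetical stand-ins for the real ACPI, memory and CPU hot-add steps. */
enum { STEP_ACPI, STEP_MEM, STEP_CPU };

static int trace[8];		/* records what ran, for illustration only */
static int ntrace;
static int fail_step = -1;	/* set to a step id to simulate a failure */

static int do_step(int id)
{
	if (id == fail_step)
		return -1;
	trace[ntrace++] = id;
	return 0;
}

static void undo_step(int id)
{
	trace[ntrace++] = -(id + 1);	/* negative entries mark rollbacks */
}

/* Add path: the calling order is simply the order of the calls. */
static int sysdev_hot_add(void)
{
	int ret;

	ret = do_step(STEP_ACPI);
	if (ret)
		return ret;
	ret = do_step(STEP_MEM);	/* memory before CPU, as in the changelog */
	if (ret)
		goto undo_acpi;
	ret = do_step(STEP_CPU);
	if (ret)
		goto undo_mem;
	return 0;

undo_mem:
	undo_step(STEP_MEM);
undo_acpi:
	undo_step(STEP_ACPI);
	return ret;
}
```

With this shape, adding a new step means editing sysdev_hot_add()
itself, so no order values are needed.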

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 03/12] drivers/base: Add system device hotplug framework
  2013-01-10 23:40 ` [RFC PATCH v2 03/12] drivers/base: Add " Toshi Kani
@ 2013-01-30  4:54   ` Greg KH
  2013-01-31  1:48     ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-01-30  4:54 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Thu, Jan 10, 2013 at 04:40:21PM -0700, Toshi Kani wrote:
> Added sys_hotplug.c, which is the system device hotplug framework code.
> 
> shp_register_handler() allows modules to register their hotplug handlers
> to the framework.  shp_submit_req() provides the interface to submit
> a hotplug or online/offline request of system devices.  The request is
> then put into hp_workqueue.  shp_start_req() calls all registered handlers
> in ascending order for each phase.  If any handler failed in validate or
> execute phase, shp_start_req() initiates its rollback procedure.
> 
> Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> ---
>  drivers/base/Makefile      |    1 
>  drivers/base/sys_hotplug.c |  313 ++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 314 insertions(+)
>  create mode 100644 drivers/base/sys_hotplug.c
> 
> diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> index 5aa2d70..2e9b2f1 100644
> --- a/drivers/base/Makefile
> +++ b/drivers/base/Makefile
> @@ -21,6 +21,7 @@ endif
>  obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor.o
>  obj-$(CONFIG_REGMAP)	+= regmap/
>  obj-$(CONFIG_SOC_BUS) += soc.o
> +obj-y			+= sys_hotplug.o

No option to select this for systems that don't need it?  If not, then
put it up higher with all of the other code for the core.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-10 23:40 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Toshi Kani
  2013-01-11 21:23   ` Rafael J. Wysocki
  2013-01-30  4:53   ` Greg KH
@ 2013-01-30  4:58   ` Greg KH
  2013-01-31  2:57     ` Toshi Kani
  2 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-01-30  4:58 UTC (permalink / raw)
  To: Toshi Kani
  Cc: rjw, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
> +/*
> + * Hot-plug device information
> + */

Again, stop it with the "generic" hotplug term here, and everywhere
else.  You are doing a very _specific_ type of hotplug devices, so spell
it out.  We've worked hard to hotplug _everything_ in Linux, you are
going to confuse a lot of people with this type of terms.

> +union shp_dev_info {
> +	struct shp_cpu {
> +		u32		cpu_id;
> +	} cpu;

What is this?  Why not point to the system device for the cpu?

> +	struct shp_memory {
> +		int		node;
> +		u64		start_addr;
> +		u64		length;
> +	} mem;

Same here, why not point to the system device?

> +	struct shp_hostbridge {
> +	} hb;
> +
> +	struct shp_node {
> +	} node;

What happened here with these?  Empty structures?  Huh?

> +};
> +
> +struct shp_device {
> +	struct list_head	list;
> +	struct device		*device;

No, make it a "real" device, embed the device into it.

But, again, I'm going to ask why you aren't using the existing cpu /
memory / bridge / node devices that we have in the kernel.  Please use
them, or give me a _really_ good reason why they will not work.

> +	enum shp_class		class;
> +	union shp_dev_info	info;
> +};
> +
> +/*
> + * Hot-plug request
> + */
> +struct shp_request {
> +	/* common info */
> +	enum shp_operation	operation;	/* operation */
> +
> +	/* hot-plug event info: only valid for hot-plug operations */
> +	void			*handle;	/* FW handle */
> +	u32			event;		/* FW event */

What is this?

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-30  4:48           ` Greg KH
@ 2013-01-31  1:15             ` Toshi Kani
  2013-01-31  5:24               ` Greg KH
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-01-31  1:15 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Tue, 2013-01-29 at 23:48 -0500, Greg KH wrote:
> On Mon, Jan 14, 2013 at 12:02:04PM -0700, Toshi Kani wrote:
> > On Mon, 2013-01-14 at 19:48 +0100, Rafael J. Wysocki wrote:
> > > On Monday, January 14, 2013 08:33:48 AM Toshi Kani wrote:
> > > > On Fri, 2013-01-11 at 22:23 +0100, Rafael J. Wysocki wrote:
> > > > > On Thursday, January 10, 2013 04:40:19 PM Toshi Kani wrote:
> > > > > > Added include/linux/sys_hotplug.h, which defines the system device
> > > > > > hotplug framework interfaces used by the framework itself and
> > > > > > handlers.
> > > > > > 
> > > > > > The order values define the calling sequence of handlers.  For add
> > > > > > execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
> > > > > > CPU so that threads on new CPUs can start using their local memory.
> > > > > > The ordering of the delete execute is symmetric to the add execute.
> > > > > > 
> > > > > > struct shp_request defines a hot-plug request information.  The
> > > > > > device resource information is managed with a list so that a single
> > > > > > request may target to multiple devices.
> > > > > > 
> > > >  :
> > > > > > +
> > > > > > +struct shp_device {
> > > > > > +	struct list_head	list;
> > > > > > +	struct device		*device;
> > > > > > +	enum shp_class		class;
> > > > > > +	union shp_dev_info	info;
> > > > > > +};
> > > > > > +
> > > > > > +/*
> > > > > > + * Hot-plug request
> > > > > > + */
> > > > > > +struct shp_request {
> > > > > > +	/* common info */
> > > > > > +	enum shp_operation	operation;	/* operation */
> > > > > > +
> > > > > > +	/* hot-plug event info: only valid for hot-plug operations */
> > > > > > +	void			*handle;	/* FW handle */
> > > > > 
> > > > > What's the role of handle here?
> > > > 
> > > > On ACPI-based platforms, the handle keeps a notified ACPI handle when a
> > > > hot-plug request is made.  ACPI bus handlers, acpi_add_execute() /
> > > > acpi_del_execute(), then scans / trims ACPI devices from the handle.
> > > 
> > > OK, so this is ACPI-specific and should be described as such.
> > 
> > Other FW interface I know is parisc, which has mod_index (module index)
> > to identify a unique object, just like what ACPI handle does.  The
> > handle can keep the mod_index as an opaque value as well.  But as you
> > said, I do not know if the handle works for all other FWs.  So, I will
> > add descriptions, such that the hot-plug event info is modeled after
> > ACPI and may need to be revisited when supporting other FW.
> 
> Please make it a "real" pointer, and not a void *, those shouldn't be
> used at all if possible.

How about changing the "void *handle" to acpi_dev_node below?   

   struct acpi_dev_node    acpi_node;

Basically, it has the same challenge as struct device, which uses
acpi_dev_node as well.  We can add other FW nodes when needed (just like
device also has *of_node).
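
A rough userspace sketch of the idea, assuming the typed node simply
wraps the handle.  All names with a _sketch suffix and the accessor
shp_req_acpi_handle() are hypothetical; the kernel's own definitions
may differ.

```c
#include <stddef.h>

/* Userspace stand-in: a named type wrapping the opaque ACPI handle. */
struct acpi_dev_node_sketch {
	void *handle;
};

enum shp_operation_sketch {
	SHP_HOTPLUG_ADD_SKETCH,
	SHP_HOTPLUG_DEL_SKETCH,
};

struct shp_request_sketch {
	enum shp_operation_sketch operation;
	struct acpi_dev_node_sketch acpi_node;	/* replaces "void *handle" */
	unsigned int event;			/* FW event */
};

/* Only code that knows about the ACPI node type reads the handle back. */
static void *shp_req_acpi_handle(const struct shp_request_sketch *req)
{
	return req->acpi_node.handle;
}
```

The request no longer carries a bare void *, so another FW node type
can be added next to acpi_node later without changing this member.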

Thanks,
-Toshi




^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 02/12] ACPI: Add sys_hotplug.h for system device hotplug framework
  2013-01-30  4:51               ` Greg KH
@ 2013-01-31  1:38                 ` Toshi Kani
  0 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-31  1:38 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Tue, 2013-01-29 at 23:51 -0500, Greg KH wrote:
> On Mon, Jan 14, 2013 at 12:21:30PM -0700, Toshi Kani wrote:
> > On Mon, 2013-01-14 at 20:07 +0100, Rafael J. Wysocki wrote:
> > > On Monday, January 14, 2013 11:42:09 AM Toshi Kani wrote:
> > > > On Mon, 2013-01-14 at 19:47 +0100, Rafael J. Wysocki wrote:
> > > > > On Monday, January 14, 2013 08:53:53 AM Toshi Kani wrote:
> > > > > > On Fri, 2013-01-11 at 22:25 +0100, Rafael J. Wysocki wrote:
> > > > > > > On Thursday, January 10, 2013 04:40:20 PM Toshi Kani wrote:
> > > > > > > > Added include/acpi/sys_hotplug.h, which is ACPI-specific system
> > > > > > > > device hotplug header and defines the order values of ACPI-specific
> > > > > > > > handlers.
> > > > > > > > 
> > > > > > > > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > > > > > > > ---
> > > > > > > >  include/acpi/sys_hotplug.h |   48 ++++++++++++++++++++++++++++++++++++++++++++
> > > > > > > >  1 file changed, 48 insertions(+)
> > > > > > > >  create mode 100644 include/acpi/sys_hotplug.h
> > > > > > > > 
> > > > > > > > diff --git a/include/acpi/sys_hotplug.h b/include/acpi/sys_hotplug.h
> > > > > > > > new file mode 100644
> > > > > > > > index 0000000..ad80f61
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/include/acpi/sys_hotplug.h
> > > > > > > > @@ -0,0 +1,48 @@
> > > > > > > > +/*
> > > > > > > > + * sys_hotplug.h - ACPI System device hot-plug framework
> > > > > > > > + *
> > > > > > > > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > > > > > > > + *	Toshi Kani <toshi.kani@hp.com>
> > > > > > > > + *
> > > > > > > > + * This program is free software; you can redistribute it and/or modify
> > > > > > > > + * it under the terms of the GNU General Public License version 2 as
> > > > > > > > + * published by the Free Software Foundation.
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +#ifndef _ACPI_SYS_HOTPLUG_H
> > > > > > > > +#define _ACPI_SYS_HOTPLUG_H
> > > > > > > > +
> > > > > > > > +#include <linux/list.h>
> > > > > > > > +#include <linux/device.h>
> > > > > > > > +#include <linux/sys_hotplug.h>
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * System device hot-plug operation proceeds in the following order.
> > > > > > > > + *   Validate phase -> Execute phase -> Commit phase
> > > > > > > > + *
> > > > > > > > + * The order values below define the calling sequence of ACPI-specific
> > > > > > > > + * handlers for each phase in ascending order.  The order value of
> > > > > > > > + * platform-neutral handlers are defined in <linux/sys_hotplug.h>.
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +/* Add Validate order values */
> > > > > > > > +#define SHP_ACPI_BUS_ADD_VALIDATE_ORDER		0	/* must be first */
> > > > > > > > +
> > > > > > > > +/* Add Execute order values */
> > > > > > > > +#define SHP_ACPI_BUS_ADD_EXECUTE_ORDER		10
> > > > > > > > +#define SHP_ACPI_RES_ADD_EXECUTE_ORDER		20
> > > > > > > > +
> > > > > > > > +/* Add Commit order values */
> > > > > > > > +#define SHP_ACPI_BUS_ADD_COMMIT_ORDER		10
> > > > > > > > +
> > > > > > > > +/* Delete Validate order values */
> > > > > > > > +#define SHP_ACPI_BUS_DEL_VALIDATE_ORDER		0	/* must be first */
> > > > > > > > +#define SHP_ACPI_RES_DEL_VALIDATE_ORDER		10
> > > > > > > > +
> > > > > > > > +/* Delete Execute order values */
> > > > > > > > +#define SHP_ACPI_BUS_DEL_EXECUTE_ORDER		100
> > > > > > > > +
> > > > > > > > +/* Delete Commit order values */
> > > > > > > > +#define SHP_ACPI_BUS_DEL_COMMIT_ORDER		100
> > > > > > > > +
> > > > > > > > +#endif	/* _ACPI_SYS_HOTPLUG_H */
> > > > > > > > --
> > > > > > > 
> > > > > > > Why did you use the particular values above?
> > > > > > 
> > > > > > The ordering values above are used to define the relative order among
> > > > > > handlers.  For instance, the 100 for SHP_ACPI_BUS_DEL_EXECUTE_ORDER can
> > > > > > potentially be 21 since it is still larger than 20 for
> > > > > > SHP_MEM_DEL_EXECUTE_ORDER defined in linux/sys_hotplug.h.  I picked 100
> > > > > > so that more platform-neutral handlers can be added in between 20 and
> > > > > > 100 in future.
> > > > > 
> > > > > I thought so, but I don't think it's a good idea to add gaps like this.
> > > > 
> > > > OK, I will use an equal gap of 10 for all values.  So, the 100 in the
> > > > above example will be changed to 30.  
> > > 
> > > I wonder why you want to have those gaps at all.
> > 
> > Oh, I see.  I think some gap is helpful since it allows a new handler to
> > come between without recompiling other modules.  For instance, OEM
> > vendors may want to add their own handlers with loadable modules after
> > the kernel is distributed.
> 
> No, we don't support such a model, sorry, just make it a sequence of
> numbers and go from there.  If a vendor wants to modify the kernel to
> add new values, they can rebuild the core code as well.
> 
> I really don't like the whole idea of values in the first place, can't
> we just do things in the correct order in the code, and not be driven by
> random magic values?

OK, I will define all the values with an enum, something like the one
below.  I think it is more manageable this way as we do not have to
define magic values.

enum shp_add_order {
    /* Validate Phase */
    SHP_FW_BUS_ADD_VALIDATE_ORDER,

    /* Execute Phase */
    SHP_FW_BUS_ADD_EXECUTE_ORDER,
    SHP_FW_RES_ADD_EXECUTE_ORDER,
    SHP_MEM_ADD_EXECUTE_ORDER,
    SHP_CPU_ADD_EXECUTE_ORDER,

    /* Commit Phase */
    SHP_ADD_COMMIT_BASE_ORDER,
    SHP_FW_BUS_ADD_COMMIT_ORDER,
};
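
As a minimal userspace sketch, handlers could be kept in ascending
order of such an enum at registration time.  The enum is repeated from
above; struct shp_handler_sketch and shp_register_sketch() are
hypothetical and are not the patchset's shp_register_handler().

```c
#include <stddef.h>

/* Declaration order is the calling order; no numeric values are written. */
enum shp_add_order {
	/* Validate Phase */
	SHP_FW_BUS_ADD_VALIDATE_ORDER,

	/* Execute Phase */
	SHP_FW_BUS_ADD_EXECUTE_ORDER,
	SHP_FW_RES_ADD_EXECUTE_ORDER,
	SHP_MEM_ADD_EXECUTE_ORDER,
	SHP_CPU_ADD_EXECUTE_ORDER,

	/* Commit Phase */
	SHP_ADD_COMMIT_BASE_ORDER,
	SHP_FW_BUS_ADD_COMMIT_ORDER,
};

/* Hypothetical handler entry; not the struct used in the patchset. */
struct shp_handler_sketch {
	enum shp_add_order order;
	const char *name;
	struct shp_handler_sketch *next;
};

/* Insert so that the list stays sorted by ascending order value. */
static void shp_register_sketch(struct shp_handler_sketch **head,
				struct shp_handler_sketch *h)
{
	while (*head && (*head)->order <= h->order)
		head = &(*head)->next;
	h->next = *head;
	*head = h;
}
```

Inserting a new enumerator between two existing ones shifts the later
values, which is fine as long as every user is rebuilt with the core.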

Thanks,
-Toshi






^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-30  4:53   ` Greg KH
@ 2013-01-31  1:46     ` Toshi Kani
  0 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-31  1:46 UTC (permalink / raw)
  To: Greg KH
  Cc: rjw, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Tue, 2013-01-29 at 23:53 -0500, Greg KH wrote:
> On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
> > Added include/linux/sys_hotplug.h, which defines the system device
> > hotplug framework interfaces used by the framework itself and
> > handlers.
> > 
> > The order values define the calling sequence of handlers.  For add
> > execute, the ordering is ACPI->MEM->CPU.  Memory is onlined before
> > CPU so that threads on new CPUs can start using their local memory.
> > The ordering of the delete execute is symmetric to the add execute.
> > 
> > struct shp_request defines a hot-plug request information.  The
> > device resource information is managed with a list so that a single
> > request may target to multiple devices.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  include/linux/sys_hotplug.h |  181 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 181 insertions(+)
> >  create mode 100644 include/linux/sys_hotplug.h
> > 
> > diff --git a/include/linux/sys_hotplug.h b/include/linux/sys_hotplug.h
> > new file mode 100644
> > index 0000000..86674dd
> > --- /dev/null
> > +++ b/include/linux/sys_hotplug.h
> > @@ -0,0 +1,181 @@
> > +/*
> > + * sys_hotplug.h - System device hot-plug framework
> > + *
> > + * Copyright (C) 2012 Hewlett-Packard Development Company, L.P.
> > + *	Toshi Kani <toshi.kani@hp.com>
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License version 2 as
> > + * published by the Free Software Foundation.
> > + */
> > +
> > +#ifndef _LINUX_SYS_HOTPLUG_H
> > +#define _LINUX_SYS_HOTPLUG_H
> > +
> > +#include <linux/list.h>
> > +#include <linux/device.h>
> > +
> > +/*
> > + * System device hot-plug operation proceeds in the following order.
> > + *   Validate phase -> Execute phase -> Commit phase
> > + *
> > + * The order values below define the calling sequence of platform
> > + * neutral handlers for each phase in ascending order.  The order
> > + * values of firmware-specific handlers are defined in sys_hotplug.h
> > + * under firmware specific directories.
> > + */
> > +
> > +/* All order values must be smaller than this value */
> > +#define SHP_ORDER_MAX				0xffffff
> > +
> > +/* Add Validate order values */
> > +
> > +/* Add Execute order values */
> > +#define SHP_MEM_ADD_EXECUTE_ORDER		100
> > +#define SHP_CPU_ADD_EXECUTE_ORDER		110
> > +
> > +/* Add Commit order values */
> > +
> > +/* Delete Validate order values */
> > +#define SHP_CPU_DEL_VALIDATE_ORDER		100
> > +#define SHP_MEM_DEL_VALIDATE_ORDER		110
> > +
> > +/* Delete Execute order values */
> > +#define SHP_CPU_DEL_EXECUTE_ORDER		10
> > +#define SHP_MEM_DEL_EXECUTE_ORDER		20
> > +
> > +/* Delete Commit order values */
> > +
> 
> Empty value?

Yes, in this version, all the delete commit order values are defined in
<acpi/sys_hotplug.h>.

> Anyway, as I said before, don't use "values", just call things directly
> in the order you need to.
> 
> This isn't like other operating systems, we don't need to be so
> "flexible", we can modify the core code as much as we want and need to
> if future things come along :)

Understood.  As described in the previous email, I will define them with
enum and avoid using values.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 03/12] drivers/base: Add system device hotplug framework
  2013-01-30  4:54   ` Greg KH
@ 2013-01-31  1:48     ` Toshi Kani
  0 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-31  1:48 UTC (permalink / raw)
  To: Greg KH
  Cc: rjw, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Tue, 2013-01-29 at 23:54 -0500, Greg KH wrote:
> On Thu, Jan 10, 2013 at 04:40:21PM -0700, Toshi Kani wrote:
> > Added sys_hotplug.c, which is the system device hotplug framework code.
> > 
> > shp_register_handler() allows modules to register their hotplug handlers
> > to the framework.  shp_submit_req() provides the interface to submit
> > a hotplug or online/offline request of system devices.  The request is
> > then put into hp_workqueue.  shp_start_req() calls all registered handlers
> > in ascending order for each phase.  If any handler failed in validate or
> > execute phase, shp_start_req() initiates its rollback procedure.
> > 
> > Signed-off-by: Toshi Kani <toshi.kani@hp.com>
> > ---
> >  drivers/base/Makefile      |    1 
> >  drivers/base/sys_hotplug.c |  313 ++++++++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 314 insertions(+)
> >  create mode 100644 drivers/base/sys_hotplug.c
> > 
> > diff --git a/drivers/base/Makefile b/drivers/base/Makefile
> > index 5aa2d70..2e9b2f1 100644
> > --- a/drivers/base/Makefile
> > +++ b/drivers/base/Makefile
> > @@ -21,6 +21,7 @@ endif
> >  obj-$(CONFIG_SYS_HYPERVISOR) += hypervisor.o
> >  obj-$(CONFIG_REGMAP)	+= regmap/
> >  obj-$(CONFIG_SOC_BUS) += soc.o
> > +obj-y			+= sys_hotplug.o
> 
> No option to select this for systems that don't need it?  If not, then
> put it up higher with all of the other code for the core.

It used to have CONFIG_HOTPLUG, but I removed it as you suggested.  Yes,
I will put it up higher.  

Thanks,
-Toshi



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-30  4:58   ` Greg KH
@ 2013-01-31  2:57     ` Toshi Kani
  2013-01-31 20:54       ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-01-31  2:57 UTC (permalink / raw)
  To: Greg KH
  Cc: rjw, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Tue, 2013-01-29 at 23:58 -0500, Greg KH wrote:
> On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
> > +/*
> > + * Hot-plug device information
> > + */
> 
> Again, stop it with the "generic" hotplug term here, and everywhere
> else.  You are doing a very _specific_ type of hotplug devices, so spell
> it out.  We've worked hard to hotplug _everything_ in Linux, you are
> going to confuse a lot of people with this type of terms.

Agreed.  I will clarify in all places.

> > +union shp_dev_info {
> > +	struct shp_cpu {
> > +		u32		cpu_id;
> > +	} cpu;
> 
> What is this?  Why not point to the system device for the cpu?

This info is used to on-line a new CPU and create its system/cpu device.
In other words, a system/cpu device is created as a result of CPU
hotplug.

> > +	struct shp_memory {
> > +		int		node;
> > +		u64		start_addr;
> > +		u64		length;
> > +	} mem;
> 
> Same here, why not point to the system device?

Same as above.

> > +	struct shp_hostbridge {
> > +	} hb;
> > +
> > +	struct shp_node {
> > +	} node;
> 
> What happened here with these?  Empty structures?  Huh?

They are place holders for now.  PCI bridge hot-plug and node hot-plug
are still very much work in progress, so I have not integrated them into
this framework yet.

> > +};
> > +
> > +struct shp_device {
> > +	struct list_head	list;
> > +	struct device		*device;
> 
> No, make it a "real" device, embed the device into it.

This device pointer is used to send a KOBJ_ONLINE/OFFLINE event during a
CPU online/offline operation in order to maintain the current behavior.
A CPU online/offline operation only changes the state of the CPU, so its
system/cpu device continues to be present before and after the operation.
(A CPU hot-add/delete operation, by contrast, creates or removes a
system/cpu device.)  So, this "*device" needs to be a pointer that
references an existing device that is to be on-lined/off-lined.

> But, again, I'm going to ask why you aren't using the existing cpu /
> memory / bridge / node devices that we have in the kernel.  Please use
> them, or give me a _really_ good reason why they will not work.

We cannot use the existing system devices or ACPI devices here.  During
hot-plug, the ACPI handler sets this shp_device info, so that the cpu and
memory handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their
target device information in a platform-neutral way.  During hot-add, we
first create an ACPI device node (i.e. a device under
/sys/bus/acpi/devices), but platform-neutral modules cannot use it as it
is ACPI-specific.  Also, its system device (i.e. a device under
/sys/devices/system) is not created until the hot-add operation
completes.
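
To illustrate the hand-off, here is a minimal userspace sketch in which
firmware-specific code fills in the device entries and a
platform-neutral memory handler only reads them.  The _sketch names are
hypothetical, a plain next pointer stands in for struct list_head, and
the class member is spelled dev_class here.

```c
#include <stddef.h>
#include <stdint.h>

enum shp_class_sketch {
	SHP_CLASS_CPU_SKETCH,
	SHP_CLASS_MEMORY_SKETCH,
};

/* Mirrors the cpu and mem members of union shp_dev_info in the patch. */
union shp_dev_info_sketch {
	struct {
		uint32_t cpu_id;
	} cpu;
	struct {
		int node;
		uint64_t start_addr;
		uint64_t length;
	} mem;
};

struct shp_device_sketch {
	struct shp_device_sketch *next;
	enum shp_class_sketch dev_class;
	union shp_dev_info_sketch info;
};

/*
 * Platform-neutral memory handler: sums the bytes described by memory
 * entries and ignores everything else.  It never touches firmware objects.
 */
static uint64_t shp_mem_bytes_sketch(const struct shp_device_sketch *head)
{
	uint64_t total = 0;

	for (; head; head = head->next)
		if (head->dev_class == SHP_CLASS_MEMORY_SKETCH)
			total += head->info.mem.length;
	return total;
}
```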

> > +	enum shp_class		class;
> > +	union shp_dev_info	info;
> > +};
> > +
> > +/*
> > + * Hot-plug request
> > + */
> > +struct shp_request {
> > +	/* common info */
> > +	enum shp_operation	operation;	/* operation */
> > +
> > +	/* hot-plug event info: only valid for hot-plug operations */
> > +	void			*handle;	/* FW handle */
> > +	u32			event;		/* FW event */
> 
> What is this?

The shp_request describes a hotplug or online/offline operation that is
requested.  In the case of a hot-plug request, the "*handle" describes
the target device (which is an ACPI device object) and the "event"
describes the type of request, such as hot-add or hot-delete.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-31  1:15             ` Toshi Kani
@ 2013-01-31  5:24               ` Greg KH
  2013-01-31 14:42                 ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-01-31  5:24 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Wed, Jan 30, 2013 at 06:15:12PM -0700, Toshi Kani wrote:
> > Please make it a "real" pointer, and not a void *, those shouldn't be
> > used at all if possible.
> 
> How about changing the "void *handle" to acpi_dev_node below?   
> 
>    struct acpi_dev_node    acpi_node;
> 
> Basically, it has the same challenge as struct device, which uses
> acpi_dev_node as well.  We can add other FW node when needed (just like
> device also has *of_node).

That sounds good to me.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-31  5:24               ` Greg KH
@ 2013-01-31 14:42                 ` Toshi Kani
  0 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-01-31 14:42 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Thu, 2013-01-31 at 05:24 +0000, Greg KH wrote:
> On Wed, Jan 30, 2013 at 06:15:12PM -0700, Toshi Kani wrote:
> > > Please make it a "real" pointer, and not a void *, those shouldn't be
> > > used at all if possible.
> > 
> > How about changing the "void *handle" to acpi_dev_node below?   
> > 
> >    struct acpi_dev_node    acpi_node;
> > 
> > Basically, it has the same challenge as struct device, which uses
> > acpi_dev_node as well.  We can add other FW node when needed (just like
> > device also has *of_node).
> 
> That sounds good to me.

Great!  Thanks Greg,
-Toshi



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-31  2:57     ` Toshi Kani
@ 2013-01-31 20:54       ` Rafael J. Wysocki
  2013-02-01  1:32         ` Toshi Kani
  2013-02-01  7:23         ` Greg KH
  0 siblings, 2 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-01-31 20:54 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Wednesday, January 30, 2013 07:57:45 PM Toshi Kani wrote:
> On Tue, 2013-01-29 at 23:58 -0500, Greg KH wrote:
> > On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
> > > +/*
> > > + * Hot-plug device information
> > > + */
> > 
> > Again, stop it with the "generic" hotplug term here, and everywhere
> > else.  You are doing a very _specific_ type of hotplug devices, so spell
> > it out.  We've worked hard to hotplug _everything_ in Linux, you are
> > going to confuse a lot of people with this type of terms.
> 
> Agreed.  I will clarify in all places.
> 
> > > +union shp_dev_info {
> > > +	struct shp_cpu {
> > > +		u32		cpu_id;
> > > +	} cpu;
> > 
> > What is this?  Why not point to the system device for the cpu?
> 
> This info is used to on-line a new CPU and create its system/cpu device.
> In other words, a system/cpu device is created as a result of CPU
> hotplug.
> 
> > > +	struct shp_memory {
> > > +		int		node;
> > > +		u64		start_addr;
> > > +		u64		length;
> > > +	} mem;
> > 
> > Same here, why not point to the system device?
> 
> Same as above.
> 
> > > +	struct shp_hostbridge {
> > > +	} hb;
> > > +
> > > +	struct shp_node {
> > > +	} node;
> > 
> > What happened here with these?  Empty structures?  Huh?
> 
> They are place holders for now.  PCI bridge hot-plug and node hot-plug
> are still very much work in progress, so I have not integrated them into
> this framework yet.
> 
> > > +};
> > > +
> > > +struct shp_device {
> > > +	struct list_head	list;
> > > +	struct device		*device;
> > 
> > No, make it a "real" device, embed the device into it.
> 
> This device pointer is used to send KOBJ_ONLINE/OFFLINE event during CPU
> online/offline operation in order to maintain the current behavior.  CPU
> online/offline operation only changes the state of CPU, so its
> system/cpu device continues to be present before and after an operation.
> (Whereas, CPU hot-add/delete operation creates or removes a system/cpu
> device.)  So, this "*device" needs to be a pointer to reference an
> existing device that is to be on-lined/off-lined.
> 
> > But, again, I'm going to ask why you aren't using the existing cpu /
> > memory / bridge / node devices that we have in the kernel.  Please use
> > them, or give me a _really_ good reason why they will not work.
> 
> We cannot use the existing system devices or ACPI devices here.  During
> hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> device information in a platform-neutral way.  During hot-add, we first
> create an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> but platform-neutral modules cannot use them as they are ACPI-specific.

But suppose we're smart and have ACPI scan handlers that will create
"physical" device nodes for those devices during the ACPI namespace scan.
Then, the platform-neutral nodes will be able to bind to those "physical"
nodes.  Moreover, it should be possible to get a hierarchy of device objects
this way that will reflect all of the dependencies we need to take into
account during hot-add and hot-remove operations.  That may not be what we
have today, but I don't see any *fundamental* obstacles preventing us from
using this approach.

This is already done for PCI host bridges and platform devices and I don't
see why we can't do that for the other types of devices too.

The only missing piece I see is a way to handle the "eject" problem, i.e.
when we try to eject a device at the top of a subtree and need to tear down
the entire subtree below it, but if that's going to lead to a system crash,
for example, we want to cancel the eject.  It seems to me that we'll need some
help from the driver core here.
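
To make the "eject" problem concrete, here is a minimal userspace sketch
of the check-then-tear-down idea: walk the subtree once to see whether any
node vetoes the removal, and only then remove the children before their
parent.  It is not kernel code, and every name in it (struct hp_node,
hp_can_offline(), hp_eject_subtree(), the "busy" flag) is made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HP_MAX_CHILDREN 4

/* Hypothetical device node; not a kernel structure. */
struct hp_node {
	const char *name;
	bool busy;		/* true if removing this node is unsafe */
	bool present;		/* cleared once the node has been removed */
	struct hp_node *child[HP_MAX_CHILDREN];
	size_t nr_children;
};

/* Phase 1: any node in the subtree may veto the eject. */
static bool hp_can_offline(const struct hp_node *n)
{
	size_t i;

	if (n->busy)
		return false;
	for (i = 0; i < n->nr_children; i++)
		if (!hp_can_offline(n->child[i]))
			return false;
	return true;
}

/* Phase 2: remove the children first, then the node itself. */
static void hp_remove_subtree(struct hp_node *n)
{
	size_t i;

	for (i = 0; i < n->nr_children; i++)
		hp_remove_subtree(n->child[i]);
	n->present = false;
}

/* Returns 0 on success, -1 if the eject was cancelled (nothing changed). */
static int hp_eject_subtree(struct hp_node *top)
{
	assert(top != NULL);

	if (!hp_can_offline(top))
		return -1;
	hp_remove_subtree(top);
	return 0;
}
```

In a real implementation the state would also have to be held stable
between the two phases (some form of locking), which the sketch leaves
out; that is part of where help from the driver core would come in.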

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-31 20:54       ` Rafael J. Wysocki
@ 2013-02-01  1:32         ` Toshi Kani
  2013-02-01  7:30           ` Greg KH
  2013-02-01  7:23         ` Greg KH
  1 sibling, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-02-01  1:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Thu, 2013-01-31 at 21:54 +0100, Rafael J. Wysocki wrote:
> On Wednesday, January 30, 2013 07:57:45 PM Toshi Kani wrote:
> > On Tue, 2013-01-29 at 23:58 -0500, Greg KH wrote:
> > > On Thu, Jan 10, 2013 at 04:40:19PM -0700, Toshi Kani wrote:
 :
> > > > +};
> > > > +
> > > > +struct shp_device {
> > > > +	struct list_head	list;
> > > > +	struct device		*device;
> > > 
> > > No, make it a "real" device, embed the device into it.
> > 
> > This device pointer is used to send KOBJ_ONLINE/OFFLINE event during CPU
> > online/offline operation in order to maintain the current behavior.  CPU
> > online/offline operation only changes the state of CPU, so its
> > system/cpu device continues to be present before and after an operation.
> > (Whereas, CPU hot-add/delete operation creates or removes a system/cpu
> > device.)  So, this "*device" needs to be a pointer to reference an
> > existing device that is to be on-lined/off-lined.
> > 
> > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > memory / bridge / node devices that we have in the kernel.  Please use
> > > them, or give me a _really_ good reason why they will not work.
> > 
> > We cannot use the existing system devices or ACPI devices here.  During
> > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > device information in a platform-neutral way.  During hot-add, we first
> > create an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > but platform-neutral modules cannot use them as they are ACPI-specific.
> 
> But suppose we're smart and have ACPI scan handlers that will create
> "physical" device nodes for those devices during the ACPI namespace scan.
> Then, the platform-neutral nodes will be able to bind to those "physical"
> nodes.  Moreover, it should be possible to get a hierarchy of device objects
> this way that will reflect all of the dependencies we need to take into
> account during hot-add and hot-remove operations.  That may not be what we
> have today, but I don't see any *fundamental* obstacles preventing us from
> using this approach.

I misstated this in my previous email.  In the hot-add case, the
system/cpu device is actually created by the ACPI driver during the ACPI
scan.  This is done by acpi_processor_hotadd_init(), which I consider a
hack, but it can be done.  The system/memory device is created in
add_memory() by the mm module.

> This is already done for PCI host bridges and platform devices and I don't
> see why we can't do that for the other types of devices too.
> 
> The only missing piece I see is a way to handle the "eject" problem, i.e.
> when we try to eject a device at the top of a subtree and need to tear down
> the entire subtree below it, but if that's going to lead to a system crash,
> for example, we want to cancel the eject.  It seems to me that we'll need some
> help from the driver core here.

There are three different approaches suggested for system device
hot-plug:
 A. Proceed within system device bus scan.
 B. Proceed within ACPI bus scan.
 C. Proceed with a sequence (as a mini-boot).

Option A uses system devices as tokens, option B uses acpi devices as
tokens, and option C uses resource tables as tokens, for their handlers.

Here is a summary of key questions & answers so far.  I hope this
clarifies why I am suggesting option C.

1. What are the system devices?
System devices provide system-wide core computing resources, which are
essential to compose a computer system.  System devices are not
connected to any particular standard buses.

2. Why are the system devices special?
The system devices are initialized during early boot-time, by multiple
subsystems, from the boot-up sequence, in pre-defined order.  They
provide low-level services to enable other subsystems to come up.

3. Why can't we initialize the system devices from the driver structure at
boot?
The driver structure is initialized at the end of the boot sequence and
requires the low-level services from the system devices initialized
beforehand.

4. Why do we need a new common framework?
Sysfs CPU and memory on-lining/off-lining are performed within the CPU
and memory modules.  They are common code and do not depend on ACPI.
Therefore, a new common framework is necessary to integrate both
on-lining/off-lining operation and hot-plugging operation of system
devices into a single framework.

5. Why can't we do everything with ACPI bus scan?
Software dependency among system devices may not be dictated by the ACPI
hierarchy.  For instance, memory should be initialized before CPUs (i.e.
a new cpu may need its local memory), but such ordering cannot be
guaranteed by the ACPI hierarchy.  Also, as described in 4,
online/offline operations are independent from ACPI.  
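
As a rough illustration of what "proceed with a sequence" means, the
sketch below runs a set of handlers in a fixed order (memory before CPU)
and undoes the completed steps in reverse order if one of them fails.
This is a userspace toy, not the code in this patchset; the struct and
function names and the order values 10 and 20 are assumptions made up for
the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical handler entry: lower "order" runs first on hot-add. */
struct hp_handler {
	int order;
	int (*add)(void *ctx);
	void (*undo)(void *ctx);
};

/* Records which steps ran, so the ordering can be inspected. */
struct hp_trace {
	char buf[16];
	size_t len;
	int fail_cpu;		/* set to make the CPU step fail */
};

static void hp_log(struct hp_trace *t, char c)
{
	if (t->len < sizeof(t->buf) - 1) {
		t->buf[t->len++] = c;
		t->buf[t->len] = '\0';
	}
}

static int mem_add(void *ctx)   { hp_log(ctx, 'M'); return 0; }
static void mem_undo(void *ctx) { hp_log(ctx, 'm'); }
static void cpu_undo(void *ctx) { hp_log(ctx, 'c'); }

static int cpu_add(void *ctx)
{
	struct hp_trace *t = ctx;

	if (t->fail_cpu)
		return -1;
	hp_log(t, 'C');
	return 0;
}

static int hp_cmp_order(const void *a, const void *b)
{
	const struct hp_handler *x = a, *y = b;

	return (x->order > y->order) - (x->order < y->order);
}

/* Run all handlers in ascending order; roll back in reverse on failure. */
static int hp_run_add(struct hp_handler *h, size_t n, void *ctx)
{
	size_t i;
	int ret;

	assert(h != NULL && n > 0);
	qsort(h, n, sizeof(*h), hp_cmp_order);

	for (i = 0; i < n; i++) {
		ret = h[i].add(ctx);
		if (ret) {
			while (i-- > 0)
				h[i].undo(ctx);
			return ret;
		}
	}
	return 0;
}
```

With the memory handler registered at order 10 and the CPU handler at
order 20, the memory step always runs first regardless of the order in
which the handlers were registered or discovered.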

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-01-31 20:54       ` Rafael J. Wysocki
  2013-02-01  1:32         ` Toshi Kani
@ 2013-02-01  7:23         ` Greg KH
  2013-02-01 22:12           ` Rafael J. Wysocki
  1 sibling, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-02-01  7:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > memory / bridge / node devices that we have in the kernel.  Please use
> > > them, or give me a _really_ good reason why they will not work.
> > 
> > We cannot use the existing system devices or ACPI devices here.  During
> > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > device information in a platform-neutral way.  During hot-add, we first
> > create an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > but platform-neutral modules cannot use them as they are ACPI-specific.
> 
> But suppose we're smart and have ACPI scan handlers that will create
> "physical" device nodes for those devices during the ACPI namespace scan.
> Then, the platform-neutral nodes will be able to bind to those "physical"
> nodes.  Moreover, it should be possible to get a hierarchy of device objects
> this way that will reflect all of the dependencies we need to take into
> account during hot-add and hot-remove operations.  That may not be what we
> have today, but I don't see any *fundamental* obstacles preventing us from
> using this approach.

I would _much_ rather see that be the solution here as I think it is the
proper one.

> This is already done for PCI host bridges and platform devices and I don't
> see why we can't do that for the other types of devices too.

I agree.

> The only missing piece I see is a way to handle the "eject" problem, i.e.
> when we try to eject a device at the top of a subtree and need to tear down
> the entire subtree below it, but if that's going to lead to a system crash,
> for example, we want to cancel the eject.  It seems to me that we'll need some
> help from the driver core here.

I say do what we always have done here, if the user asked us to tear
something down, let it happen as they are the ones that know best :)

Seriously, I guess this gets back to the "fail disconnect" idea that the
ACPI developers keep harping on.  I thought we already resolved this
properly by having them implement it in their bus code, no reason the
same thing couldn't happen here, right?  I don't think the core needs to
do anything special, but if so, I'll be glad to review it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-01  1:32         ` Toshi Kani
@ 2013-02-01  7:30           ` Greg KH
  2013-02-01 20:40             ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-02-01  7:30 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > This is already done for PCI host bridges and platform devices and I don't
> > see why we can't do that for the other types of devices too.
> > 
> > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > when we try to eject a device at the top of a subtree and need to tear down
> > the entire subtree below it, but if that's going to lead to a system crash,
> > for example, we want to cancel the eject.  It seems to me that we'll need some
> > help from the driver core here.
> 
> There are three different approaches suggested for system device
> hot-plug:
>  A. Proceed within system device bus scan.
>  B. Proceed within ACPI bus scan.
>  C. Proceed with a sequence (as a mini-boot).
> 
> Option A uses system devices as tokens, option B uses acpi devices as
> tokens, and option C uses resource tables as tokens, for their handlers.
> 
> Here is a summary of key questions & answers so far.  I hope this
> clarifies why I am suggesting option C.
> 
> 1. What are the system devices?
> System devices provide system-wide core computing resources, which are
> essential to compose a computer system.  System devices are not
> connected to any particular standard buses.

Not a problem, lots of devices are not connected to any "particular
standard busses".  All this means is that system devices are connected
to the "system" bus, nothing more.

> 2. Why are the system devices special?
> The system devices are initialized during early boot-time, by multiple
> subsystems, from the boot-up sequence, in pre-defined order.  They
> provide low-level services to enable other subsystems to come up.

Sorry, no, that doesn't mean they are special, nothing here is unique
from the point of view of the driver model from any other device or bus.

> 3. Why can't we initialize the system devices from the driver structure at
> boot?
> The driver structure is initialized at the end of the boot sequence and
> requires the low-level services from the system devices initialized
> beforehand.

Wait, what "driver structure"?  If you need to initialize the driver
core earlier, then do so.  Or, even better, just wait until enough of
the system has come up and then go initialize all of the devices you
have found so far as part of your boot process.

None of the above things you have stated seem to have anything to do
with your proposed patch, so I don't understand why you have mentioned
them...

> 4. Why do we need a new common framework?
> Sysfs CPU and memory on-lining/off-lining are performed within the CPU
> and memory modules.  They are common code and do not depend on ACPI.
> Therefore, a new common framework is necessary to integrate both
> on-lining/off-lining operation and hot-plugging operation of system
> devices into a single framework.

{sigh}

Removing and adding devices and handling hotplug operations is what the
driver core was written for, almost 10 years ago.  To somehow think that
your devices are "special" just because they don't use ACPI is odd,
because the driver core itself has nothing to do with ACPI.  Don't get
the current mix of x86 system code tied into ACPI confused with
driver core issues here, please.

> 5. Why can't we do everything with ACPI bus scan?
> Software dependency among system devices may not be dictated by the ACPI
> hierarchy.  For instance, memory should be initialized before CPUs (i.e.
> a new cpu may need its local memory), but such ordering cannot be
> guaranteed by the ACPI hierarchy.  Also, as described in 4,
> online/offline operations are independent from ACPI.  

That's fine, the driver core is independent from ACPI.  I don't care how
you do the scanning of your devices, but I do care about you creating new
driver core pieces that duplicate the existing functionality of what we
have today.

In short, I like Rafael's proposal better, and I fail to see how
anything you have stated here would matter in how this is implemented. :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-01  7:30           ` Greg KH
@ 2013-02-01 20:40             ` Toshi Kani
  2013-02-01 22:21               ` Rafael J. Wysocki
  2013-02-02 15:01               ` Greg KH
  0 siblings, 2 replies; 83+ messages in thread
From: Toshi Kani @ 2013-02-01 20:40 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > This is already done for PCI host bridges and platform devices and I don't
> > > see why we can't do that for the other types of devices too.
> > > 
> > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > when we try to eject a device at the top of a subtree and need to tear down
> > > the entire subtree below it, but if that's going to lead to a system crash,
> > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > help from the driver core here.
> > 
> > There are three different approaches suggested for system device
> > hot-plug:
> >  A. Proceed within system device bus scan.
> >  B. Proceed within ACPI bus scan.
> >  C. Proceed with a sequence (as a mini-boot).
> > 
> > Option A uses system devices as tokens, option B uses acpi devices as
> > tokens, and option C uses resource tables as tokens, for their handlers.
> > 
> > Here is a summary of key questions & answers so far.  I hope this
> > clarifies why I am suggesting option C.
> > 
> > 1. What are the system devices?
> > System devices provide system-wide core computing resources, which are
> > essential to compose a computer system.  System devices are not
> > connected to any particular standard buses.
> 
> Not a problem, lots of devices are not connected to any "particular
> standard busses".  All this means is that system devices are connected
> to the "system" bus, nothing more.

Can you give me a few examples of other devices that support hotplug and
are not connected to any particular buses?  I will investigate them to
see how they are managed to support hotplug.

> > 2. Why are the system devices special?
> > The system devices are initialized during early boot-time, by multiple
> > subsystems, from the boot-up sequence, in pre-defined order.  They
> > provide low-level services to enable other subsystems to come up.
> 
> Sorry, no, that doesn't mean they are special, nothing here is unique
> from the point of view of the driver model from any other device or bus.

I think system devices are unique in the sense that they are initialized
before drivers run.

> > 3. Why can't we initialize the system devices from the driver structure at
> > boot?
> > The driver structure is initialized at the end of the boot sequence and
> > requires the low-level services from the system devices initialized
> > beforehand.
> 
> Wait, what "driver structure"?  

Sorry it was not clear.  cpu_dev_init() and memory_dev_init() are called
from driver_init() at the end of the boot sequence, and initialize
system/cpu and system/memory devices.  I assume they are the system bus
you are referring to with option A.

> If you need to initialize the driver
> core earlier, then do so.  Or, even better, just wait until enough of
> the system has come up and then go initialize all of the devices you
> have found so far as part of your boot process.

They are pseudo drivers that provide sysfs entry points of cpu and
memory.  They do not actually initialize cpu and memory.  I do not think
initializing cpu and memory fits into the driver model either, since
drivers should run after cpu and memory are initialized.

> None of the above things you have stated seem to have anything to do
> with your proposed patch, so I don't understand why you have mentioned
> them...

You suggested option A before, which uses system bus scan to initialize
all system devices at boot time as well as hot-plug.  I tried to say
that this option would not be doable.

> > 4. Why do we need a new common framework?
> > Sysfs CPU and memory on-lining/off-lining are performed within the CPU
> > and memory modules.  They are common code and do not depend on ACPI.
> > Therefore, a new common framework is necessary to integrate both
> > on-lining/off-lining operation and hot-plugging operation of system
> > devices into a single framework.
> 
> {sigh}
> 
> Removing and adding devices and handling hotplug operations is what the
> driver core was written for, almost 10 years ago.  To somehow think that
> your devices are "special" just because they don't use ACPI is odd,
> because the driver core itself has nothing to do with ACPI.  Don't get
> the current mix of x86 system code tied into ACPI confused with
> driver core issues here, please.

CPU online/offline operation is performed within the CPU module.  Memory
online/offline operation is performed within the memory module.  CPU and
memory hotplug operations are performed within ACPI.  While they deal
with the same set of devices, they operate independently and are not
managed under the same framework.

I agree with you that not using ACPI is perfectly fine.  My point is
that ACPI framework won't be able to manage operations that do not use
ACPI.

> > 5. Why can't we do everything with ACPI bus scan?
> > Software dependency among system devices may not be dictated by the ACPI
> > hierarchy.  For instance, memory should be initialized before CPUs (i.e.
> > a new cpu may need its local memory), but such ordering cannot be
> > guaranteed by the ACPI hierarchy.  Also, as described in 4,
> > online/offline operations are independent from ACPI.  
> 
> That's fine, the driver core is independent from ACPI.  I don't care how
> you do the scanning of your devices, but I do care about you creating new
> driver core pieces that duplicate the existing functionality of what we
> have today.
>
> In short, I like Rafael's proposal better, and I fail to see how
> anything you have stated here would matter in how this is implemented. :)

Doing everything within ACPI means we can only manage ACPI hotplug
operations, not online/offline operations.  But I understand that you
are concerned about adding a new framework with option C.  It is good to know
that you are fine with option B. :)  So, I will step back, and think
about what we can do within ACPI.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-01  7:23         ` Greg KH
@ 2013-02-01 22:12           ` Rafael J. Wysocki
  2013-02-02 14:58             ` Greg KH
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-01 22:12 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Friday, February 01, 2013 08:23:12 AM Greg KH wrote:
> On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > > memory / bridge / node devices that we have in the kernel.  Please use
> > > > them, or give me a _really_ good reason why they will not work.
> > > 
> > > We cannot use the existing system devices or ACPI devices here.  During
> > > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > > device information in a platform-neutral way.  During hot-add, we first
> > > create an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > > but platform-neutral modules cannot use them as they are ACPI-specific.
> > 
> > But suppose we're smart and have ACPI scan handlers that will create
> > "physical" device nodes for those devices during the ACPI namespace scan.
> > Then, the platform-neutral nodes will be able to bind to those "physical"
> > nodes.  Moreover, it should be possible to get a hierarchy of device objects
> > this way that will reflect all of the dependencies we need to take into
> > account during hot-add and hot-remove operations.  That may not be what we
> > have today, but I don't see any *fundamental* obstacles preventing us from
> > using this approach.
> 
> I would _much_ rather see that be the solution here as I think it is the
> proper one.
> 
> > This is already done for PCI host bridges and platform devices and I don't
> > see why we can't do that for the other types of devices too.
> 
> I agree.
> 
> > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > when we try to eject a device at the top of a subtree and need to tear down
> > the entire subtree below it, but if that's going to lead to a system crash,
> > for example, we want to cancel the eject.  It seems to me that we'll need some
> > help from the driver core here.
> 
> I say do what we always have done here, if the user asked us to tear
> something down, let it happen as they are the ones that know best :)
> 
> Seriously, I guess this gets back to the "fail disconnect" idea that the
> ACPI developers keep harping on.  I thought we already resolved this
> properly by having them implement it in their bus code, no reason the
> same thing couldn't happen here, right?

Not really. :-)  We haven't ever resolved that particular issue I'm afraid.

> I don't think the core needs to do anything special, but if so, I'll be glad
> to review it.

OK, so this is the use case.  We have "eject" defined for something like
a container with a number of CPU cores, PCI host bridge, and a memory
controller under it.  And a few pretty much arbitrary I/O devices as a bonus.

Now, there's a button on the system case labeled as "Eject" and if that button
is pressed, we're supposed to _try_ to eject all of those things at once.  We
are allowed to fail that request, though, if that's problematic for some
reason, but we're supposed to let the BIOS know about that.

Do you seriously think that if that button is pressed, we should just proceed
with removing all that stuff no matter what?  That'd be kind of like Russian
roulette for whoever pressed that button, because s/he could only press it and
wait for the system to either crash or not.  Or maybe to crash a bit later
because of some delayed stuff that would hit one of those devices that had just
gone.  Surely not a situation any admin of a high-availability system would
like to be in. :-)

Quite frankly, I have no idea how that can be addressed in a single bus type,
let alone ACPI (which is not even a proper bus type, just something pretending
to be one).

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-01 20:40             ` Toshi Kani
@ 2013-02-01 22:21               ` Rafael J. Wysocki
  2013-02-01 23:12                 ` Toshi Kani
  2013-02-02 15:01               ` Greg KH
  1 sibling, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-01 22:21 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Friday, February 01, 2013 01:40:10 PM Toshi Kani wrote:
> On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > This is already done for PCI host bridges and platform devices and I don't
> > > > see why we can't do that for the other types of devices too.
> > > > 
> > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > help from the driver core here.
> > > 
> > > There are three different approaches suggested for system device
> > > hot-plug:
> > >  A. Proceed within system device bus scan.
> > >  B. Proceed within ACPI bus scan.
> > >  C. Proceed with a sequence (as a mini-boot).
> > > 
> > > Option A uses system devices as tokens, option B uses acpi devices as
> > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > 
> > > Here is a summary of key questions & answers so far.  I hope this
> > > clarifies why I am suggesting option C.
> > > 
> > > 1. What are the system devices?
> > > System devices provide system-wide core computing resources, which are
> > > essential to compose a computer system.  System devices are not
> > > connected to any particular standard buses.
> > 
> > Not a problem, lots of devices are not connected to any "particular
> > standard busses".  All this means is that system devices are connected
> > to the "system" bus, nothing more.
> 
> Can you give me a few examples of other devices that support hotplug and
> are not connected to any particular buses?  I will investigate them to
> see how they are managed to support hotplug.
> 
> > > 2. Why are the system devices special?
> > > The system devices are initialized during early boot-time, by multiple
> > > subsystems, from the boot-up sequence, in pre-defined order.  They
> > > provide low-level services to enable other subsystems to come up.
> > 
> > Sorry, no, that doesn't mean they are special, nothing here is unique
> > from the point of view of the driver model from any other device or bus.
> 
> I think system devices are unique in the sense that they are initialized
> before drivers run.
> 
> > > 3. Why can't we initialize the system devices from the driver structure at
> > > boot?
> > > The driver structure is initialized at the end of the boot sequence and
> > > requires the low-level services from the system devices initialized
> > > beforehand.
> > 
> > Wait, what "driver structure"?  
> 
> Sorry it was not clear.  cpu_dev_init() and memory_dev_init() are called
> from driver_init() at the end of the boot sequence, and initialize
> system/cpu and system/memory devices.  I assume they are the system bus
> you are referring to with option A.
> 
> > If you need to initialize the driver
> > core earlier, then do so.  Or, even better, just wait until enough of
> > the system has come up and then go initialize all of the devices you
> > have found so far as part of your boot process.
> 
> They are pseudo drivers that provide sysfs entry points of cpu and
> memory.  They do not actually initialize cpu and memory.  I do not think
> initializing cpu and memory fits into the driver model either, since
> drivers should run after cpu and memory are initialized.
> 
> > None of the above things you have stated seem to have anything to do
> > with your proposed patch, so I don't understand why you have mentioned
> > them...
> 
> You suggested option A before, which uses system bus scan to initialize
> all system devices at boot time as well as hot-plug.  I tried to say
> that this option would not be doable.
> 
> > > 4. Why do we need a new common framework?
> > > Sysfs CPU and memory on-lining/off-lining are performed within the CPU
> > > and memory modules.  They are common code and do not depend on ACPI.
> > > Therefore, a new common framework is necessary to integrate both
> > > on-lining/off-lining operation and hot-plugging operation of system
> > > devices into a single framework.
> > 
> > {sigh}
> > 
> > Removing and adding devices and handling hotplug operations is what the
> > driver core was written for, almost 10 years ago.  To somehow think that
> > your devices are "special" just because they don't use ACPI is odd,
> > because the driver core itself has nothing to do with ACPI.  Don't get
> > the current mix of x86 system code tied into ACPI confused with
> > driver core issues here, please.
> 
> CPU online/offline operation is performed within the CPU module.  Memory
> online/offline operation is performed within the memory module.  CPU and
> memory hotplug operations are performed within ACPI.  While they deal
> with the same set of devices, they operate independently and are not
> managed under the same framework.
> 
> I agree with you that not using ACPI is perfectly fine.  My point is
> that ACPI framework won't be able to manage operations that do not use
> ACPI.
> 
> > > 5. Why can't we do everything with an ACPI bus scan?
> > > Software dependency among system devices may not be dictated by the ACPI
> > > hierarchy.  For instance, memory should be initialized before CPUs (i.e.
> > > a new cpu may need its local memory), but such ordering cannot be
> > > guaranteed by the ACPI hierarchy.  Also, as described in 4,
> > > online/offline operations are independent from ACPI.  
> > 
> > That's fine, the driver core is independent from ACPI.  I don't care how
> > you do the scanning of your devices, but I do care about you creating new
> > driver core pieces that duplicate the existing functionality of what we
> > have today.
> >
> > In short, I like Rafael's proposal better, and I fail to see how
> > anything you have stated here would matter in how this is implemented. :)
> 
> Doing everything within ACPI means we can only manage ACPI hotplug
> operations, not online/offline operations.  But I understand that you are
> concerned about adding a new framework with option C.  It is good to know
> that you are fine with option B. :)  So, I will step back, and think
> about what we can do within ACPI.

Not much, because ACPI only knows about a subset of devices that may be
involved in that, and a limited one for that matter.  For one example,
anything connected through PCI and not having a corresponding ACPI object (i.e.
pretty much every add-in card in existence) will be unknown to ACPI.  And
say one of these things is a SATA controller with a number of disks under it
and so on.  ACPI won't even know that it exists.  Moreover, PCI won't know
that those disks exist.  Etc.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-01 22:21               ` Rafael J. Wysocki
@ 2013-02-01 23:12                 ` Toshi Kani
  0 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-02-01 23:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Fri, 2013-02-01 at 23:21 +0100, Rafael J. Wysocki wrote:
> On Friday, February 01, 2013 01:40:10 PM Toshi Kani wrote:
> > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > see why we can't do that for the other types of devices too.
> > > > > 
> > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > help from the driver core here.
> > > > 
> > > > There are three different approaches suggested for system device
> > > > hot-plug:
> > > >  A. Proceed within system device bus scan.
> > > >  B. Proceed within ACPI bus scan.
> > > >  C. Proceed with a sequence (as a mini-boot).
> > > > 
> > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > 
> > > > > Here is a summary of key questions & answers so far.  I hope this
> > > > > clarifies why I am suggesting option C.
> > > > 
> > > > 1. What are the system devices?
> > > > System devices provide system-wide core computing resources, which are
> > > > essential to compose a computer system.  System devices are not
> > > > connected to any particular standard buses.
> > > 
> > > Not a problem, lots of devices are not connected to any "particular
> > > standard busses".  All this means is that system devices are connected
> > > to the "system" bus, nothing more.
> > 
> > Can you give me a few examples of other devices that support hotplug and
> > are not connected to any particular buses?  I will investigate them to
> > see how they are managed to support hotplug.
> > 
> > > > 2. Why are the system devices special?
> > > > The system devices are initialized during early boot-time, by multiple
> > > > subsystems, from the boot-up sequence, in pre-defined order.  They
> > > > provide low-level services to enable other subsystems to come up.
> > > 
> > > > Sorry, no, that doesn't mean they are special; from the point of view
> > > > of the driver model, nothing here differs from any other device or bus.
> > 
> > > I think system devices are unique in the sense that they are initialized
> > before drivers run.
> > 
> > > > > 3. Why can't we initialize the system devices from the driver structure at
> > > > boot?
> > > > The driver structure is initialized at the end of the boot sequence and
> > > > requires the low-level services from the system devices initialized
> > > > beforehand.
> > > 
> > > Wait, what "driver structure"?  
> > 
> > Sorry it was not clear.  cpu_dev_init() and memory_dev_init() are called
> > from driver_init() at the end of the boot sequence, and initialize
> > system/cpu and system/memory devices.  I assume they are the system bus
> > you are referring to with option A.
> > 
> > > If you need to initialize the driver
> > > core earlier, then do so.  Or, even better, just wait until enough of
> > > the system has come up and then go initialize all of the devices you
> > > have found so far as part of your boot process.
> > 
> > They are pseudo drivers that provide sysfs entry points of cpu and
> > memory.  They do not actually initialize cpu and memory.  I do not think
> > initializing cpu and memory fits into the driver model either, since
> > drivers should run after cpu and memory are initialized.
> > 
> > > None of the above things you have stated seem to have anything to do
> > > with your proposed patch, so I don't understand why you have mentioned
> > > them...
> > 
> > You suggested option A before, which uses system bus scan to initialize
> > all system devices at boot time as well as hot-plug.  I tried to say
> > that this option would not be doable.
> > 
> > > > 4. Why do we need a new common framework?
> > > > Sysfs CPU and memory on-lining/off-lining are performed within the CPU
> > > > and memory modules.  They are common code and do not depend on ACPI.
> > > > Therefore, a new common framework is necessary to integrate both
> > > > on-lining/off-lining operation and hot-plugging operation of system
> > > > devices into a single framework.
> > > 
> > > {sigh}
> > > 
> > > Removing and adding devices and handling hotplug operations is what the
> > > driver core was written for, almost 10 years ago.  To somehow think that
> > > your devices are "special" just because they don't use ACPI is odd,
> > > because the driver core itself has nothing to do with ACPI.  Don't get
> > > > the current mix of x86 system code tied into ACPI confused with a
> > > > driver core issue here, please.
> > 
> > CPU online/offline operation is performed within the CPU module.  Memory
> > online/offline operation is performed within the memory module.  CPU and
> > memory hotplug operations are performed within ACPI.  While they deal
> > with the same set of devices, they operate independently and are not
> > managed under the same framework.
> > 
> > I agree with you that not using ACPI is perfectly fine.  My point is
> > that ACPI framework won't be able to manage operations that do not use
> > ACPI.
> > 
> > > > 5. Why can't we do everything with an ACPI bus scan?
> > > > Software dependency among system devices may not be dictated by the ACPI
> > > > hierarchy.  For instance, memory should be initialized before CPUs (i.e.
> > > > a new cpu may need its local memory), but such ordering cannot be
> > > > guaranteed by the ACPI hierarchy.  Also, as described in 4,
> > > > online/offline operations are independent from ACPI.  
> > > 
> > > That's fine, the driver core is independent from ACPI.  I don't care how
> > > you do the scanning of your devices, but I do care about you creating new
> > > driver core pieces that duplicate the existing functionality of what we
> > > have today.
> > >
> > > In short, I like Rafael's proposal better, and I fail to see how
> > > anything you have stated here would matter in how this is implemented. :)
> > 
> > Doing everything within ACPI means we can only manage ACPI hotplug
> > operations, not online/offline operations.  But I understand that you are
> > concerned about adding a new framework with option C.  It is good to know
> > that you are fine with option B. :)  So, I will step back, and think
> > about what we can do within ACPI.
> 
> Not much, because ACPI only knows about a subset of devices that may be
> involved in that, and a limited one for that matter.  For one example,
> anything connected through PCI and not having a corresponding ACPI object (i.e.
> pretty much every add-in card in existence) will be unknown to ACPI.  And
> say one of these things is a SATA controller with a number of disks under it
> and so on.  ACPI won't even know that it exists.  Moreover, PCI won't know
> that those disks exist.  Etc.

Agreed.  Thanks for bringing I/Os into the picture.  I did not mention
them since they are not supported in this patchset, but we certainly
need to take them into account in the design.

Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-01 22:12           ` Rafael J. Wysocki
@ 2013-02-02 14:58             ` Greg KH
  2013-02-02 20:15               ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-02-02 14:58 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Fri, Feb 01, 2013 at 11:12:59PM +0100, Rafael J. Wysocki wrote:
> On Friday, February 01, 2013 08:23:12 AM Greg KH wrote:
> > On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > > > memory / bridge / node devices that we have in the kernel.  Please use
> > > > > them, or give me a _really_ good reason why they will not work.
> > > > 
> > > > We cannot use the existing system devices or ACPI devices here.  During
> > > > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > > > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > > > device information in a platform-neutral way.  During hot-add, we first
> > > > create an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > > > but platform-neutral modules cannot use them as they are ACPI-specific.
> > > 
> > > But suppose we're smart and have ACPI scan handlers that will create
> > > "physical" device nodes for those devices during the ACPI namespace scan.
> > > Then, the platform-neutral nodes will be able to bind to those "physical"
> > > nodes.  Moreover, it should be possible to get a hierarchy of device objects
> > > this way that will reflect all of the dependencies we need to take into
> > > account during hot-add and hot-remove operations.  That may not be what we
> > > have today, but I don't see any *fundamental* obstacles preventing us from
> > > using this approach.
> > 
> > I would _much_ rather see that be the solution here as I think it is the
> > proper one.
> > 
> > > This is already done for PCI host bridges and platform devices and I don't
> > > see why we can't do that for the other types of devices too.
> > 
> > I agree.
> > 
> > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > when we try to eject a device at the top of a subtree and need to tear down
> > > the entire subtree below it, but if that's going to lead to a system crash,
> > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > help from the driver core here.
> > 
> > I say do what we always have done here, if the user asked us to tear
> > something down, let it happen as they are the ones that know best :)
> > 
> > Seriously, I guess this gets back to the "fail disconnect" idea that the
> > ACPI developers keep harping on.  I thought we already resolved this
> > properly by having them implement it in their bus code, no reason the
> > same thing couldn't happen here, right?
> 
> Not really. :-)  We haven't ever resolved that particular issue I'm afraid.

Ah, I didn't realize that.

> > I don't think the core needs to do anything special, but if so, I'll be glad
> > to review it.
> 
> OK, so this is the use case.  We have "eject" defined for something like
> a container with a number of CPU cores, PCI host bridge, and a memory
> controller under it.  And a few pretty much arbitrary I/O devices as a bonus.
> 
> Now, there's a button on the system case labeled as "Eject" and if that button
> is pressed, we're supposed to _try_ to eject all of those things at once.  We
> are allowed to fail that request, though, if that's problematic for some
> reason, but we're supposed to let the BIOS know about that.
> 
> Do you seriously think that if that button is pressed, we should just proceed
> with removing all that stuff no matter what?  That'd be kind of like Russian
> roulette for whoever pressed that button, because s/he could only press it and
> wait for the system to either crash or not.  Or maybe to crash a bit later
> because of some delayed stuff that would hit one of those devices that had just
> gone.  Surely not a situation any admin of a high-availability system would
> like to be in. :-)
> 
> Quite frankly, I have no idea how that can be addressed in a single bus type,
> let alone ACPI (which is not even a proper bus type, just something pretending
> to be one).

You don't have it as a single bus type, you have a controller somewhere,
off of the bus being destroyed, that handles sending remove events to
the device and tearing everything down.  PCI has done this from the very
beginning.

I know it's more complicated with these types of devices, and I think we
are getting closer to the correct solution, I just don't want to ever
see duplicate devices in the driver model for the same physical device.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-01 20:40             ` Toshi Kani
  2013-02-01 22:21               ` Rafael J. Wysocki
@ 2013-02-02 15:01               ` Greg KH
  2013-02-04  0:28                 ` Toshi Kani
  1 sibling, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-02-02 15:01 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > see why we can't do that for the other types of devices too.
> > > > 
> > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > help from the driver core here.
> > > 
> > > There are three different approaches suggested for system device
> > > hot-plug:
> > >  A. Proceed within system device bus scan.
> > >  B. Proceed within ACPI bus scan.
> > >  C. Proceed with a sequence (as a mini-boot).
> > > 
> > > Option A uses system devices as tokens, option B uses acpi devices as
> > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > 
> > > > Here is a summary of key questions & answers so far.  I hope this
> > > > clarifies why I am suggesting option C.
> > > 
> > > 1. What are the system devices?
> > > System devices provide system-wide core computing resources, which are
> > > essential to compose a computer system.  System devices are not
> > > connected to any particular standard buses.
> > 
> > Not a problem, lots of devices are not connected to any "particular
> > standard busses".  All this means is that system devices are connected
> > to the "system" bus, nothing more.
> 
> Can you give me a few examples of other devices that support hotplug and
> are not connected to any particular buses?  I will investigate them to
> see how they are managed to support hotplug.

Any device that is attached to any bus in the driver model can be
hotunplugged from userspace by telling it to be "unbound" from the
driver controlling it.  Try it for any platform device in your system to
see how it happens.
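
As a minimal sketch of the unbind step described above (not taken from the
patchset): the driver core exposes an "unbind" file under
/sys/bus/<bus>/drivers/<driver>/, and writing a device name to it detaches
that device from the driver.  The helper names below are made up for
illustration, and any driver or device name passed to them is a placeholder
you would replace with entries found on your own system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/bus/<bus>/drivers/<driver>/unbind" into buf.
 * Returns 0 on success, -1 if the path does not fit in len bytes.
 * (Hypothetical helper, for illustration only.)
 */
int unbind_path(char *buf, size_t len, const char *bus, const char *driver)
{
	int n;

	assert(buf && bus && driver);
	n = snprintf(buf, len, "/sys/bus/%s/drivers/%s/unbind", bus, driver);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Write the device name to the unbind file at path.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 * Normally requires root.  (Hypothetical helper, for illustration only.)
 */
int write_device_name(const char *path, const char *device)
{
	FILE *f;
	int ok;

	assert(path && device);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(device, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would build the path with unbind_path(buf, sizeof(buf), "platform",
"<driver>") and then pass it to write_device_name() together with the device
name as it appears in that driver's sysfs directory.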

> > > 2. Why are the system devices special?
> > > The system devices are initialized during early boot-time, by multiple
> > > subsystems, from the boot-up sequence, in pre-defined order.  They
> > > provide low-level services to enable other subsystems to come up.
> > 
> > Sorry, no, that doesn't mean they are special; from the point of view
> > of the driver model, nothing here differs from any other device or bus.
> 
> I think system devices are unique in the sense that they are initialized
> before drivers run.

No, almost all devices are "initialized" before a driver runs on them; USB
is one such example, PCI another, and I'm pretty sure that there are
others.

> > If you need to initialize the driver
> > core earlier, then do so.  Or, even better, just wait until enough of
> > the system has come up and then go initialize all of the devices you
> > have found so far as part of your boot process.
> 
> They are pseudo drivers that provide sysfs entry points of cpu and
> memory.  They do not actually initialize cpu and memory.  I do not think
> initializing cpu and memory fits into the driver model either, since
> drivers should run after cpu and memory are initialized.

We already represent CPUs in the sysfs tree; don't represent them in two
different places with two different structures.  Use the existing ones,
please.

> > None of the above things you have stated seem to have anything to do
> > with your proposed patch, so I don't understand why you have mentioned
> > them...
> 
> You suggested option A before, which uses system bus scan to initialize
> all system devices at boot time as well as hot-plug.  I tried to say
> that this option would not be doable.

I haven't yet been convinced otherwise, sorry.  Please prove me wrong :)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-02 14:58             ` Greg KH
@ 2013-02-02 20:15               ` Rafael J. Wysocki
  2013-02-02 22:18                 ` [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework) Rafael J. Wysocki
                                   ` (2 more replies)
  0 siblings, 3 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-02 20:15 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> On Fri, Feb 01, 2013 at 11:12:59PM +0100, Rafael J. Wysocki wrote:
> > On Friday, February 01, 2013 08:23:12 AM Greg KH wrote:
> > > On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > > > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > > > > memory / bridge / node devices that we have in the kernel.  Please use
> > > > > > them, or give me a _really_ good reason why they will not work.
> > > > > 
> > > > > We cannot use the existing system devices or ACPI devices here.  During
> > > > > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > > > > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > > > > device information in a platform-neutral way.  During hot-add, we first
> > > > > > create an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > > > > but platform-neutral modules cannot use them as they are ACPI-specific.
> > > > 
> > > > But suppose we're smart and have ACPI scan handlers that will create
> > > > "physical" device nodes for those devices during the ACPI namespace scan.
> > > > Then, the platform-neutral nodes will be able to bind to those "physical"
> > > > nodes.  Moreover, it should be possible to get a hierarchy of device objects
> > > > this way that will reflect all of the dependencies we need to take into
> > > > account during hot-add and hot-remove operations.  That may not be what we
> > > > have today, but I don't see any *fundamental* obstacles preventing us from
> > > > using this approach.
> > > 
> > > I would _much_ rather see that be the solution here as I think it is the
> > > proper one.
> > > 
> > > > This is already done for PCI host bridges and platform devices and I don't
> > > > see why we can't do that for the other types of devices too.
> > > 
> > > I agree.
> > > 
> > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > help from the driver core here.
> > > 
> > > I say do what we always have done here, if the user asked us to tear
> > > something down, let it happen as they are the ones that know best :)
> > > 
> > > Seriously, I guess this gets back to the "fail disconnect" idea that the
> > > ACPI developers keep harping on.  I thought we already resolved this
> > > properly by having them implement it in their bus code, no reason the
> > > same thing couldn't happen here, right?
> > 
> > Not really. :-)  We haven't ever resolved that particular issue I'm afraid.
> 
> Ah, I didn't realize that.
> 
> > > I don't think the core needs to do anything special, but if so, I'll be glad
> > > to review it.
> > 
> > OK, so this is the use case.  We have "eject" defined for something like
> > a container with a number of CPU cores, PCI host bridge, and a memory
> > controller under it.  And a few pretty much arbitrary I/O devices as a bonus.
> > 
> > Now, there's a button on the system case labeled as "Eject" and if that button
> > is pressed, we're supposed to _try_ to eject all of those things at once.  We
> > are allowed to fail that request, though, if that's problematic for some
> > reason, but we're supposed to let the BIOS know about that.
> > 
> > Do you seriously think that if that button is pressed, we should just proceed
> > with removing all that stuff no matter what?  That'd be kind of like Russian
> > roulette for whoever pressed that button, because s/he could only press it and
> > wait for the system to either crash or not.  Or maybe to crash a bit later
> > because of some delayed stuff that would hit one of those devices that had just
> > gone.  Surely not a situation any admin of a high-availability system would
> > like to be in. :-)
> > 
> > Quite frankly, I have no idea how that can be addressed in a single bus type,
> > let alone ACPI (which is not even a proper bus type, just something pretending
> > to be one).
> 
> You don't have it as a single bus type, you have a controller somewhere,
> off of the bus being destroyed, that handles sending remove events to
> the device and tearing everything down.  PCI does this from the very
> beginning.

Yes, but those are just remove events and we can only see how destructive they
were after the removal.  The point is to be able to figure out whether or not
we *want* to do the removal in the first place.

Say you have a computing node which signals a hardware problem in a processor
package (the container with CPU cores, memory, PCI host bridge etc.).  You
may want to eject that package, but you don't want to kill the system this
way.  So if the eject is doable, it is very much desirable to do it, but if it
is not doable, you'd rather shut the box down and do the replacement afterward.
That may be costly, however (maybe weeks of computations), so it should be
avoided if possible, but not at the expense of crashing the box if the eject
doesn't work out.
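
To make the "decide first, remove second" idea concrete, here is a rough
sketch of a two-phase eject over a device subtree.  It is only an
illustration under stated assumptions: struct node, its "busy" flag and the
three functions are hypothetical and are not driver core or ACPI interfaces.
Phase one walks the subtree and lets any node veto the request without
changing anything; phase two runs only if nobody objected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical node type for illustration; this is not struct device. */
struct node {
	const char *name;
	bool busy;		/* true if removing it now would be unsafe */
	bool present;		/* cleared once the node has been ejected */
	struct node *child[MAX_CHILDREN];
	size_t nr_children;
};

/* Phase 1: ask every node in the subtree; nothing is modified. */
bool can_eject(const struct node *n)
{
	assert(n != NULL);
	if (n->busy)
		return false;
	for (size_t i = 0; i < n->nr_children; i++)
		if (!can_eject(n->child[i]))
			return false;
	return true;
}

/* Phase 2: tear down the children first, then the node itself. */
void do_eject(struct node *n)
{
	assert(n != NULL);
	for (size_t i = 0; i < n->nr_children; i++)
		do_eject(n->child[i]);
	n->present = false;
}

/* Returns 0 if the whole subtree was ejected, -1 if the eject was vetoed. */
int try_eject(struct node *top)
{
	if (!can_eject(top))
		return -1;	/* subtree left untouched; report the failure */
	do_eject(top);
	return 0;
}
```

A real implementation would also have to keep the state from changing
between the two phases (locking, or blocking new users of the devices),
which this sketch deliberately leaves out.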

> I know it's more complicated with these types of devices, and I think we
> are getting closer to the correct solution, I just don't want to ever
> see duplicate devices in the driver model for the same physical device.

Do you mean two things based on struct device for the same hardware component?
That's been happening already pretty much forever for every PCI device known
to the ACPI layer, for PNP and many others.  However, those ACPI things are (or
rather should be, but we're going to clean that up) only for convenience (to be
able to see the namespace structure and related things in sysfs).  So the stuff
under /sys/devices/LNXSYSTM\:00/ is not "real".  In my view it shouldn't even
be under /sys/devices/ (/sys/firmware/acpi/ seems to be a better place for it),
but that may be difficult to change without breaking user space (maybe we can
just symlink it from /sys/devices/ or something).  And the ACPI bus type
shouldn't even exist in my opinion.

There's much confusion in there and much work to clean that up, I agree, but
that's kind of separate from the hotplug thing.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework)
  2013-02-02 20:15               ` Rafael J. Wysocki
@ 2013-02-02 22:18                 ` Rafael J. Wysocki
  2013-02-04  1:24                   ` Greg KH
  2013-02-03 20:44                 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Rafael J. Wysocki
  2013-02-04  1:23                 ` Greg KH
  2 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-02 22:18 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Saturday, February 02, 2013 09:15:37 PM Rafael J. Wysocki wrote:
> On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
[...]
> 
> > I know it's more complicated with these types of devices, and I think we
> > are getting closer to the correct solution, I just don't want to ever
> > see duplicate devices in the driver model for the same physical device.
> 
> Do you mean two things based on struct device for the same hardware component?
> That's been happening already pretty much forever for every PCI device known
> to the ACPI layer, for PNP and many others.  However, those ACPI things are (or
> rather should be, but we're going to clean that up) only for convenience (to be
> able to see the namespace structure and related things in sysfs).  So the stuff
> under /sys/devices/LNXSYSTM\:00/ is not "real".  In my view it shouldn't even
> be under /sys/devices/ (/sys/firmware/acpi/ seems to be a better place for it),
> but that may be difficult to change without breaking user space (maybe we can
> just symlink it from /sys/devices/ or something).  And the ACPI bus type
> shouldn't even exist in my opinion.

Well, well.

In fact, the appended patch moves the whole ACPI device nodes tree under
/sys/firmware/acpi/ and I'm not seeing any negative consequences of that on my
test box (events work and so on).  User space is quite new on it, though, and
the patch is hackish.

Still ...


---
Prototype, no sign-off
---
 drivers/acpi/scan.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-pm/drivers/acpi/scan.c
===================================================================
--- linux-pm.orig/drivers/acpi/scan.c
+++ linux-pm/drivers/acpi/scan.c
@@ -1443,6 +1443,8 @@ void acpi_init_device_object(struct acpi
 	device->flags.match_driver = false;
 	device_initialize(&device->dev);
 	dev_set_uevent_suppress(&device->dev, true);
+	if (handle == ACPI_ROOT_OBJECT)
+		device->dev.kobj.parent = acpi_kobj;
 }
 
 void acpi_device_add_finalize(struct acpi_device *device)



-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-02 20:15               ` Rafael J. Wysocki
  2013-02-02 22:18                 ` [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework) Rafael J. Wysocki
@ 2013-02-03 20:44                 ` Rafael J. Wysocki
  2013-02-04 12:48                   ` Greg KH
  2013-02-04  1:23                 ` Greg KH
  2 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-03 20:44 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Saturday, February 02, 2013 09:15:37 PM Rafael J. Wysocki wrote:
> On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > On Fri, Feb 01, 2013 at 11:12:59PM +0100, Rafael J. Wysocki wrote:
> > > On Friday, February 01, 2013 08:23:12 AM Greg KH wrote:
> > > > On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > > > > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > > > > > memory / bridge / node devices that we have in the kernel.  Please use
> > > > > > > them, or give me a _really_ good reason why they will not work.
> > > > > > 
> > > > > > We cannot use the existing system devices or ACPI devices here.  During
> > > > > > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > > > > > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > > > > > device information in a platform-neutral way.  During hot-add, we first
> > > > > > > create an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > > > > > but platform-neutral modules cannot use them as they are ACPI-specific.
> > > > > 
> > > > > But suppose we're smart and have ACPI scan handlers that will create
> > > > > "physical" device nodes for those devices during the ACPI namespace scan.
> > > > > Then, the platform-neutral nodes will be able to bind to those "physical"
> > > > > nodes.  Moreover, it should be possible to get a hierarchy of device objects
> > > > > this way that will reflect all of the dependencies we need to take into
> > > > > account during hot-add and hot-remove operations.  That may not be what we
> > > > > have today, but I don't see any *fundamental* obstacles preventing us from
> > > > > using this approach.
> > > > 
> > > > I would _much_ rather see that be the solution here as I think it is the
> > > > proper one.
> > > > 
> > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > see why we can't do that for the other types of devices too.
> > > > 
> > > > I agree.
> > > > 
> > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > help from the driver core here.
> > > > 
> > > > I say do what we always have done here, if the user asked us to tear
> > > > something down, let it happen as they are the ones that know best :)
> > > > 
> > > > Seriously, I guess this gets back to the "fail disconnect" idea that the
> > > > ACPI developers keep harping on.  I thought we already resolved this
> > > > properly by having them implement it in their bus code, no reason the
> > > > same thing couldn't happen here, right?
> > > 
> > > Not really. :-)  We haven't ever resolved that particular issue I'm afraid.
> > 
> > Ah, I didn't realize that.
> > 
> > > > I don't think the core needs to do anything special, but if so, I'll be glad
> > > > to review it.
> > > 
> > > OK, so this is the use case.  We have "eject" defined for something like
> > > a container with a number of CPU cores, PCI host bridge, and a memory
> > > controller under it.  And a few pretty much arbitrary I/O devices as a bonus.
> > > 
> > > Now, there's a button on the system case labeled as "Eject" and if that button
> > > is pressed, we're supposed to _try_ to eject all of those things at once.  We
> > > are allowed to fail that request, though, if that's problematic for some
> > > reason, but we're supposed to let the BIOS know about that.
> > > 
> > > Do you seriously think that if that button is pressed, we should just proceed
> > > with removing all that stuff no matter what?  That'd be kind of like Russian
> > > roulette for whoever pressed that button, because s/he could only press it and
> > > wait for the system to either crash or not.  Or maybe to crash a bit later
> > > because of some delayed stuff that would hit one of those devices that had just
> > > gone.  Surely not a situation any admin of a high-availability system would
> > > like to be in. :-)
> > > 
> > > Quite frankly, I have no idea how that can be addressed in a single bus type,
> > > let alone ACPI (which is not even a proper bus type, just something pretending
> > > to be one).
> > 
> > You don't have it as a single bus type, you have a controller somewhere,
> > off of the bus being destroyed, that handles sending remove events to
> > the device and tearing everything down.  PCI does this from the very
> > beginning.
> 
> Yes, but those are just remove events and we can only see how destructive they
> were after the removal.  The point is to be able to figure out whether or not
> we *want* to do the removal in the first place.
> 
> Say you have a computing node which signals a hardware problem in a processor
> package (the container with CPU cores, memory, PCI host bridge etc.).  You
> may want to eject that package, but you don't want to kill the system this
> way.  So if the eject is doable, it is very much desirable to do it, but if it
> is not doable, you'd rather shut the box down and do the replacement afterward.
> That may be costly, however (maybe weeks of computations), so it should be
> avoided if possible, but not at the expense of crashing the box if the eject
> doesn't work out.

It seems to me that we could handle that with the help of a new flag, say
"no_eject", in struct device, a global mutex, and a function that will walk
the given subtree of the device hierarchy and check if "no_eject" is set for
any devices in there.  Plus a global "no_eject" switch, perhaps.

To be more precise, suppose we have a "no_eject" flag that is set for a device
when the kernel is about to start using it in such a way that ejecting it would
lead to serious trouble (i.e. it is a memory module holding the kernel's page
tables or something like that).  Next, suppose that we have a function called,
say, "device_may_eject()" that will walk the subtree of the device hierarchy
starting at the given node and return false if any of the devices in there have
"no_eject" set.  Further, require that device_may_eject() has to be called
under a global mutex called something like "device_eject_lock".  Then, we can
arrange things as follows:

1. When a device is about to be used for such purposes that it shouldn't be
   ejected, the relevant code should:
  (a) Acquire device_eject_lock.
  (b) Check if the device is still there and go to (e) if not.
  (c) Do whatever needs to be done to the device.
  (d) Set the device's no_eject flag.
  (e) Release device_eject_lock.

2. When an eject operation is about to be carried out on a subtree of the device
   hierarchy, the eject code (be it ACPI or something else) should:
  (a) Acquire device_eject_lock.
  (b) Call device_may_eject() on the starting device and go to (d) if it
      returns false.
  (c) Carry out the eject (that includes calling .remove() from all of the
      involved drivers in particular).
  (d) Release device_eject_lock.

3. When it is OK to eject the device again, the relevant code should just clear
   its "no_eject" flag under device_eject_lock.

If we want to synchronize that with such things like boot or system suspend,
a global "no_eject" switch can be used for that (it needs to be manipulated
under device_eject_lock) and one more step (check the global "no_eject") is
needed between 2(a) and 2(b).
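
For illustration, the scheme above could be modeled in userspace C roughly
as follows.  This is only a sketch under stated assumptions: struct dev_node
is a hypothetical stand-in for struct device, a pthread mutex stands in for
the proposed device_eject_lock, and device_pin(), device_unpin(),
device_try_eject() and set_global_no_eject() are made-up helper names for
steps 1, 3, 2 and the global switch.  None of this is existing driver core
API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 8

/* Hypothetical stand-in for struct device: only what the scheme needs. */
struct dev_node {
	const char *name;
	bool present;		/* false once the device has been ejected */
	bool no_eject;		/* set while ejecting would cause trouble */
	struct dev_node *children[MAX_CHILDREN];
	size_t nr_children;
};

/* Userspace model of the proposed global mutex and global switch. */
static pthread_mutex_t device_eject_lock = PTHREAD_MUTEX_INITIALIZER;
static bool global_no_eject;

/* Walk the subtree; caller must hold device_eject_lock. */
static bool device_may_eject(const struct dev_node *dev)
{
	assert(dev != NULL);
	if (dev->no_eject)
		return false;
	for (size_t i = 0; i < dev->nr_children; i++)
		if (!device_may_eject(dev->children[i]))
			return false;
	return true;
}

/* Step 1: start using a device in a way that forbids ejecting it. */
static int device_pin(struct dev_node *dev)
{
	int ret = -1;

	assert(dev != NULL);
	pthread_mutex_lock(&device_eject_lock);		/* 1(a) */
	if (dev->present) {				/* 1(b) */
		/* 1(c): do whatever needs to be done to the device */
		dev->no_eject = true;			/* 1(d) */
		ret = 0;
	}
	pthread_mutex_unlock(&device_eject_lock);	/* 1(e) */
	return ret;
}

/* Step 3: it is OK to eject the device again. */
static void device_unpin(struct dev_node *dev)
{
	assert(dev != NULL);
	pthread_mutex_lock(&device_eject_lock);
	dev->no_eject = false;
	pthread_mutex_unlock(&device_eject_lock);
}

/* Placeholder for the real teardown (.remove() calls and so on). */
static void mark_removed(struct dev_node *dev)
{
	for (size_t i = 0; i < dev->nr_children; i++)
		mark_removed(dev->children[i]);
	dev->present = false;
}

/* Step 2: eject a subtree, or refuse without touching anything. */
static int device_try_eject(struct dev_node *top)
{
	int ret = -1;

	assert(top != NULL);
	pthread_mutex_lock(&device_eject_lock);		/* 2(a) */
	/* extra step: global switch, then 2(b) */
	if (!global_no_eject && device_may_eject(top)) {
		mark_removed(top);			/* 2(c) */
		ret = 0;
	}
	pthread_mutex_unlock(&device_eject_lock);	/* 2(d) */
	return ret;
}

/* Global switch for boot or system suspend; also under the lock. */
static void set_global_no_eject(bool on)
{
	pthread_mutex_lock(&device_eject_lock);
	global_no_eject = on;
	pthread_mutex_unlock(&device_eject_lock);
}
```

Because the check and the teardown happen under the same lock that is used
to set no_eject, a device cannot become pinned between the subtree walk and
the removal in this model.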

Does it look like a reasonable approach?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-02 15:01               ` Greg KH
@ 2013-02-04  0:28                 ` Toshi Kani
  2013-02-04 12:46                   ` Greg KH
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-02-04  0:28 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > see why we can't do that for the other types of devices too.
> > > > > 
> > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > help from the driver core here.
> > > > 
> > > > There are three different approaches suggested for system device
> > > > hot-plug:
> > > >  A. Proceed within system device bus scan.
> > > >  B. Proceed within ACPI bus scan.
> > > >  C. Proceed with a sequence (as a mini-boot).
> > > > 
> > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > 
> > > > Here is summary of key questions & answers so far.  I hope this
> > > > clarifies why I am suggesting option C.
> > > > 
> > > > 1. What are the system devices?
> > > > System devices provide system-wide core computing resources, which are
> > > > essential to compose a computer system.  System devices are not
> > > > connected to any particular standard buses.
> > > 
> > > Not a problem, lots of devices are not connected to any "particular
> > > standard busses".  All this means is that system devices are connected
> > > to the "system" bus, nothing more.
> > 
> > Can you give me a few examples of other devices that support hotplug and
> > are not connected to any particular buses?  I will investigate them to
> > see how they are managed to support hotplug.
> 
> Any device that is attached to any bus in the driver model can be
> hotunplugged from userspace by telling it to be "unbound" from the
> driver controlling it.  Try it for any platform device in your system to
> see how it happens.

The unbind operation, as I understand from you, is to detach a driver
from a device.  Yes, unbinding can be done for any devices.  It is
however different from hot-plug operation, which unplugs a device.
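
For reference, the userspace unbind mentioned above goes through the
driver's "unbind" attribute in sysfs.  A minimal sketch in C, assuming the
usual /sys/bus/<bus>/drivers/<driver>/unbind layout; the helper names and
the "example_drv" / "example_dev.0" names in the usage note are
hypothetical, and the sysfs root is a parameter only so the helper can be
exercised outside a real /sys:

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<sysfs_root>/bus/<bus>/drivers/<driver>/unbind" into buf.
 * Returns 0 on success, -1 if buf is too small.
 */
static int build_unbind_path(char *buf, size_t len, const char *sysfs_root,
			     const char *bus, const char *driver)
{
	int n = snprintf(buf, len, "%s/bus/%s/drivers/%s/unbind",
			 sysfs_root, bus, driver);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Ask the driver core to detach <driver> from <dev_name> by writing the
 * device name to the driver's "unbind" attribute.  Returns 0 on success,
 * -1 on any error.  This only detaches the driver; the device itself
 * stays registered on its bus.
 */
static int unbind_device(const char *sysfs_root, const char *bus,
			 const char *driver, const char *dev_name)
{
	char path[512];
	FILE *f;
	int ret = 0;

	if (!dev_name || strlen(dev_name) == 0)
		return -1;
	if (build_unbind_path(path, sizeof(path), sysfs_root, bus, driver))
		return -1;

	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(dev_name, f) == EOF)
		ret = -1;
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

A call such as unbind_device("/sys", "platform", "example_drv",
"example_dev.0") would need root privileges and a device that is actually
bound to that driver.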

Today, the unbind operation on ACPI cpu/memory devices triggers a
hot-unplug (offline) operation on them, which is one of the major issues
for us since unbind cannot fail.  This patchset addresses this issue by
making the unbind operation of ACPI cpu/memory devices do the
unbinding only.  ACPI drivers no longer control cpu and memory, as they
are supposed to be controlled by their own drivers, the cpu and memory
modules.  The current hotplug code requires putting all device control
stuff into ACPI, which is what this patchset is trying to fix.


> > > > 2. Why are the system devices special?
> > > > The system devices are initialized during early boot-time, by multiple
> > > > subsystems, from the boot-up sequence, in pre-defined order.  They
> > > > provide low-level services to enable other subsystems to come up.
> > > 
> > > Sorry, no, that doesn't mean they are special, nothing here is unique
> > > for the point of view of the driver model from any other device or bus.
> > 
> > I think system devices are unique in a sense that they are initialized
> > before drivers run.
> 
> No, most all devices are "initialized" before a driver runs on it, USB
> is one such example, PCI another, and I'm pretty sure that there are
> others.

USB devices can be initialized after the USB bus driver is initialized.
Similarly, PCI devices can be initialized after the PCI bus driver is
initialized.  However, CPU and memory are initialized without any
dependency on a bus driver since there is no such thing.

In addition, CPU and memory have two drivers -- their actual
drivers/subsystems and their ACPI drivers.  Their actual
drivers/subsystems initialize cpu and memory.  The ACPI drivers
initialize driver-specific data of their ACPI nodes.  During hot-plug
operation, however, the current code requires these ACPI drivers to do
their hot-plug operations.  This patchset keeps them separated during
hot-plug and lets their actual drivers/subsystems do the job.


> > > If you need to initialize the driver
> > > core earlier, then do so.  Or, even better, just wait until enough of
> > > the system has come up and then go initialize all of the devices you
> > > have found so far as part of your boot process.
> > 
> > They are pseudo drivers that provide sysfs entry points of cpu and
> > memory.  They do not actually initialize cpu and memory.  I do not think
> > initializing cpu and memory fits into the driver model either, since
> > drivers should run after cpu and memory are initialized.
> 
> We already represent CPUs in the sysfs tree, don't represent them in two
> different places with two different structures.  Use the existing ones
> please.

This patchset does not make any changes to sysfs.  It does however make
the system drivers control the system devices, and ACPI do the
ACPI stuff only.

The boot sequence calls the following steps to initialize memory and
cpu, as well as their acpi nodes.  These steps are independent, i.e. #1
and #2 run without ACPI.

 1. mm -> initialize memory -> create sysfs system/memory
 2. smp/cpu -> initialize cpu -> create sysfs system/cpu
 3. acpi core -> acpi scan -> create sysfs acpi nodes

While there are 3 separate steps at boot, the current hotplug code tries
to do everything in #3.  Therefore, this patchset provides a sequencer
(which is similar to the boot sequence) to run these steps during a
hot-plug operation as well.  Hence, we have consistent steps and role
model between boot and hot-plug operations.
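
To make the sequencer idea concrete, here is a minimal userspace sketch in
C.  It is not the interface from this patchset: struct hp_step, struct
hp_resource, hp_run_sequence() and the toy step functions are hypothetical
names, and the step order in the table is only an illustration.  It shows
subsystem steps run in a fixed order, with the completed steps undone in
reverse order if a later one fails.

```c
#include <stddef.h>

/* Toy description of the resources being hot-added (hypothetical). */
struct hp_resource {
	int acpi_nodes;
	int mem_blocks;
	int cpus;
	int fail_cpu;		/* knob to make the cpu step fail */
};

/* One step of the sequence, owned by one subsystem. */
struct hp_step {
	const char *name;
	int (*run)(struct hp_resource *res);	/* 0 on success */
	void (*undo)(struct hp_resource *res);	/* may be NULL */
};

static int acpi_scan(struct hp_resource *r)   { r->acpi_nodes++; return 0; }
static void acpi_trim(struct hp_resource *r)  { r->acpi_nodes--; }
static int mem_add(struct hp_resource *r)     { r->mem_blocks++; return 0; }
static void mem_remove(struct hp_resource *r) { r->mem_blocks--; }
static void cpu_remove(struct hp_resource *r) { r->cpus--; }

static int cpu_add(struct hp_resource *r)
{
	if (r->fail_cpu)
		return -1;
	r->cpus++;
	return 0;
}

/* Each subsystem does only its own part, in a pre-defined order. */
static const struct hp_step hot_add_steps[] = {
	{ "acpi",   acpi_scan, acpi_trim  },
	{ "memory", mem_add,   mem_remove },
	{ "cpu",    cpu_add,   cpu_remove },
};

/*
 * Run the steps in order.  If step i fails, undo steps i-1 .. 0 in
 * reverse order and report failure; the failed step is assumed to have
 * cleaned up after itself.
 */
static int hp_run_sequence(const struct hp_step *steps, size_t n,
			   struct hp_resource *res)
{
	for (size_t i = 0; i < n; i++) {
		if (steps[i].run(res) != 0) {
			while (i-- > 0)
				if (steps[i].undo)
					steps[i].undo(res);
			return -1;
		}
	}
	return 0;
}
```

The point of the table is that the ordering lives in one place instead of
being buried in one subsystem's scan path.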


> > > None of the above things you have stated seem to have anything to do
> > > with your proposed patch, so I don't understand why you have mentioned
> > > them...
> > 
> > You suggested option A before, which uses system bus scan to initialize
> > all system devices at boot time as well as hot-plug.  I tried to say
> > that this option would not be doable.
> 
> I haven't yet been convinced otherwise, sorry.  Please prove me wrong :)

This patchset enables the system drivers (i.e. cpu and memory drivers)
to do the hot-plug operations, and ACPI core to do the ACPI stuff.

We should not go with the system driver only approach since we do need
to use ACPI stuff.  Also, we should not go with the ACPI only approach
since we do need to use the system (and other) drivers.  This patchset
provides a sequencer to manage these steps across multiple subsystems.


Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-02 20:15               ` Rafael J. Wysocki
  2013-02-02 22:18                 ` [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework) Rafael J. Wysocki
  2013-02-03 20:44                 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Rafael J. Wysocki
@ 2013-02-04  1:23                 ` Greg KH
  2013-02-04 13:41                   ` Rafael J. Wysocki
  2 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-02-04  1:23 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > On Fri, Feb 01, 2013 at 11:12:59PM +0100, Rafael J. Wysocki wrote:
> > > On Friday, February 01, 2013 08:23:12 AM Greg KH wrote:
> > > > On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > > > > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > > > > > memory / bridge / node devices that we have in the kernel.  Please use
> > > > > > > them, or give me a _really_ good reason why they will not work.
> > > > > > 
> > > > > > We cannot use the existing system devices or ACPI devices here.  During
> > > > > > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > > > > > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > > > > > device information in a platform-neutral way.  During hot-add, we first
> > > > > > create an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > > > > > but platform-neutral modules cannot use them as they are ACPI-specific.
> > > > > 
> > > > > But suppose we're smart and have ACPI scan handlers that will create
> > > > > "physical" device nodes for those devices during the ACPI namespace scan.
> > > > > Then, the platform-neutral nodes will be able to bind to those "physical"
> > > > > nodes.  Moreover, it should be possible to get a hierarchy of device objects
> > > > > this way that will reflect all of the dependencies we need to take into
> > > > > account during hot-add and hot-remove operations.  That may not be what we
> > > > > have today, but I don't see any *fundamental* obstacles preventing us from
> > > > > using this approach.
> > > > 
> > > > I would _much_ rather see that be the solution here as I think it is the
> > > > proper one.
> > > > 
> > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > see why we can't do that for the other types of devices too.
> > > > 
> > > > I agree.
> > > > 
> > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > help from the driver core here.
> > > > 
> > > > I say do what we always have done here, if the user asked us to tear
> > > > something down, let it happen as they are the ones that know best :)
> > > > 
> > > > Seriously, I guess this gets back to the "fail disconnect" idea that the
> > > > ACPI developers keep harping on.  I thought we already resolved this
> > > > properly by having them implement it in their bus code, no reason the
> > > > same thing couldn't happen here, right?
> > > 
> > > Not really. :-)  We haven't ever resolved that particular issue I'm afraid.
> > 
> > Ah, I didn't realize that.
> > 
> > > > I don't think the core needs to do anything special, but if so, I'll be glad
> > > > to review it.
> > > 
> > > OK, so this is the use case.  We have "eject" defined for something like
> > > a container with a number of CPU cores, PCI host bridge, and a memory
> > > controller under it.  And a few pretty much arbitrary I/O devices as a bonus.
> > > 
> > > Now, there's a button on the system case labeled as "Eject" and if that button
> > > is pressed, we're supposed to _try_ to eject all of those things at once.  We
> > > are allowed to fail that request, though, if that's problematic for some
> > > reason, but we're supposed to let the BIOS know about that.
> > > 
> > > Do you seriously think that if that button is pressed, we should just proceed
> > > with removing all that stuff no matter what?  That'd be kind of like Russian
> > > roulette for whoever pressed that button, because s/he could only press it and
> > > wait for the system to either crash or not.  Or maybe to crash a bit later
> > > because of some delayed stuff that would hit one of those devices that had just
> > > gone.  Surely not a situation any admin of a high-availability system would
> > > like to be in. :-)
> > > 
> > > Quite frankly, I have no idea how that can be addressed in a single bus type,
> > > let alone ACPI (which is not even a proper bus type, just something pretending
> > > to be one).
> > 
> > You don't have it as a single bus type, you have a controller somewhere,
> > off of the bus being destroyed, that handles sending remove events to
> > the device and tearing everything down.  PCI does this from the very
> > beginning.
> 
> Yes, but those are just remove events and we can only see how destructive they
> were after the removal.  The point is to be able to figure out whether or not
> we *want* to do the removal in the first place.

Yes, but, you will always race if you try to test to see if you can shut
down a device and then try to do it.  So walking the bus ahead of
time isn't a good idea.

And, we really don't have a viable way to recover if disconnect() fails,
do we?  What do we do in that situation, restore the other devices we
disconnected successfully?  How do we remember/know what they were?

PCI hotplug almost had this same problem until the designers finally
realized that they just had to accept the fact that removing a PCI
device could either happen by:
	- a user yanking out the device, at which time the OS better
	  clean up properly no matter what happens
	- the user asked nicely to remove a device, and the OS can take
	  as long as it wants to complete that action, including
	  stalling for noticeable amounts of time before eventually,
	  always letting the action succeed.

I think the second thing is what you have to do here.  If a user tells
the OS it wants to remove these devices, you better do it.  If you
can't, because memory is being used by someone else, either move them
off, or just hope that nothing bad happens, before the user gets
frustrated and yanks out the CPU/memory module themselves physically :)

> Say you have a computing node which signals a hardware problem in a processor
> package (the container with CPU cores, memory, PCI host bridge etc.).  You
> may want to eject that package, but you don't want to kill the system this
> way.  So if the eject is doable, it is very much desirable to do it, but if it
> is not doable, you'd rather shut the box down and do the replacement afterward.
> That may be costly, however (maybe weeks of computations), so it should be
> avoided if possible, but not at the expense of crashing the box if the eject
> doesn't work out.

These same "situations" came up for PCI hotplug, and I still say the
same resolution there holds true, as described above.  The user wants to
remove something, so let them do it.  They always know best, and get mad
at us if we think otherwise :)

What does the ACPI spec say about this type of thing?  Surely the same
people that did the PCI Hotplug spec were consulted when doing this part
of the spec, right?  Yeah, I know, I can dream...

> > I know it's more complicated with these types of devices, and I think we
> > are getting closer to the correct solution, I just don't want to ever
> > see duplicate devices in the driver model for the same physical device.
> 
> Do you mean two things based on struct device for the same hardware component?
> That's been happening already pretty much forever for every PCI device known
> to the ACPI layer, for PNP and many others.  However, those ACPI things are (or
> rather should be, but we're going to clean that up) only for convenience (to be
> able to see the namespace structure and related things in sysfs).  So the stuff
> under /sys/devices/LNXSYSTM\:00/ is not "real".

Yes, I've never treated that as a "real" device because they (usually)
didn't ever bind to the "real" driver that controlled the device and how
it talked to the rest of the os (like a USB device for example.)  I
always thought just of it as a "shadow" of the firmware image, nothing
that should be directly operated on if at all possible.

But, as you are pointing out, maybe this needs to be changed.  Having
users have to look in one part of the tree for one interface to a
device, and another totally different part for a different interface to
the same physical device is crazy, don't you agree?

As to how to solve it, I really have no idea, I don't know ACPI that
well at all, and honestly, don't want to, I want to keep what little
hair I have left...

> In my view it shouldn't even
> be under /sys/devices/ (/sys/firmware/acpi/ seems to be a better place for it),

I agree.

> but that may be difficult to change without breaking user space (maybe we can
> just symlink it from /sys/devices/ or something).  And the ACPI bus type
> shouldn't even exist in my opinion.
> 
> There's much confusion in there and much work to clean that up, I agree, but
> that's kind of separate from the hotplug thing.

I agree as well.

Best of luck.

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework)
  2013-02-02 22:18                 ` [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework) Rafael J. Wysocki
@ 2013-02-04  1:24                   ` Greg KH
  2013-02-04 12:34                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-02-04  1:24 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Sat, Feb 02, 2013 at 11:18:20PM +0100, Rafael J. Wysocki wrote:
> On Saturday, February 02, 2013 09:15:37 PM Rafael J. Wysocki wrote:
> > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> [...]
> > 
> > > I know it's more complicated with these types of devices, and I think we
> > > are getting closer to the correct solution, I just don't want to ever
> > > see duplicate devices in the driver model for the same physical device.
> > 
> > Do you mean two things based on struct device for the same hardware component?
> > That's been happening already pretty much forever for every PCI device known
> > to the ACPI layer, for PNP and many others.  However, those ACPI things are (or
> > rather should be, but we're going to clean that up) only for convenience (to be
> > able to see the namespace structure and related things in sysfs).  So the stuff
> > under /sys/devices/LNXSYSTM\:00/ is not "real".  In my view it shouldn't even
> > be under /sys/devices/ (/sys/firmware/acpi/ seems to be a better place for it),
> > but that may be difficult to change without breaking user space (maybe we can
> > just symlink it from /sys/devices/ or something).  And the ACPI bus type
> > shouldn't even exist in my opinion.
> 
> Well, well.
> 
> In fact, the appended patch moves the whole ACPI device nodes tree under
> /sys/firmware/acpi/ and I'm not seeing any negative consequences of that on my
> test box (events work and so on).  User space is quite new on it, though, and
> the patch is hackish.

Try booting a RHEL 5 image on this type of kernel, or some old Fedora
releases, they were sensitive to changes in sysfs.

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework)
  2013-02-04  1:24                   ` Greg KH
@ 2013-02-04 12:34                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 12:34 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Sunday, February 03, 2013 07:24:47 PM Greg KH wrote:
> On Sat, Feb 02, 2013 at 11:18:20PM +0100, Rafael J. Wysocki wrote:
> > On Saturday, February 02, 2013 09:15:37 PM Rafael J. Wysocki wrote:
> > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > [...]
> > > 
> > > > I know it's more complicated with these types of devices, and I think we
> > > > are getting closer to the correct solution, I just don't want to ever
> > > > see duplicate devices in the driver model for the same physical device.
> > > 
> > > Do you mean two things based on struct device for the same hardware component?
> > > That's been happening already pretty much forever for every PCI device known
> > > to the ACPI layer, for PNP and many others.  However, those ACPI things are (or
> > > rather should be, but we're going to clean that up) only for convenience (to be
> > > able to see the namespace structure and related things in sysfs).  So the stuff
> > > under /sys/devices/LNXSYSTM\:00/ is not "real".  In my view it shouldn't even
> > > be under /sys/devices/ (/sys/firmware/acpi/ seems to be a better place for it),
> > > but that may be difficult to change without breaking user space (maybe we can
> > > just symlink it from /sys/devices/ or something).  And the ACPI bus type
> > > shouldn't even exist in my opinion.
> > 
> > Well, well.
> > 
> > In fact, the appended patch moves the whole ACPI device nodes tree under
> > /sys/firmware/acpi/ and I'm not seeing any negative consequences of that on my
> > test box (events work and so on).  User space is quite new on it, though, and
> > the patch is hackish.
> 
> Try booting a RHEL 5 image on this type of kernel, or some old Fedora
> releases, they were sensitive to changes in sysfs.

Well, I've found a machine where it causes problems to happen.

I'll try to add a symlink from /sys/devices to that and see what happens then.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04  0:28                 ` Toshi Kani
@ 2013-02-04 12:46                   ` Greg KH
  2013-02-04 16:46                     ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-02-04 12:46 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > > see why we can't do that for the other types of devices too.
> > > > > > 
> > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > help from the driver core here.
> > > > > 
> > > > > There are three different approaches suggested for system device
> > > > > hot-plug:
> > > > >  A. Proceed within system device bus scan.
> > > > >  B. Proceed within ACPI bus scan.
> > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > 
> > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > 
> > > > > Here is summary of key questions & answers so far.  I hope this
> > > > > clarifies why I am suggesting option C.
> > > > > 
> > > > > 1. What are the system devices?
> > > > > System devices provide system-wide core computing resources, which are
> > > > > essential to compose a computer system.  System devices are not
> > > > > connected to any particular standard buses.
> > > > 
> > > > Not a problem, lots of devices are not connected to any "particular
> > > > standard busses".  All this means is that system devices are connected
> > > > to the "system" bus, nothing more.
> > > 
> > > Can you give me a few examples of other devices that support hotplug and
> > > are not connected to any particular buses?  I will investigate them to
> > > see how they are managed to support hotplug.
> > 
> > Any device that is attached to any bus in the driver model can be
> > hotunplugged from userspace by telling it to be "unbound" from the
> > driver controlling it.  Try it for any platform device in your system to
> > see how it happens.
> 
> The unbind operation, as I understand from you, is to detach a driver
> from a device.  Yes, unbinding can be done for any devices.  It is
> however different from hot-plug operation, which unplugs a device.

Physically, yes, but to the driver involved, and the driver core, there
is no difference.  That was one of the primary goals of the driver core
creation so many years ago.

> Today, the unbind operation to an ACPI cpu/memory devices causes
> hot-unplug (offline) operation to them, which is one of the major issues
> for us since unbind cannot fail.  This patchset addresses this issue by
> making the unbind operation of ACPI cpu/memory devices to do the
> unbinding only.  ACPI drivers no longer control cpu and memory as they
> are supposed to be controlled by their drivers, cpu and memory modules.

I think that's the problem right there, solve that, please.

> > > > > 2. Why are the system devices special?
> > > > > The system devices are initialized during early boot-time, by multiple
> > > > > subsystems, from the boot-up sequence, in pre-defined order.  They
> > > > > provide low-level services to enable other subsystems to come up.
> > > > 
> > > > Sorry, no, that doesn't mean they are special, nothing here is unique
> > > > for the point of view of the driver model from any other device or bus.
> > > 
> > > I think system devices are unique in a sense that they are initialized
> > > before drivers run.
> > 
> > No, most all devices are "initialized" before a driver runs on it, USB
> > is one such example, PCI another, and I'm pretty sure that there are
> > others.
> 
> USB devices can be initialized after the USB bus driver is initialized.
> Similarly, PCI devices can be initialized after the PCI bus driver is
> initialized.  However, CPU and memory are initialized without any
> dependency to their bus driver since there is no such thing.

You can create such a thing if you want :)

> In addition, CPU and memory have two drivers -- their actual
> drivers/subsystems and their ACPI drivers.

Again, I feel that is the root of the problem.  Rafael seems to be
working on solving this, which I think is essential to your work as
well.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-03 20:44                 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Rafael J. Wysocki
@ 2013-02-04 12:48                   ` Greg KH
  2013-02-04 14:21                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-02-04 12:48 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > Yes, but those are just remove events and we can only see how destructive they
> > were after the removal.  The point is to be able to figure out whether or not
> > we *want* to do the removal in the first place.
> > 
> > Say you have a computing node which signals a hardware problem in a processor
> > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > may want to eject that package, but you don't want to kill the system this
> > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > is not doable, you'd rather shut the box down and do the replacement afterward.
> > That may be costly, however (maybe weeks of computations), so it should be
> > avoided if possible, but not at the expense of crashing the box if the eject
> > doesn't work out.
> 
> It seems to me that we could handle that with the help of a new flag, say
> "no_eject", in struct device, a global mutex, and a function that will walk
> the given subtree of the device hierarchy and check if "no_eject" is set for
> any devices in there.  Plus a global "no_eject" switch, perhaps.

I think this will always be racy, or at worst, slow things down on
normal device operations as you will always be having to grab this flag
whenever you want to do something new.

See my comments earlier about pci hotplug and the design decisions there
about "no eject" capabilities for why.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04  1:23                 ` Greg KH
@ 2013-02-04 13:41                   ` Rafael J. Wysocki
  2013-02-04 16:02                     ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 13:41 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > > On Fri, Feb 01, 2013 at 11:12:59PM +0100, Rafael J. Wysocki wrote:
> > > > On Friday, February 01, 2013 08:23:12 AM Greg KH wrote:
> > > > > On Thu, Jan 31, 2013 at 09:54:51PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > But, again, I'm going to ask why you aren't using the existing cpu /
> > > > > > > > memory / bridge / node devices that we have in the kernel.  Please use
> > > > > > > > them, or give me a _really_ good reason why they will not work.
> > > > > > > 
> > > > > > > We cannot use the existing system devices or ACPI devices here.  During
> > > > > > > hot-plug, ACPI handler sets this shp_device info, so that cpu and memory
> > > > > > > handlers (drivers/cpu.c and mm/memory_hotplug.c) can obtain their target
> > > > > > > device information in a platform-neutral way.  During hot-add, we first
> > > > > > > create an ACPI device node (i.e. device under /sys/bus/acpi/devices),
> > > > > > > but platform-neutral modules cannot use them as they are ACPI-specific.
> > > > > > 
> > > > > > But suppose we're smart and have ACPI scan handlers that will create
> > > > > > "physical" device nodes for those devices during the ACPI namespace scan.
> > > > > > Then, the platform-neutral nodes will be able to bind to those "physical"
> > > > > > nodes.  Moreover, it should be possible to get a hierarchy of device objects
> > > > > > this way that will reflect all of the dependencies we need to take into
> > > > > > account during hot-add and hot-remove operations.  That may not be what we
> > > > > > have today, but I don't see any *fundamental* obstacles preventing us from
> > > > > > using this approach.
> > > > > 
> > > > > I would _much_ rather see that be the solution here as I think it is the
> > > > > proper one.
> > > > > 
> > > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > > see why we can't do that for the other types of devices too.
> > > > > 
> > > > > I agree.
> > > > > 
> > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > help from the driver core here.
> > > > > 
> > > > > I say do what we always have done here, if the user asked us to tear
> > > > > something down, let it happen as they are the ones that know best :)
> > > > > 
> > > > > Seriously, I guess this gets back to the "fail disconnect" idea that the
> > > > > ACPI developers keep harping on.  I thought we already resolved this
> > > > > properly by having them implement it in their bus code, no reason the
> > > > > same thing couldn't happen here, right?
> > > > 
> > > > Not really. :-)  We haven't ever resolved that particular issue I'm afraid.
> > > 
> > > Ah, I didn't realize that.
> > > 
> > > > > I don't think the core needs to do anything special, but if so, I'll be glad
> > > > > to review it.
> > > > 
> > > > OK, so this is the use case.  We have "eject" defined for something like
> > > > a container with a number of CPU cores, PCI host bridge, and a memory
> > > > controller under it.  And a few pretty much arbitrary I/O devices as a bonus.
> > > > 
> > > > Now, there's a button on the system case labeled as "Eject" and if that button
> > > > is pressed, we're supposed to _try_ to eject all of those things at once.  We
> > > > are allowed to fail that request, though, if that's problematic for some
> > > > reason, but we're supposed to let the BIOS know about that.
> > > > 
> > > > Do you seriously think that if that button is pressed, we should just proceed
> > > > with removing all that stuff no matter what?  That'd be kind of like Russian
> > > > roulette for whoever pressed that button, because s/he could only press it and
> > > > wait for the system to either crash or not.  Or maybe to crash a bit later
> > > > because of some delayed stuff that would hit one of those devices that had just
> > > > gone.  Surely not a situation any admin of a high-availability system would
> > > > like to be in. :-)
> > > > 
> > > > Quite frankly, I have no idea how that can be addressed in a single bus type,
> > > > let alone ACPI (which is not even a proper bus type, just something pretending
> > > > to be one).
> > > 
> > > You don't have it as a single bus type, you have a controller somewhere,
> > > off of the bus being destroyed, that handles sending remove events to
> > > the device and tearing everything down.  PCI does this from the very
> > > beginning.
> > 
> > Yes, but those are just remove events and we can only see how destructive they
> > were after the removal.  The point is to be able to figure out whether or not
> > we *want* to do the removal in the first place.
> 
> Yes, but, you will always race if you try to test to see if you can shut
> down a device and then trying to do it.  So walking the bus ahead of
> time isn't a good idea.
>
> And, we really don't have a viable way to recover if disconnect() fails,
> do we.  What do we do in that situation, restore the other devices we
> disconnected successfully?  How do we remember/know what they were?
> 
> PCI hotplug almost had this same problem until the designers finally
> realized that they just had to accept the fact that removing a PCI
> device could either happen by:
> 	- a user yanking out the device, at which time the OS better
> 	  clean up properly no matter what happens
> 	- the user asked nicely to remove a device, and the OS can take
> 	  as long as it wants to complete that action, including
> 	  stalling for noticeable amounts of time before eventually,
> 	  always letting the action succeed.
> 
> I think the second thing is what you have to do here.  If a user tells
> the OS it wants to remove these devices, you better do it.  If you
> can't, because memory is being used by someone else, either move them
> off, or just hope that nothing bad happens, before the user gets
> frustrated and yanks out the CPU/memory module themselves physically :)

Well, that we can't help, but sometimes users really *want* the OS to tell them
if it is safe to unplug something at this particular time (think about the
Windows' "safe remove" feature for USB sticks, for example; that came out of
users' demand AFAIR).

So in my opinion it would be good to give them an option to do "safe eject" or
"forcible eject", whichever they prefer.

> > Say you have a computing node which signals a hardware problem in a processor
> > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > may want to eject that package, but you don't want to kill the system this
> > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > is not doable, you'd rather shut the box down and do the replacement afterward.
> > That may be costly, however (maybe weeks of computations), so it should be
> > avoided if possible, but not at the expense of crashing the box if the eject
> > doesn't work out.
> 
> These same "situations" came up for PCI hotplug, and I still say the
> same resolution there holds true, as described above.  The user wants to
> remove something, so let them do it.  They always know best, and get mad
> at us if we think otherwise :)

Well, not necessarily.  Users sometimes really don't know what they are doing
and want us to give them a hint.  My opinion is that if we can give them a
hint, there's no reason not to.

> What does the ACPI spec say about this type of thing?  Surely the same
> people that did the PCI Hotplug spec were consulted when doing this part
> of the spec, right?  Yeah, I know, I can dream...

It's not very specific (as usual), but it gives hints. :-)

For example, there is the _OST method (Section 6.3.5 of ACPI 5) that we are
supposed to use to notify the platform of ejection failures and there are
status codes like "0x81: Device in use by application" or "0x82: Device busy"
that can be used in there.  So definitely the authors took ejection failures
for software-related reasons into consideration.
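
As a rough illustration of how an eject path could choose between those two
codes before reporting back through _OST, here is a minimal user-space sketch.
Only the two values 0x81 and 0x82 come from the text above; the enum and
function names, the struct, and the use of 0x00 for success are assumptions
made for this example and are not taken from the kernel or quoted from the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only 0x81 and 0x82 are taken from the discussion. */
enum sketch_ost_status {
	SKETCH_OST_SUCCESS       = 0x00, /* assumption: "success" is reported as 0 */
	SKETCH_OST_DEVICE_IN_USE = 0x81, /* "Device in use by application" */
	SKETCH_OST_DEVICE_BUSY   = 0x82, /* "Device busy" */
};

/* Hypothetical result of whatever checks the OS ran before ejecting. */
struct sketch_eject_check {
	bool used_by_application;
	bool busy;
};

/* Pick the status code that would be handed to the platform. */
static uint32_t sketch_ost_status_for(const struct sketch_eject_check *chk)
{
	assert(chk != NULL);

	if (chk->used_by_application)
		return SKETCH_OST_DEVICE_IN_USE;
	if (chk->busy)
		return SKETCH_OST_DEVICE_BUSY;
	return SKETCH_OST_SUCCESS;
}
```

The point is only that a failed eject has a defined way of being reported, so
the OS is not limited to "succeed or crash".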

> > > I know it's more complicated with these types of devices, and I think we
> > > are getting closer to the correct solution, I just don't want to ever
> > > see duplicate devices in the driver model for the same physical device.
> > 
> > Do you mean two things based on struct device for the same hardware component?
> > That's been happening already pretty much forever for every PCI device known
> > to the ACPI layer, for PNP and many others.  However, those ACPI things are (or
> > rather should be, but we're going to clean that up) only for convenience (to be
> > able to see the namespace structure and related things in sysfs).  So the stuff
> > under /sys/devices/LNXSYSTM\:00/ is not "real".
> 
> Yes, I've never treated that as a "real" device because they (usually)
> didn't ever bind to the "real" driver that controlled the device and how
> it talked to the rest of the os (like a USB device for example.)  I
> always thought just of it as a "shadow" of the firmware image, nothing
> that should be directly operated on if at all possible.

Precisely.  That's why I'd like to move that stuff away from /sys/devices/
and I don't see a reason why these objects should be based on struct device.
They need kobjects to show up in sysfs, but apart from this they don't really
need anything from struct device as far as I can say.

> But, as you are pointing out, maybe this needs to be changed.  Having
> users have to look in one part of the tree for one interface to a
> device, and another totally different part for a different interface to
> the same physical device is crazy, don't you agree?

Well, it is confusing.  I don't have a problem with exposing the ACPI namespace
in the form of a directory structure in sysfs and I see some benefits from
doing that, but I'd like it to be clear what's represented by that directory
structure and I don't want people to confuse ACPI device objects with devices
(they are abstract interfaces to devices rather than anything else).

> As to how to solve it, I really have no idea, I don't know ACPI that
> well at all, and honestly, don't want to, I want to keep what little
> hair I have left...

I totally understand you. :-)

> > In my view it shouldn't even
> > be under /sys/devices/ (/sys/firmware/acpi/ seems to be a better place for it),
> 
> I agree.
> 
> > but that may be difficult to change without breaking user space (maybe we can
> > just symlink it from /sys/devices/ or something).  And the ACPI bus type
> > shouldn't even exist in my opinion.
> > 
> > There's much confusion in there and much work to clean that up, I agree, but
> > that's kind of separate from the hotplug thing.
> 
> I agree as well.
> 
> Best of luck.

Thanks. :-)

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 12:48                   ` Greg KH
@ 2013-02-04 14:21                     ` Rafael J. Wysocki
  2013-02-04 14:33                       ` Greg KH
  2013-02-04 16:19                       ` Toshi Kani
  0 siblings, 2 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 14:21 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > Yes, but those are just remove events and we can only see how destructive they
> > > were after the removal.  The point is to be able to figure out whether or not
> > > we *want* to do the removal in the first place.
> > > 
> > > Say you have a computing node which signals a hardware problem in a processor
> > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > may want to eject that package, but you don't want to kill the system this
> > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > That may be costly, however (maybe weeks of computations), so it should be
> > > avoided if possible, but not at the expense of crashing the box if the eject
> > > doesn't work out.
> > 
> > It seems to me that we could handle that with the help of a new flag, say
> > "no_eject", in struct device, a global mutex, and a function that will walk
> > the given subtree of the device hierarchy and check if "no_eject" is set for
> > any devices in there.  Plus a global "no_eject" switch, perhaps.
> 
> I think this will always be racy, or at worst, slow things down on
> normal device operations as you will always be having to grab this flag
> whenever you want to do something new.

I don't see why this particular scheme should be racy, at least I don't see any
obvious races in it (although I'm not that good at races detection in general,
admittedly).

Also, I don't expect that flag to be used for everything, just for things known
to seriously break if forcible eject is done.  That may be not precise enough,
so that's a matter of defining its purpose more precisely.

We can do something like that on the ACPI level (ie. introduce a no_eject flag
in struct acpi_device and provide an interface for the layers above ACPI to
manipulate it) but then devices without ACPI namespace objects won't be
covered.  That may not be a big deal, though.

So say dev is about to be used for something incompatible with ejecting, so to
speak.  Then, one would do platform_lock_eject(dev), which would check if dev
has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
platform_lock_eject(dev) would need to be checked to see if the device is not
gone.  If it returns success (0), one would do something to the device and
call platform_no_eject(dev) and then platform_unlock_eject(dev).

To clear no_eject one would just call platform_allow_to_eject(dev) that would
do all of the locking and clearing in one operation.

The ACPI eject side would be similar to the thing I described previously,
so it would (1) take acpi_eject_lock, (2) see if any struct acpi_device
involved has no_eject set and if not, then (3) do acpi_bus_trim(), (4)
carry out the eject and (5) release acpi_eject_lock.

Step (2) above might be optional, ie. if eject is forcible, we would just do
(3) etc. without (2).

The locking should prevent races from happening (and it should prevent two
ejects from happening at the same time too, which is not a bad thing by itself).
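
To make the proposed sequence concrete, here is a minimal user-space sketch of
it.  It is not kernel code: a pthread mutex stands in for acpi_eject_lock, a
"gone" flag stands in for acpi_bus_trim() plus the actual eject, and every
sketch_* name and struct field is hypothetical, modeled loosely on the
platform_lock_eject() / platform_no_eject() / platform_allow_to_eject() calls
described above.  It does not attempt to answer how a driver would know that
it is doing something incompatible with ejecting.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device with an ACPI companion. */
struct sketch_dev {
	struct sketch_dev *child;   /* first child in the hierarchy */
	struct sketch_dev *sibling; /* next device under the same parent */
	bool no_eject;              /* ejecting now is known to break things */
	bool gone;                  /* device has already been ejected */
};

/* Stand-in for the global acpi_eject_lock. */
static pthread_mutex_t sketch_eject_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Take the lock; fail (and drop it) if the device is already gone. */
static int sketch_lock_eject(struct sketch_dev *dev)
{
	pthread_mutex_lock(&sketch_eject_mutex);
	if (dev->gone) {
		pthread_mutex_unlock(&sketch_eject_mutex);
		return -ENODEV;
	}
	return 0;
}

static void sketch_unlock_eject(void)
{
	pthread_mutex_unlock(&sketch_eject_mutex);
}

/* Caller must hold the lock via sketch_lock_eject(). */
static void sketch_no_eject(struct sketch_dev *dev)
{
	assert(dev != NULL);
	dev->no_eject = true;
}

/* Lock, clear the flag, unlock: all in one operation. */
static int sketch_allow_to_eject(struct sketch_dev *dev)
{
	int ret = sketch_lock_eject(dev);

	if (ret)
		return ret;
	dev->no_eject = false;
	sketch_unlock_eject();
	return 0;
}

/* Step (2): is no_eject set anywhere in the subtree? */
static bool sketch_subtree_has_no_eject(const struct sketch_dev *dev)
{
	if (dev->no_eject)
		return true;
	for (const struct sketch_dev *c = dev->child; c; c = c->sibling)
		if (sketch_subtree_has_no_eject(c))
			return true;
	return false;
}

/* Stand-in for steps (3) and (4): trim the subtree and eject it. */
static void sketch_mark_subtree_gone(struct sketch_dev *dev)
{
	dev->gone = true;
	for (struct sketch_dev *c = dev->child; c; c = c->sibling)
		sketch_mark_subtree_gone(c);
}

/* Steps (1)-(5); a forcible eject skips the check in step (2). */
static int sketch_eject(struct sketch_dev *top, bool force)
{
	int ret = 0;

	pthread_mutex_lock(&sketch_eject_mutex);           /* (1) */
	if (top->gone)
		ret = -ENODEV;
	else if (!force && sketch_subtree_has_no_eject(top)) /* (2) */
		ret = -EBUSY;
	else
		sketch_mark_subtree_gone(top);              /* (3), (4) */
	pthread_mutex_unlock(&sketch_eject_mutex);         /* (5) */
	return ret;
}
```

In this sketch a "safe" eject of a container fails with -EBUSY while any
device below it has no_eject set, and succeeds once the flag is cleared; a
forcible eject ignores the flag.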

> See my comments earlier about pci hotplug and the design decisions there
> about "no eject" capabilities for why.

Well, I replied to that one too. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 14:21                     ` Rafael J. Wysocki
@ 2013-02-04 14:33                       ` Greg KH
  2013-02-04 20:07                         ` Rafael J. Wysocki
  2013-02-04 16:19                       ` Toshi Kani
  1 sibling, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-02-04 14:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > Yes, but those are just remove events and we can only see how destructive they
> > > > were after the removal.  The point is to be able to figure out whether or not
> > > > we *want* to do the removal in the first place.
> > > > 
> > > > Say you have a computing node which signals a hardware problem in a processor
> > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > may want to eject that package, but you don't want to kill the system this
> > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > doesn't work out.
> > > 
> > > It seems to me that we could handle that with the help of a new flag, say
> > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > 
> > I think this will always be racy, or at worst, slow things down on
> > normal device operations as you will always be having to grab this flag
> > whenever you want to do something new.
> 
> I don't see why this particular scheme should be racy, at least I don't see any
> obvious races in it (although I'm not that good at races detection in general,
> admittedly).
> 
> Also, I don't expect that flag to be used for everything, just for things known
> to seriously break if forcible eject is done.  That may be not precise enough,
> so that's a matter of defining its purpose more precisely.
> 
> We can do something like that on the ACPI level (ie. introduce a no_eject flag
> in struct acpi_device and provide an interface for the layers above ACPI to
> manipulate it) but then devices without ACPI namespace objects won't be
> covered.  That may not be a big deal, though.
> 
> So say dev is about to be used for something incompatible with ejecting, so to
> speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> platform_lock_eject(dev) would need to be checked to see if the device is not
> gone.  If it returns success (0), one would do something to the device and
> call platform_no_eject(dev) and then platform_unlock_eject(dev).

How does a device "know" it is doing something that is incompatible with
ejecting?  That's a non-trivial task from what I can tell.

What happens if a device wants to set that flag, right after it was told
to eject and the device was in the middle of being removed?  How can you
"fail" the "I can't be removed now, so don't" requirement that it now
has?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 13:41                   ` Rafael J. Wysocki
@ 2013-02-04 16:02                     ` Toshi Kani
  2013-02-04 19:48                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-02-04 16:02 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
  :
> > > Yes, but those are just remove events and we can only see how destructive they
> > > were after the removal.  The point is to be able to figure out whether or not
> > > we *want* to do the removal in the first place.
> > 
> > Yes, but, you will always race if you try to test to see if you can shut
> > down a device and then trying to do it.  So walking the bus ahead of
> > time isn't a good idea.
> >
> > And, we really don't have a viable way to recover if disconnect() fails,
> > do we.  What do we do in that situation, restore the other devices we
> > disconnected successfully?  How do we remember/know what they were?
> > 
> > PCI hotplug almost had this same problem until the designers finally
> > realized that they just had to accept the fact that removing a PCI
> > device could either happen by:
> > 	- a user yanking out the device, at which time the OS better
> > 	  clean up properly no matter what happens
> > 	- the user asked nicely to remove a device, and the OS can take
> > 	  as long as it wants to complete that action, including
> > 	  stalling for noticeable amounts of time before eventually,
> > 	  always letting the action succeed.
> > 
> > I think the second thing is what you have to do here.  If a user tells
> > the OS it wants to remove these devices, you better do it.  If you
> > can't, because memory is being used by someone else, either move them
> > off, or just hope that nothing bad happens, before the user gets
> > frustrated and yanks out the CPU/memory module themselves physically :)
> 
> Well, that we can't help, but sometimes users really *want* the OS to tell them
> if it is safe to unplug something at this particular time (think about the
> Windows' "safe remove" feature for USB sticks, for example; that came out of
> users' demand AFAIR).
> 
> So in my opinion it would be good to give them an option to do "safe eject" or
> "forcible eject", whichever they prefer.

For system device hot-plug, it always needs to be "safe eject".  This
feature will be implemented on mission critical servers, which are
managed by professional IT folks.  Crashing a server costs the business
serious money.

A user yanking out a system device won't happen, and it immediately
crashes the system if it is done.  So, we have nothing to do with this
case.  The 2nd case can hang the operation, waiting forever to proceed,
which is still a serious issue for enterprise customers.


> > > Say you have a computing node which signals a hardware problem in a processor
> > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > may want to eject that package, but you don't want to kill the system this
> > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > That may be costly, however (maybe weeks of computations), so it should be
> > > avoided if possible, but not at the expense of crashing the box if the eject
> > > doesn't work out.
> > 
> > These same "situations" came up for PCI hotplug, and I still say the
> > same resolution there holds true, as described above.  The user wants to
> > remove something, so let them do it.  They always know best, and get mad
> > at us if we think otherwise :)
> 
> Well, not necessarily.  Users sometimes really don't know what they are doing
> and want us to give them a hint.  My opinion is that if we can give them a
> hint, there's no reason not to.
> 
> > What does the ACPI spec say about this type of thing?  Surely the same
> > people that did the PCI Hotplug spec were consulted when doing this part
> > of the spec, right?  Yeah, I know, I can dream...
> 
> It's not very specific (as usual), but it gives hints. :-)
> 
> For example, there is the _OST method (Section 6.3.5 of ACPI 5) that we are
> supposed to use to notify the platform of ejection failures and there are
> status codes like "0x81: Device in use by application" or "0x82: Device busy"
> that can be used in there.  So definitely the authors took ejection failures
> for software-related reasons into consideration.

That is correct.  Also, ACPI spec deliberately does not define
implementation details, so we defined DIG64 hotplug spec below (which I
contributed to the spec.)

http://www.dig64.org/home/DIG64_HPPF_R1_0.pdf

For example, Figure 2 on page 14 shows the memory hot-remove flow.  The
operation needs to either succeed or fail.  Crash or hang is not an
option.


Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 14:21                     ` Rafael J. Wysocki
  2013-02-04 14:33                       ` Greg KH
@ 2013-02-04 16:19                       ` Toshi Kani
  2013-02-04 19:43                         ` Rafael J. Wysocki
  1 sibling, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-02-04 16:19 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-02-04 at 15:21 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > Yes, but those are just remove events and we can only see how destructive they
> > > > were after the removal.  The point is to be able to figure out whether or not
> > > > we *want* to do the removal in the first place.
> > > > 
> > > > Say you have a computing node which signals a hardware problem in a processor
> > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > may want to eject that package, but you don't want to kill the system this
> > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > doesn't work out.
> > > 
> > > It seems to me that we could handle that with the help of a new flag, say
> > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > 
> > I think this will always be racy, or at worst, slow things down on
> > normal device operations as you will always be having to grab this flag
> > whenever you want to do something new.
> 
> I don't see why this particular scheme should be racy, at least I don't see any
> obvious races in it (although I'm not that good at races detection in general,
> admittedly).
> 
> Also, I don't expect that flag to be used for everything, just for things known
> to seriously break if forcible eject is done.  That may be not precise enough,
> so that's a matter of defining its purpose more precisely.
> 
> We can do something like that on the ACPI level (ie. introduce a no_eject flag
> in struct acpi_device and provide an interface for the layers above ACPI to
> manipulate it) but then devices without ACPI namespace objects won't be
> covered.  That may not be a big deal, though.

I am afraid that bringing the device status management into the ACPI
level would not be a good idea.  acpi_device should only reflect ACPI
device object information, not how its actual device is being used.

I like your initiative of acpi_scan_driver and I think scanning /
trimming of ACPI object info is what the ACPI drivers should do.


Thanks,
-Toshi




* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 12:46                   ` Greg KH
@ 2013-02-04 16:46                     ` Toshi Kani
  2013-02-04 19:45                       ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-02-04 16:46 UTC (permalink / raw)
  To: Greg KH
  Cc: Rafael J. Wysocki, lenb, akpm, linux-acpi, linux-kernel,
	linux-mm, linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki,
	jiang.liu, wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > 
> > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > help from the driver core here.
> > > > > > 
> > > > > > There are three different approaches suggested for system device
> > > > > > hot-plug:
> > > > > >  A. Proceed within system device bus scan.
> > > > > >  B. Proceed within ACPI bus scan.
> > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > 
> > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > 
> > > > > > Here is summary of key questions & answers so far.  I hope this
> > > > > > > clarifies why I am suggesting option C.
> > > > > > 
> > > > > > 1. What are the system devices?
> > > > > > System devices provide system-wide core computing resources, which are
> > > > > > essential to compose a computer system.  System devices are not
> > > > > > connected to any particular standard buses.
> > > > > 
> > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > standard busses".  All this means is that system devices are connected
> > > > > to the "system" bus, nothing more.
> > > > 
> > > > Can you give me a few examples of other devices that support hotplug and
> > > > are not connected to any particular buses?  I will investigate them to
> > > > see how they are managed to support hotplug.
> > > 
> > > Any device that is attached to any bus in the driver model can be
> > > hotunplugged from userspace by telling it to be "unbound" from the
> > > driver controlling it.  Try it for any platform device in your system to
> > > see how it happens.
> > 
> > The unbind operation, as I understand from you, is to detach a driver
> > from a device.  Yes, unbinding can be done for any devices.  It is
> > however different from hot-plug operation, which unplugs a device.
> 
> Physically, yes, but to the driver involved, and the driver core, there
> is no difference.  That was one of the primary goals of the driver core
> creation so many years ago.
> 
> > Today, the unbind operation to an ACPI cpu/memory devices causes
> > hot-unplug (offline) operation to them, which is one of the major issues
> > for us since unbind cannot fail.  This patchset addresses this issue by
> > making the unbind operation of ACPI cpu/memory devices to do the
> > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > are supposed to be controlled by their drivers, cpu and memory modules.
> 
> I think that's the problem right there, solve that, please.

We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
can limit the ACPI drivers to do the scanning stuff only.   This is
precisely the intent of this patchset.  The real stuff, removing actual
devices, is done by the system device drivers/modules.


> > > > > > 2. Why are the system devices special?
> > > > > > The system devices are initialized during early boot-time, by multiple
> > > > > > subsystems, from the boot-up sequence, in pre-defined order.  They
> > > > > > provide low-level services to enable other subsystems to come up.
> > > > > 
> > > > > Sorry, no, that doesn't mean they are special, nothing here is unique
> > > > > for the point of view of the driver model from any other device or bus.
> > > > 
> > > > I think system devices are unique in a sense that they are initialized
> > > > before drivers run.
> > > 
> > > No, most all devices are "initialized" before a driver runs on it, USB
> > > is one such example, PCI another, and I'm pretty sure that there are
> > > others.
> > 
> > USB devices can be initialized after the USB bus driver is initialized.
> > Similarly, PCI devices can be initialized after the PCI bus driver is
> > initialized.  However, CPU and memory are initialized without any
> > dependency to their bus driver since there is no such thing.
> 
> You can create such a thing if you want :)

Well, a pseudo driver could be created for it, but it does not make any
difference.  Access to CPU and memory does not go thru any bus
controller visible to the OS.  CPU and memory are connected with links
(which are up from the beginning) and do not have a bus structure any more.


> > In addition, CPU and memory have two drivers -- their actual
> > drivers/subsystems and their ACPI drivers.
> 
> Again, I feel that is the root of the problem.  Rafael seems to be
> working on solving this, which I think is essential to your work as
> well.

Yes, Rafael is doing excellent work to turn ACPI drivers into ACPI
"scan" drivers, removing the device driver portion and keeping them as
attach / detach operations on ACPI device objects.  My patchset is very
much aligned with this direction. :)


Thanks,
-Toshi




* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 16:19                       ` Toshi Kani
@ 2013-02-04 19:43                         ` Rafael J. Wysocki
  0 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 19:43 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 09:19:09 AM Toshi Kani wrote:
> On Mon, 2013-02-04 at 15:21 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > we *want* to do the removal in the first place.
> > > > > 
> > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > may want to eject that package, but you don't want to kill the system this
> > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > doesn't work out.
> > > > 
> > > > It seems to me that we could handle that with the help of a new flag, say
> > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > 
> > > I think this will always be racy, or at worst, slow things down on
> > > normal device operations as you will always be having to grab this flag
> > > whenever you want to do something new.
> > 
> > I don't see why this particular scheme should be racy, at least I don't see any
> > obvious races in it (although I'm not that good at races detection in general,
> > admittedly).
> > 
> > Also, I don't expect that flag to be used for everything, just for things known
> > to seriously break if forcible eject is done.  That may be not precise enough,
> > so that's a matter of defining its purpose more precisely.
> > 
> > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > > in struct acpi_device and provide an interface for the layers above ACPI to
> > manipulate it) but then devices without ACPI namespace objects won't be
> > covered.  That may not be a big deal, though.
> 
> I am afraid that bringing the device status management into the ACPI
> level would not be a good idea.  acpi_device should only reflect ACPI
> device object information, not how its actual device is being used.
> 
> I like your initiative of acpi_scan_driver and I think scanning /
> trimming of ACPI object info is what the ACPI drivers should do.

ACPI drivers, yes, but the users of ACPI already rely on information
in struct acpi_device.  Like ACPI device power states, for example.

So platform_no_eject(dev) is not much different in that respect from
platform_pci_set_power_state(pci_dev).

The whole "eject" concept is somewhat ACPI-specific, though, and the eject
notifications come from ACPI, so I don't have a problem with limiting it to
ACPI-backed devices for the time being.

If it turns out to be useful outside of ACPI, then we can move it up to the
driver core.  For now I don't see a compelling reason to do that.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 16:46                     ` Toshi Kani
@ 2013-02-04 19:45                       ` Rafael J. Wysocki
  2013-02-04 20:59                         ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 19:45 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 09:46:18 AM Toshi Kani wrote:
> On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> > On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > > > > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > > 
> > > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > > help from the driver core here.
> > > > > > > 
> > > > > > > There are three different approaches suggested for system device
> > > > > > > hot-plug:
> > > > > > >  A. Proceed within system device bus scan.
> > > > > > >  B. Proceed within ACPI bus scan.
> > > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > > 
> > > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > > 
> > > > > > > Here is summary of key questions & answers so far.  I hope this
> > > > > > > > clarifies why I am suggesting option C.
> > > > > > > 
> > > > > > > 1. What are the system devices?
> > > > > > > System devices provide system-wide core computing resources, which are
> > > > > > > essential to compose a computer system.  System devices are not
> > > > > > > connected to any particular standard buses.
> > > > > > 
> > > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > > standard busses".  All this means is that system devices are connected
> > > > > > to the "system" bus, nothing more.
> > > > > 
> > > > > Can you give me a few examples of other devices that support hotplug and
> > > > > are not connected to any particular buses?  I will investigate them to
> > > > > see how they are managed to support hotplug.
> > > > 
> > > > Any device that is attached to any bus in the driver model can be
> > > > hotunplugged from userspace by telling it to be "unbound" from the
> > > > driver controlling it.  Try it for any platform device in your system to
> > > > see how it happens.
> > > 
> > > The unbind operation, as I understand from you, is to detach a driver
> > > from a device.  Yes, unbinding can be done for any devices.  It is
> > > however different from hot-plug operation, which unplugs a device.
> > 
> > Physically, yes, but to the driver involved, and the driver core, there
> > is no difference.  That was one of the primary goals of the driver core
> > creation so many years ago.
> > 
> > > Today, the unbind operation to an ACPI cpu/memory devices causes
> > > hot-unplug (offline) operation to them, which is one of the major issues
> > > for us since unbind cannot fail.  This patchset addresses this issue by
> > > making the unbind operation of ACPI cpu/memory devices to do the
> > > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > > are supposed to be controlled by their drivers, cpu and memory modules.
> > 
> > I think that's the problem right there, solve that, please.
> 
> We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
> can limit the ACPI drivers to do the scanning stuff only.   This is
> precisely the intent of this patchset.  The real stuff, removing actual
> devices, is done by the system device drivers/modules.

In case you haven't realized that yet, the $subject patchset has no future.

Let's just talk about how we can get what we need in more general terms.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 19:48                       ` Rafael J. Wysocki
@ 2013-02-04 19:46                         ` Toshi Kani
  2013-02-04 20:12                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-02-04 19:46 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-02-04 at 20:48 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> >   :
> > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > we *want* to do the removal in the first place.
> > > > 
> > > > Yes, but, you will always race if you try to test to see if you can shut
> > > > down a device and then trying to do it.  So walking the bus ahead of
> > > > time isn't a good idea.
> > > >
> > > > And, we really don't have a viable way to recover if disconnect() fails,
> > > > do we.  What do we do in that situation, restore the other devices we
> > > > disconnected successfully?  How do we remember/know what they were?
> > > > 
> > > > PCI hotplug almost had this same problem until the designers finally
> > > > realized that they just had to accept the fact that removing a PCI
> > > > device could either happen by:
> > > > 	- a user yanking out the device, at which time the OS better
> > > > 	  clean up properly no matter what happens
> > > > 	- the user asked nicely to remove a device, and the OS can take
> > > > 	  as long as it wants to complete that action, including
> > > > 	  stalling for noticable amounts of time before eventually,
> > > > 	  always letting the action succeed.
> > > > 
> > > > I think the second thing is what you have to do here.  If a user tells
> > > > the OS it wants to remove these devices, you better do it.  If you
> > > > can't, because memory is being used by someone else, either move them
> > > > off, or just hope that nothing bad happens, before the user gets
> > > > frustrated and yanks out the CPU/memory module themselves physically :)
> > > 
> > > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > > > if it is safe to unplug something at this particular time (think about the
> > > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > > users' demand AFAIR).
> > > 
> > > So in my opinion it would be good to give them an option to do "safe eject" or
> > > "forcible eject", whichever they prefer.
> > 
> > For system device hot-plug, it always needs to be "safe eject".  This
> > feature will be implemented on mission critical servers, which are
> > managed by professional IT folks.  Crashing a server costs the
> > business serious money.
> 
> Well, "always" is a bit too strong a word as far as human behavior is concerned
> in my opinion.
> 
> That said I would be perfectly fine with not supporting the "forcible eject" to
> start with and waiting for the first request to add support for it.  I also
> would be fine with taking bets on how much time it's going to take for such a
> request to appear. :-)

Sounds good.  In my experience, though, it actually takes a LONG time to
convince customers that "safe eject" is actually safe.  Enterprise
customers are so afraid of doing anything risky that might cause the
system to crash or hang due to some defect.  I would be very surprised
to see a customer asking for a force operation when we do not guarantee
its outcome.  I have not seen such enterprise customers yet.

Thanks,
-Toshi 




* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 16:02                     ` Toshi Kani
@ 2013-02-04 19:48                       ` Rafael J. Wysocki
  2013-02-04 19:46                         ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 19:48 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
>   :
> > > > Yes, but those are just remove events and we can only see how destructive they
> > > > were after the removal.  The point is to be able to figure out whether or not
> > > > we *want* to do the removal in the first place.
> > > 
> > > Yes, but, you will always race if you try to test to see if you can shut
> > > down a device and then trying to do it.  So walking the bus ahead of
> > > time isn't a good idea.
> > >
> > > And, we really don't have a viable way to recover if disconnect() fails,
> > > do we.  What do we do in that situation, restore the other devices we
> > > disconnected successfully?  How do we remember/know what they were?
> > > 
> > > PCI hotplug almost had this same problem until the designers finally
> > > realized that they just had to accept the fact that removing a PCI
> > > device could either happen by:
> > > 	- a user yanking out the device, at which time the OS better
> > > 	  clean up properly no matter what happens
> > > 	- the user asked nicely to remove a device, and the OS can take
> > > 	  as long as it wants to complete that action, including
> > > 	  stalling for noticable amounts of time before eventually,
> > > 	  always letting the action succeed.
> > > 
> > > I think the second thing is what you have to do here.  If a user tells
> > > the OS it wants to remove these devices, you better do it.  If you
> > > can't, because memory is being used by someone else, either move them
> > > off, or just hope that nothing bad happens, before the user gets
> > > frustrated and yanks out the CPU/memory module themselves physically :)
> > 
> > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > > if it is safe to unplug something at this particular time (think about the
> > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > users' demand AFAIR).
> > 
> > So in my opinion it would be good to give them an option to do "safe eject" or
> > "forcible eject", whichever they prefer.
> 
> For system device hot-plug, it always needs to be "safe eject".  This
> feature will be implemented on mission critical servers, which are
> managed by professional IT folks.  Crashing a server costs the
> business serious money.

Well, "always" is a bit too strong a word as far as human behavior is concerned
in my opinion.

That said I would be perfectly fine with not supporting the "forcible eject" to
start with and waiting for the first request to add support for it.  I also
would be fine with taking bets on how much time it's going to take for such a
request to appear. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 14:33                       ` Greg KH
@ 2013-02-04 20:07                         ` Rafael J. Wysocki
  2013-02-04 22:13                           ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 20:07 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 06:33:52 AM Greg KH wrote:
> On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > we *want* to do the removal in the first place.
> > > > > 
> > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > may want to eject that package, but you don't want to kill the system this
> > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > doesn't work out.
> > > > 
> > > > It seems to me that we could handle that with the help of a new flag, say
> > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > 
> > > I think this will always be racy, or at worst, slow things down on
> > > normal device operations as you will always be having to grab this flag
> > > whenever you want to do something new.
> > 
> > I don't see why this particular scheme should be racy, at least I don't see any
> > obvious races in it (although I'm not that good at races detection in general,
> > admittedly).
> > 
> > Also, I don't expect that flag to be used for everything, just for things known
> > to seriously break if forcible eject is done.  That may be not precise enough,
> > so that's a matter of defining its purpose more precisely.
> > 
> > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > > in struct acpi_device and provide an interface for the layers above ACPI to
> > manipulate it) but then devices without ACPI namespace objects won't be
> > covered.  That may not be a big deal, though.
> > 
> > So say dev is about to be used for something incompatible with ejecting, so to
> > speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> > has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> > platform_lock_eject(dev) would need to be checked to see if the device is not
> > gone.  If it returns success (0), one would do something to the device and
> > call platform_no_eject(dev) and then platform_unlock_eject(dev).
> 
> How does a device "know" it is doing something that is incompatible with
> ejecting?  That's a non-trivial task from what I can tell.

I agree that this is complicated in general.  But.

There are devices known to have software "offline" and "online" operations
such that after the "offline" the given device is guaranteed to be not used
until "online".  We have that for CPU cores, for example, and user space can
do it via /sys/devices/system/cpu/cpuX/online .  So, why don't we make the
"online" set the no_eject flag (under the lock as appropriate) and the
"offline" clear it?  And why don't we define such "online" and "offline" for
all of the other "system" stuff, like memory, PCI host bridges etc. and make it
behave analogously?

Then, it is quite simple to say which devices should use the no_eject flag:
devices that have "online" and "offline" exported to user space.  And guess
who's responsible for "offlining" all of those things before trying to eject
them: user space is.  From the kernel's point of view it is all clear.  Hands
clean. :-)
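
To make the idea above concrete, here is a minimal user-space sketch, not
kernel code.  It assumes a made-up struct sketch_dev standing in for
struct device, and every sketch_* name is hypothetical.  "online" sets the
no_eject flag, "offline" clears it, and an eject request is refused while
any device in the subtree still has the flag set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; not the kernel's definition. */
struct sketch_dev {
	const char *name;
	bool no_eject;              /* set while the device is "online" */
	struct sketch_dev *child;   /* first child in the hierarchy */
	struct sketch_dev *sibling; /* next child of the same parent */
};

/* "online" marks the device as in use, so it must not be ejected. */
static void sketch_online(struct sketch_dev *dev)
{
	dev->no_eject = true;
}

/* "offline" guarantees the device is unused, so eject becomes possible. */
static void sketch_offline(struct sketch_dev *dev)
{
	dev->no_eject = false;
}

/*
 * Walk the subtree rooted at dev (depth first) and return the first
 * device that has no_eject set, or NULL if nothing blocks the eject.
 */
static struct sketch_dev *sketch_find_eject_blocker(struct sketch_dev *dev)
{
	struct sketch_dev *c, *hit;

	if (dev->no_eject)
		return dev;

	for (c = dev->child; c; c = c->sibling) {
		hit = sketch_find_eject_blocker(c);
		if (hit)
			return hit;
	}
	return NULL;
}

/* An eject of the whole subtree is allowed only if nothing blocks it. */
static bool sketch_can_eject(struct sketch_dev *root)
{
	return sketch_find_eject_blocker(root) == NULL;
}
```

Returning the blocking device rather than a plain yes/no is a design choice
of this sketch: it would give user space a hint about what still has to be
"offlined" before the eject can succeed.  Locking is left out here on
purpose.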

Now, there's a different problem how to expose all of the relevant information
to user space so that it knows what to "offline" for the specific eject
operation to succeed, but that's kind of separate and worth addressing
anyway.

> What happens if a device wants to set that flag, right after it was told
> to eject and the device was in the middle of being removed?  How can you
> "fail" the "I can't be removed now, so don't" requirement that it now
> has?

This one is easy. :-)

If platform_lock_eject() is called when an eject is under way, it will block
on acpi_eject_lock until the eject is complete and if the device is gone as
a result of the eject, it will return an error code.

In turn, if an eject happens after platform_lock_eject(), it will block until
platform_unlock_eject() and if platform_no_eject() is called in between the
lock and unlock, it will notice the device with no_eject set and bail out.

Quite obviously, it would be a bug to call platform_lock_eject() from within an
eject code path.
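
The ordering described above can be sketched in user space as follows.
This is only an illustration under stated assumptions: a pthread mutex
stands in for acpi_eject_lock, struct eject_dev is a made-up type, and
the sketch_* functions are hypothetical counterparts of the proposed
platform_lock_eject(), platform_no_eject() and platform_unlock_eject(),
none of which exist in the kernel at this point.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical device state; not a kernel structure. */
struct eject_dev {
	bool present;   /* cleared once the device has been ejected */
	bool no_eject;  /* set when the device must not be ejected */
};

/* Stand-in for the global acpi_eject_lock from the discussion. */
static pthread_mutex_t sketch_eject_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Blocks while an eject is under way.  Returns 0 with the lock held,
 * or -ENODEV (lock released) if the device is already gone.
 */
static int sketch_lock_eject(struct eject_dev *dev)
{
	pthread_mutex_lock(&sketch_eject_lock);
	if (!dev->present) {
		pthread_mutex_unlock(&sketch_eject_lock);
		return -ENODEV;
	}
	return 0;
}

/* Must be called between sketch_lock_eject() and sketch_unlock_eject(). */
static void sketch_no_eject(struct eject_dev *dev)
{
	dev->no_eject = true;
}

static void sketch_unlock_eject(void)
{
	pthread_mutex_unlock(&sketch_eject_lock);
}

/*
 * Eject path: serialized against the helpers above by the same lock.
 * Bails out with -EBUSY if no_eject was set in the meantime.
 */
static int sketch_eject(struct eject_dev *dev)
{
	int ret = 0;

	pthread_mutex_lock(&sketch_eject_lock);
	if (!dev->present)
		ret = -ENODEV;
	else if (dev->no_eject)
		ret = -EBUSY;
	else
		dev->present = false;	/* the device is gone from now on */
	pthread_mutex_unlock(&sketch_eject_lock);

	return ret;
}
```

Because sketch_eject() takes the same non-recursive lock, calling
sketch_lock_eject() from within it would deadlock, which mirrors the last
point above.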

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 19:46                         ` Toshi Kani
@ 2013-02-04 20:12                           ` Rafael J. Wysocki
  2013-02-04 20:34                             ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 20:12 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 12:46:24 PM Toshi Kani wrote:
> On Mon, 2013-02-04 at 20:48 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> > > On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > > > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > >   :
> > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > we *want* to do the removal in the first place.
> > > > > 
> > > > > Yes, but, you will always race if you try to test to see if you can shut
> > > > > down a device and then trying to do it.  So walking the bus ahead of
> > > > > time isn't a good idea.
> > > > >
> > > > > And, we really don't have a viable way to recover if disconnect() fails,
> > > > > do we.  What do we do in that situation, restore the other devices we
> > > > > disconnected successfully?  How do we remember/know what they were?
> > > > > 
> > > > > PCI hotplug almost had this same problem until the designers finally
> > > > > realized that they just had to accept the fact that removing a PCI
> > > > > device could either happen by:
> > > > > 	- a user yanking out the device, at which time the OS better
> > > > > 	  clean up properly no matter what happens
> > > > > 	- the user asked nicely to remove a device, and the OS can take
> > > > > 	  as long as it wants to complete that action, including
> > > > > 	  stalling for noticable amounts of time before eventually,
> > > > > 	  always letting the action succeed.
> > > > > 
> > > > > I think the second thing is what you have to do here.  If a user tells
> > > > > the OS it wants to remove these devices, you better do it.  If you
> > > > > can't, because memory is being used by someone else, either move them
> > > > > off, or just hope that nothing bad happens, before the user gets
> > > > > frustrated and yanks out the CPU/memory module themselves physically :)
> > > > 
> > > > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > > > > if it is safe to unplug something at this particular time (think about the
> > > > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > > > users' demand AFAIR).
> > > > 
> > > > So in my opinion it would be good to give them an option to do "safe eject" or
> > > > "forcible eject", whichever they prefer.
> > > 
> > > For system device hot-plug, it always needs to be "safe eject".  This
> > > feature will be implemented on mission critical servers, which are
> > > managed by professional IT folks.  Crashing a server costs the
> > > business serious money.
> > 
> > Well, "always" is a bit too strong a word as far as human behavior is concerned
> > in my opinion.
> > 
> > That said I would be perfectly fine with not supporting the "forcible eject" to
> > start with and waiting for the first request to add support for it.  I also
> > would be fine with taking bets on how much time it's going to take for such a
> > request to appear. :-)
> 
> Sounds good.  In my experience, though, it actually takes a LONG time to
> convince customers that "safe eject" is actually safe.  Enterprise
> customers are so afraid of doing anything risky that might cause the
> system to crash or hang due to some defect.  I would be very surprised
> to see a customer asking for a force operation when we do not guarantee
> its outcome.  I have not seen such enterprise customers yet.

But we're talking about a kernel that is supposed to run on mobile phones too,
among other things.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 20:12                           ` Rafael J. Wysocki
@ 2013-02-04 20:34                             ` Toshi Kani
  2013-02-04 23:19                               ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-02-04 20:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-02-04 at 21:12 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 12:46:24 PM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 20:48 +0100, Rafael J. Wysocki wrote:
> > > On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> > > > On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > > > > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > > > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > > >   :
> > > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > > we *want* to do the removal in the first place.
> > > > > > 
> > > > > > Yes, but, you will always race if you try to test to see if you can shut
> > > > > > down a device and then trying to do it.  So walking the bus ahead of
> > > > > > time isn't a good idea.
> > > > > >
> > > > > > And, we really don't have a viable way to recover if disconnect() fails,
> > > > > > do we.  What do we do in that situation, restore the other devices we
> > > > > > disconnected successfully?  How do we remember/know what they were?
> > > > > > 
> > > > > > PCI hotplug almost had this same problem until the designers finally
> > > > > > realized that they just had to accept the fact that removing a PCI
> > > > > > device could either happen by:
> > > > > > 	- a user yanking out the device, at which time the OS better
> > > > > > 	  clean up properly no matter what happens
> > > > > > 	- the user asked nicely to remove a device, and the OS can take
> > > > > > 	  as long as it wants to complete that action, including
> > > > > > > 	  stalling for noticeable amounts of time before eventually,
> > > > > > 	  always letting the action succeed.
> > > > > > 
> > > > > > I think the second thing is what you have to do here.  If a user tells
> > > > > > the OS it wants to remove these devices, you better do it.  If you
> > > > > > can't, because memory is being used by someone else, either move them
> > > > > > off, or just hope that nothing bad happens, before the user gets
> > > > > > frustrated and yanks out the CPU/memory module themselves physically :)
> > > > > 
> > > > > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > > > > > if it is safe to unplug something at this particular time (think about the
> > > > > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > > > > users' demand AFAIR).
> > > > > 
> > > > > So in my opinion it would be good to give them an option to do "safe eject" or
> > > > > "forcible eject", whichever they prefer.
> > > > 
> > > > For system device hot-plug, it always needs to be "safe eject".  This
> > > > feature will be implemented on mission critical servers, which are
> > > > > managed by professional IT folks.  Crashing a server costs the business
> > > > > serious money.
> > > 
> > > Well, "always" is a bit too strong a word as far as human behavior is concerned
> > > in my opinion.
> > > 
> > > That said I would be perfectly fine with not supporting the "forcible eject" to
> > > start with and waiting for the first request to add support for it.  I also
> > > would be fine with taking bets on how much time it's going to take for such a
> > > request to appear. :-)
> > 
> > Sounds good.  In my experience, though, it actually takes a LONG time to
> > convince customers that "safe eject" is actually safe.  Enterprise
> > customers are so afraid of doing anything risky that might cause the
> > system to crash or hang due to some defect.  I would be very surprised
> > to see a customer asking for a force operation when we do not guarantee
> > its outcome.  I have not seen such enterprise customers yet.
> 
> But we're talking about a kernel that is supposed to run on mobile phones too,
> among other things.

I think using this feature for RAS, i.e. replacing a faulty device
on-line, will continue to be limited to high-end systems.  For low-end
systems, it does not make sense for customers to pay much $$ for this
feature.  They can just shut the system down for replacement, or they
can simply buy a new system instead of repairing it.

That said, using this feature on a VM for workload balancing does not
require any special hardware.  So, I can see someone being willing to try
a force option on a VM for personal use, just to see how it goes.

Thanks,
-Toshi


 


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 19:45                       ` Rafael J. Wysocki
@ 2013-02-04 20:59                         ` Toshi Kani
  2013-02-04 23:23                           ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-02-04 20:59 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-02-04 at 20:45 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 09:46:18 AM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> > > On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > > > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > >  > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > > > 
> > > > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > > > help from the driver core here.
> > > > > > > > 
> > > > > > > > There are three different approaches suggested for system device
> > > > > > > > hot-plug:
> > > > > > > >  A. Proceed within system device bus scan.
> > > > > > > >  B. Proceed within ACPI bus scan.
> > > > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > > > 
> > > > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > > > 
> > > > > > > > > Here is a summary of key questions & answers so far.  I hope this
> > > > > > > > > clarifies why I am suggesting option C.
> > > > > > > > 
> > > > > > > > 1. What are the system devices?
> > > > > > > > System devices provide system-wide core computing resources, which are
> > > > > > > > essential to compose a computer system.  System devices are not
> > > > > > > > connected to any particular standard buses.
> > > > > > > 
> > > > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > > > standard busses".  All this means is that system devices are connected
> > > > > > > to the "system" bus, nothing more.
> > > > > > 
> > > > > > Can you give me a few examples of other devices that support hotplug and
> > > > > > are not connected to any particular buses?  I will investigate them to
> > > > > > see how they are managed to support hotplug.
> > > > > 
> > > > > Any device that is attached to any bus in the driver model can be
> > > > > hotunplugged from userspace by telling it to be "unbound" from the
> > > > > driver controlling it.  Try it for any platform device in your system to
> > > > > see how it happens.
> > > > 
> > > > The unbind operation, as I understand from you, is to detach a driver
> > > > from a device.  Yes, unbinding can be done for any devices.  It is
> > > > however different from hot-plug operation, which unplugs a device.
> > > 
> > > Physically, yes, but to the driver involved, and the driver core, there
> > > is no difference.  That was one of the primary goals of the driver core
> > > creation so many years ago.
> > > 
> > > > Today, the unbind operation to an ACPI cpu/memory devices causes
> > > > hot-unplug (offline) operation to them, which is one of the major issues
> > > > for us since unbind cannot fail.  This patchset addresses this issue by
> > > > making the unbind operation of ACPI cpu/memory devices to do the
> > > > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > > > are supposed to be controlled by their drivers, cpu and memory modules.
> > > 
> > > I think that's the problem right there, solve that, please.
> > 
> > We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
> > can limit the ACPI drivers to do the scanning stuff only.   This is
> > precisely the intent of this patchset.  The real stuff, removing actual
> > devices, is done by the system device drivers/modules.
> 
> In case you haven't realized that yet, the $subject patchset has no future.

That's really disappointing, esp. the fact that this basic approach has
been proven to work on other OS for years...


> Let's just talk about how we can get what we need in more general terms.

So, are we heading toward an approach of doing everything in ACPI?  It is
not clear to me which direction we have agreed or disagreed on.

As for the eject flag approach, I agree with Greg.


Thanks,
-Toshi


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 20:07                         ` Rafael J. Wysocki
@ 2013-02-04 22:13                           ` Toshi Kani
  2013-02-04 23:52                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Toshi Kani @ 2013-02-04 22:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Mon, 2013-02-04 at 21:07 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 06:33:52 AM Greg KH wrote:
> > On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> > > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > we *want* to do the removal in the first place.
> > > > > > 
> > > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > > may want to eject that package, but you don't want to kill the system this
> > > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > > doesn't work out.
> > > > > 
> > > > > It seems to me that we could handle that with the help of a new flag, say
> > > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > > 
> > > > I think this will always be racy, or at worst, slow things down on
> > > > normal device operations as you will always be having to grab this flag
> > > > whenever you want to do something new.
> > > 
> > > I don't see why this particular scheme should be racy, at least I don't see any
> > > obvious races in it (although I'm not that good at races detection in general,
> > > admittedly).
> > > 
> > > Also, I don't expect that flag to be used for everything, just for things known
> > > to seriously break if forcible eject is done.  That may be not precise enough,
> > > so that's a matter of defining its purpose more precisely.
> > > 
> > > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > > in struct acpi_device and provide an interface for the layers above ACPI to
> > > manipulate it) but then devices without ACPI namespace objects won't be
> > > covered.  That may not be a big deal, though.
> > > 
> > > So say dev is about to be used for something incompatible with ejecting, so to
> > > speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> > > has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> > > platform_lock_eject(dev) would need to be checked to see if the device is not
> > > gone.  If it returns success (0), one would do something to the device and
> > > call platform_no_eject(dev) and then platform_unlock_eject(dev).
> > 
> > How does a device "know" it is doing something that is incompatible with
> > ejecting?  That's a non-trivial task from what I can tell.
> 
> I agree that this is complicated in general.  But.
> 
> There are devices known to have software "offline" and "online" operations
> such that after the "offline" the given device is guaranteed to be not used
> until "online".  We have that for CPU cores, for example, and user space can
> do it via /sys/devices/system/cpu/cpuX/online .  So, why don't we make the
> "online" set the no_eject flag (under the lock as appropriate) and the
> "offline" clear it?  And why don't we define such "online" and "offline" for
> all of the other "system" stuff, like memory, PCI host bridges etc. and make it
> behave analogously?
> 
> Then, it is quite simple to say which devices should use the no_eject flag:
> devices that have "online" and "offline" exported to user space.  And guess
> who's responsible for "offlining" all of those things before trying to eject
> them: user space is.  From the kernel's point of view it is all clear.  Hands
> clean. :-)
> 
> Now, there's a different problem how to expose all of the relevant information
> to user space so that it knows what to "offline" for the specific eject
> operation to succeed, but that's kind of separate and worth addressing
> anyway.

So, the idea is to run a user space program that off-lines all relevant
devices before trimming ACPI devices.  Is that right?  That sounds like
an idea worth considering.  It basically moves the "sequencer" part of
my proposal from the kernel into user space.  I agree that how to expose
all of the relevant info to user space is an issue.  Also, we will need
to make sure that the user program always runs per a kernel request and
then reports the result back to the kernel, so that the kernel can do
the rest of the operation and report the result to FW with _OST or
_EJ0.  This loop has to close.  I think it is going to be more
complicated than the kernel-only approach.

In addition, I am not sure if the "no_eject" flag in acpi_device is
really necessary here since the user program will inform the kernel if
all devices are off-line.  Also, the kernel will likely need to expose
the device info to the user program to tell which devices need to be
off-lined.  At that time, the kernel already knows if there is any
on-line device in the scope.
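
To illustrate the last point, here is a minimal user-space sketch (not
kernel code) of a check the kernel could run over the scope of an eject
request: walk the subtree and report the first device that is still
on-line.  struct sketch_dev and the sketch_*() helpers are hypothetical
names made up for this example; they are not an existing kernel
interface, and locking is left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device node; not the kernel's struct device. */
struct sketch_dev {
	const char *name;
	bool online;                 /* true while the device is in use */
	struct sketch_dev *child;    /* first child, or NULL */
	struct sketch_dev *sibling;  /* next sibling, or NULL */
};

/*
 * Return the first on-line device in the subtree rooted at @root
 * (depth-first, root included), or NULL if everything is off-line.
 */
static struct sketch_dev *sketch_find_online(struct sketch_dev *root)
{
	struct sketch_dev *c, *hit;

	assert(root != NULL);
	if (root->online)
		return root;
	for (c = root->child; c; c = c->sibling) {
		hit = sketch_find_online(c);
		if (hit)
			return hit;
	}
	return NULL;
}

/* An eject of @root may proceed only if nothing in its scope is on-line. */
static bool sketch_can_eject(struct sketch_dev *root)
{
	return sketch_find_online(root) == NULL;
}
```

With a check like this, the same walk that tells user space which
devices still need to be off-lined also answers whether the eject can
proceed, which is why a separate flag may be redundant.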


> > What happens if a device wants to set that flag, right after it was told
> > to eject and the device was in the middle of being removed?  How can you
> > "fail" the "I can't be removed me now, so don't" requirement that it now
> > has?
> 
> This one is easy. :-)
> 
> If platform_lock_eject() is called when an eject is under way, it will block
> on acpi_eject_lock until the eject is complete and if the device is gone as
> a result of the eject, it will return an error code.

In this case, we do really need to make sure that the user program does
not get killed in the middle of its operation since the kernel is
holding a lock while it is under way.
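
For reference, this is how I read the locking protocol quoted in this
message, written as a small user-space model with a pthread mutex.  It
is only a sketch under my own assumptions: the sketch_*() functions
mirror the proposed platform_lock_eject(), platform_no_eject() and
platform_unlock_eject(), struct lock_dev and its fields are made up for
the example, and none of this exists in the kernel today.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device; the field names are assumptions. */
struct lock_dev {
	bool gone;      /* set once an eject has removed the device */
	bool no_eject;  /* set while the device must not be ejected */
};

/* Models the single global lock called acpi_eject_lock in the proposal. */
static pthread_mutex_t sketch_eject_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Take the eject lock.  Blocks while an eject is under way and returns
 * -ENODEV (without holding the lock) if the device is gone afterwards.
 */
static int sketch_lock_eject(struct lock_dev *dev)
{
	assert(dev != NULL);
	pthread_mutex_lock(&sketch_eject_lock);
	if (dev->gone) {
		pthread_mutex_unlock(&sketch_eject_lock);
		return -ENODEV;
	}
	return 0;
}

/* Mark the device as not ejectable.  Caller must hold the eject lock. */
static void sketch_no_eject(struct lock_dev *dev)
{
	dev->no_eject = true;
}

/* Release the eject lock taken by a successful sketch_lock_eject(). */
static void sketch_unlock_eject(struct lock_dev *dev)
{
	(void)dev;
	pthread_mutex_unlock(&sketch_eject_lock);
}

/* Eject path: waits for the lock, bails out with -EBUSY on no_eject. */
static int sketch_eject(struct lock_dev *dev)
{
	int ret = 0;

	pthread_mutex_lock(&sketch_eject_lock);
	if (dev->no_eject)
		ret = -EBUSY;
	else
		dev->gone = true;
	pthread_mutex_unlock(&sketch_eject_lock);
	return ret;
}
```

In this model the lock is held only for the short critical sections.
If the lock had to stay held while a user program runs, the concern
above applies: a killed program would leave every later caller blocked.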


Thanks,
-Toshi


> In turn, if an eject happens after platform_lock_eject(), it will block until
> platform_unlock_eject() and if platform_no_eject() is called in between the
> lock and unlock, it will notice the device with no_eject set and bail out.
> 
> Quite obviously, it would be a bug to call platform_lock_eject() from within an
> eject code path.
> 
> Thanks,
> Rafael
> 
> 



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 20:34                             ` Toshi Kani
@ 2013-02-04 23:19                               ` Rafael J. Wysocki
  0 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 23:19 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 01:34:18 PM Toshi Kani wrote:
> On Mon, 2013-02-04 at 21:12 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 12:46:24 PM Toshi Kani wrote:
> > > On Mon, 2013-02-04 at 20:48 +0100, Rafael J. Wysocki wrote:
> > > > On Monday, February 04, 2013 09:02:46 AM Toshi Kani wrote:
> > > > > On Mon, 2013-02-04 at 14:41 +0100, Rafael J. Wysocki wrote:
> > > > > > On Sunday, February 03, 2013 07:23:49 PM Greg KH wrote:
> > > > > > > On Sat, Feb 02, 2013 at 09:15:37PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > On Saturday, February 02, 2013 03:58:01 PM Greg KH wrote:
> > > > >   :
> > > > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > > > we *want* to do the removal in the first place.
> > > > > > > 
> > > > > > > Yes, but, you will always race if you try to test to see if you can shut
> > > > > > > down a device and then trying to do it.  So walking the bus ahead of
> > > > > > > time isn't a good idea.
> > > > > > >
> > > > > > > And, we really don't have a viable way to recover if disconnect() fails,
> > > > > > > do we.  What do we do in that situation, restore the other devices we
> > > > > > > disconnected successfully?  How do we remember/know what they were?
> > > > > > > 
> > > > > > > PCI hotplug almost had this same problem until the designers finally
> > > > > > > realized that they just had to accept the fact that removing a PCI
> > > > > > > device could either happen by:
> > > > > > > 	- a user yanking out the device, at which time the OS better
> > > > > > > 	  clean up properly no matter what happens
> > > > > > > 	- the user asked nicely to remove a device, and the OS can take
> > > > > > > 	  as long as it wants to complete that action, including
> > > > > > > > 	  stalling for noticeable amounts of time before eventually,
> > > > > > > 	  always letting the action succeed.
> > > > > > > 
> > > > > > > I think the second thing is what you have to do here.  If a user tells
> > > > > > > the OS it wants to remove these devices, you better do it.  If you
> > > > > > > can't, because memory is being used by someone else, either move them
> > > > > > > off, or just hope that nothing bad happens, before the user gets
> > > > > > > frustrated and yanks out the CPU/memory module themselves physically :)
> > > > > > 
> > > > > > Well, that we can't help, but sometimes users really *want* the OS to tell them
> > > > > > > if it is safe to unplug something at this particular time (think about the
> > > > > > Windows' "safe remove" feature for USB sticks, for example; that came out of
> > > > > > users' demand AFAIR).
> > > > > > 
> > > > > > So in my opinion it would be good to give them an option to do "safe eject" or
> > > > > > "forcible eject", whichever they prefer.
> > > > > 
> > > > > For system device hot-plug, it always needs to be "safe eject".  This
> > > > > feature will be implemented on mission critical servers, which are
> > > > > > managed by professional IT folks.  Crashing a server costs the business
> > > > > > serious money.
> > > > 
> > > > Well, "always" is a bit too strong a word as far as human behavior is concerned
> > > > in my opinion.
> > > > 
> > > > That said I would be perfectly fine with not supporting the "forcible eject" to
> > > > start with and waiting for the first request to add support for it.  I also
> > > > would be fine with taking bets on how much time it's going to take for such a
> > > > request to appear. :-)
> > > 
> > > Sounds good.  In my experience, though, it actually takes a LONG time to
> > > convince customers that "safe eject" is actually safe.  Enterprise
> > > customers are so afraid of doing anything risky that might cause the
> > > system to crash or hang due to some defect.  I would be very surprised
> > > to see a customer asking for a force operation when we do not guarantee
> > > its outcome.  I have not seen such enterprise customers yet.
> > 
> > But we're talking about a kernel that is supposed to run on mobile phones too,
> > among other things.
> 
> I think using this feature for RAS i.e. replacing a faulty device
> on-line, will continue to be limited for high-end systems.  For low-end
> systems, it does not make sense for customers to pay much $$ for this
> feature.  They can just shut the system down for replacement, or they
> can simply buy a new system instead of repairing.
> 
> That said, using this feature on VM for workload balancing does not
> require any special hardware.  So, I can see someone willing to try out
> to see how it goes with a force option on VM for personal use.   

Besides, SMP was a $$ "enterprise" feature not so long ago, so things tend to
change. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 20:59                         ` Toshi Kani
@ 2013-02-04 23:23                           ` Rafael J. Wysocki
  2013-02-04 23:33                             ` Toshi Kani
  0 siblings, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 23:23 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 01:59:27 PM Toshi Kani wrote:
> On Mon, 2013-02-04 at 20:45 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 09:46:18 AM Toshi Kani wrote:
> > > On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> > > > On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > > > > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > > > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > > >  > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > > > > 
> > > > > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > > > > help from the driver core here.
> > > > > > > > > 
> > > > > > > > > There are three different approaches suggested for system device
> > > > > > > > > hot-plug:
> > > > > > > > >  A. Proceed within system device bus scan.
> > > > > > > > >  B. Proceed within ACPI bus scan.
> > > > > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > > > > 
> > > > > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > > > > 
> > > > > > > > > > Here is a summary of key questions & answers so far.  I hope this
> > > > > > > > > > clarifies why I am suggesting option C.
> > > > > > > > > 
> > > > > > > > > 1. What are the system devices?
> > > > > > > > > System devices provide system-wide core computing resources, which are
> > > > > > > > > essential to compose a computer system.  System devices are not
> > > > > > > > > connected to any particular standard buses.
> > > > > > > > 
> > > > > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > > > > standard busses".  All this means is that system devices are connected
> > > > > > > > to the "system" bus, nothing more.
> > > > > > > 
> > > > > > > Can you give me a few examples of other devices that support hotplug and
> > > > > > > are not connected to any particular buses?  I will investigate them to
> > > > > > > see how they are managed to support hotplug.
> > > > > > 
> > > > > > Any device that is attached to any bus in the driver model can be
> > > > > > hotunplugged from userspace by telling it to be "unbound" from the
> > > > > > driver controlling it.  Try it for any platform device in your system to
> > > > > > see how it happens.
> > > > > 
> > > > > The unbind operation, as I understand from you, is to detach a driver
> > > > > from a device.  Yes, unbinding can be done for any devices.  It is
> > > > > however different from hot-plug operation, which unplugs a device.
> > > > 
> > > > Physically, yes, but to the driver involved, and the driver core, there
> > > > is no difference.  That was one of the primary goals of the driver core
> > > > creation so many years ago.
> > > > 
> > > > > Today, the unbind operation to an ACPI cpu/memory devices causes
> > > > > hot-unplug (offline) operation to them, which is one of the major issues
> > > > > for us since unbind cannot fail.  This patchset addresses this issue by
> > > > > making the unbind operation of ACPI cpu/memory devices to do the
> > > > > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > > > > are supposed to be controlled by their drivers, cpu and memory modules.
> > > > 
> > > > I think that's the problem right there, solve that, please.
> > > 
> > > We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
> > > can limit the ACPI drivers to do the scanning stuff only.   This is
> > > precisely the intent of this patchset.  The real stuff, removing actual
> > > devices, is done by the system device drivers/modules.
> > 
> > In case you haven't realized that yet, the $subject patchset has no future.
> 
> That's really disappointing, esp. the fact that this basic approach has
> been proven to work on other OS for years...
> 
> 
> > Let's just talk about how we can get what we need in more general terms.
> 
> So, are we heading to an approach of doing everything in ACPI?  I am not
> clear about which direction we have agreed with or disagreed with.
> 
> As for the eject flag approach, I agree with Greg.

Well, I'm not sure which of Greg's thoughts you agree with. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 23:23                           ` Rafael J. Wysocki
@ 2013-02-04 23:33                             ` Toshi Kani
  0 siblings, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-02-04 23:33 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Tue, 2013-02-05 at 00:23 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 01:59:27 PM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 20:45 +0100, Rafael J. Wysocki wrote:
> > > On Monday, February 04, 2013 09:46:18 AM Toshi Kani wrote:
> > > > On Mon, 2013-02-04 at 04:46 -0800, Greg KH wrote:
> > > > > On Sun, Feb 03, 2013 at 05:28:09PM -0700, Toshi Kani wrote:
> > > > > > On Sat, 2013-02-02 at 16:01 +0100, Greg KH wrote:
> > > > > > > On Fri, Feb 01, 2013 at 01:40:10PM -0700, Toshi Kani wrote:
> > > > > > > > On Fri, 2013-02-01 at 07:30 +0000, Greg KH wrote:
> > > > > > > > > On Thu, Jan 31, 2013 at 06:32:18PM -0700, Toshi Kani wrote:
> > > > > > > > >  > This is already done for PCI host bridges and platform devices and I don't
> > > > > > > > > > > see why we can't do that for the other types of devices too.
> > > > > > > > > > > 
> > > > > > > > > > > The only missing piece I see is a way to handle the "eject" problem, i.e.
> > > > > > > > > > > > when we try to eject a device at the top of a subtree and need to tear down
> > > > > > > > > > > the entire subtree below it, but if that's going to lead to a system crash,
> > > > > > > > > > > for example, we want to cancel the eject.  It seems to me that we'll need some
> > > > > > > > > > > help from the driver core here.
> > > > > > > > > > 
> > > > > > > > > > There are three different approaches suggested for system device
> > > > > > > > > > hot-plug:
> > > > > > > > > >  A. Proceed within system device bus scan.
> > > > > > > > > >  B. Proceed within ACPI bus scan.
> > > > > > > > > >  C. Proceed with a sequence (as a mini-boot).
> > > > > > > > > > 
> > > > > > > > > > Option A uses system devices as tokens, option B uses acpi devices as
> > > > > > > > > > tokens, and option C uses resource tables as tokens, for their handlers.
> > > > > > > > > > 
> > > > > > > > > > > Here is a summary of key questions & answers so far.  I hope this
> > > > > > > > > > > clarifies why I am suggesting option C.
> > > > > > > > > > 
> > > > > > > > > > 1. What are the system devices?
> > > > > > > > > > System devices provide system-wide core computing resources, which are
> > > > > > > > > > essential to compose a computer system.  System devices are not
> > > > > > > > > > connected to any particular standard buses.
> > > > > > > > > 
> > > > > > > > > Not a problem, lots of devices are not connected to any "particular
> > > > > > > > > standard busses".  All this means is that system devices are connected
> > > > > > > > > to the "system" bus, nothing more.
> > > > > > > > 
> > > > > > > > Can you give me a few examples of other devices that support hotplug and
> > > > > > > > are not connected to any particular buses?  I will investigate them to
> > > > > > > > see how they are managed to support hotplug.
> > > > > > > 
> > > > > > > Any device that is attached to any bus in the driver model can be
> > > > > > > hotunplugged from userspace by telling it to be "unbound" from the
> > > > > > > driver controlling it.  Try it for any platform device in your system to
> > > > > > > see how it happens.
> > > > > > 
> > > > > > The unbind operation, as I understand from you, is to detach a driver
> > > > > > from a device.  Yes, unbinding can be done for any devices.  It is
> > > > > > however different from hot-plug operation, which unplugs a device.
> > > > > 
> > > > > Physically, yes, but to the driver involved, and the driver core, there
> > > > > is no difference.  That was one of the primary goals of the driver core
> > > > > creation so many years ago.
> > > > > 
> > > > > > > Today, the unbind operation on ACPI cpu/memory devices causes a
> > > > > > > hot-unplug (offline) operation on them, which is one of the major issues
> > > > > > > for us since unbind cannot fail.  This patchset addresses this issue by
> > > > > > > making the unbind operation of ACPI cpu/memory devices do the
> > > > > > unbinding only.  ACPI drivers no longer control cpu and memory as they
> > > > > > are supposed to be controlled by their drivers, cpu and memory modules.
> > > > > 
> > > > > I think that's the problem right there, solve that, please.
> > > > 
> > > > We cannot eliminate the ACPI drivers since we have to scan ACPI.  But we
> > > > can limit the ACPI drivers to do the scanning stuff only.   This is
> > > > > precisely the intent of this patchset.  The real stuff, removing actual
> > > > devices, is done by the system device drivers/modules.
> > > 
> > > In case you haven't realized that yet, the $subject patchset has no future.
> > 
> > That's really disappointing, esp. the fact that this basic approach has
> > been proven to work on other OS for years...
> > 
> > 
> > > Let's just talk about how we can get what we need in more general terms.
> > 
> > So, are we heading to an approach of doing everything in ACPI?  I am not
> > clear about which direction we have agreed with or disagreed with.
> > 
> > As for the eject flag approach, I agree with Greg.
> 
> Well, I'm not sure which of Greg's thoughts you agree with. :-)

Sorry, that was Greg's comment below.  But then, I saw your other
email clarifying that the no_eject flag only reflects online/offline
status, not how the device is being used.  So, I replied with my
thoughts in a separate email. :)

===
How does a device "know" it is doing something that is incompatible with
ejecting?  That's a non-trivial task from what I can tell.
===

Thanks,
-Toshi


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 22:13                           ` Toshi Kani
@ 2013-02-04 23:52                             ` Rafael J. Wysocki
  2013-02-05  0:04                               ` Greg KH
  2013-02-05  0:55                               ` Toshi Kani
  0 siblings, 2 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-04 23:52 UTC (permalink / raw)
  To: Toshi Kani
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 03:13:29 PM Toshi Kani wrote:
> On Mon, 2013-02-04 at 21:07 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 06:33:52 AM Greg KH wrote:
> > > On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> > > > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > > we *want* to do the removal in the first place.
> > > > > > > 
> > > > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > > > may want to eject that package, but you don't want to kill the system this
> > > > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > > > doesn't work out.
> > > > > > 
> > > > > > It seems to me that we could handle that with the help of a new flag, say
> > > > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > > > 
> > > > > I think this will always be racy, or at worst, slow things down on
> > > > > normal device operations as you will always be having to grab this flag
> > > > > whenever you want to do something new.
> > > > 
> > > > I don't see why this particular scheme should be racy, at least I don't see any
> > > > > obvious races in it (although I'm not that good at race detection in general,
> > > > admittedly).
> > > > 
> > > > Also, I don't expect that flag to be used for everything, just for things known
> > > > to seriously break if forcible eject is done.  That may be not precise enough,
> > > > so that's a matter of defining its purpose more precisely.
> > > > 
> > > > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > > > > in struct acpi_device and provide an interface for the layers above ACPI to
> > > > manipulate it) but then devices without ACPI namespace objects won't be
> > > > covered.  That may not be a big deal, though.
> > > > 
> > > > So say dev is about to be used for something incompatible with ejecting, so to
> > > > speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> > > > has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> > > > platform_lock_eject(dev) would need to be checked to see if the device is not
> > > > gone.  If it returns success (0), one would do something to the device and
> > > > call platform_no_eject(dev) and then platform_unlock_eject(dev).
> > > 
> > > How does a device "know" it is doing something that is incompatible with
> > > ejecting?  That's a non-trivial task from what I can tell.
> > 
> > I agree that this is complicated in general.  But.
> > 
> > There are devices known to have software "offline" and "online" operations
> > such that after the "offline" the given device is guaranteed to be not used
> > until "online".  We have that for CPU cores, for example, and user space can
> > do it via /sys/devices/system/cpu/cpuX/online .  So, why don't we make the
> > "online" set the no_eject flag (under the lock as appropriate) and the
> > "offline" clear it?  And why don't we define such "online" and "offline" for
> > all of the other "system" stuff, like memory, PCI host bridges etc. and make it
> > behave analogously?
> > 
> > Then, it is quite simple to say which devices should use the no_eject flag:
> > devices that have "online" and "offline" exported to user space.  And guess
> > who's responsible for "offlining" all of those things before trying to eject
> > them: user space is.  From the kernel's point of view it is all clear.  Hands
> > clean. :-)
> > 
> > Now, there's a different problem how to expose all of the relevant information
> > to user space so that it knows what to "offline" for the specific eject
> > operation to succeed, but that's kind of separate and worth addressing
> > anyway.
> 
> So, the idea is to run a user space program that off-lines all relevant
> devices before trimming ACPI devices.  Is that right?  That sounds like
> an idea worth considering.  This basically moves the "sequencer"
> part into user space instead of the kernel space in my proposal.  I
> agree that how to expose all of the relevant info to user space is an
> issue.  Also, we will need to make sure that the user program always
> runs per a kernel request and then informs a result back to the kernel,
> so that the kernel can do the rest of an operation and inform a result
> to FW with _OST or _EJ0.  This loop has to close.  I think it is going
> to be more complicated than the kernel-only approach.

I actually didn't think about that.  The point is that trying to offline
everything *synchronously* may just be pointless, because it may be
offlined upfront, before the eject is even requested.  So the sequence
would be to first offline things that we'll want to eject from user space
and then to send the eject request (e.g. via sysfs too).
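
A minimal user-space sketch of that ordering, under stated assumptions: the
helper names write_attr() and offline_then_eject() are hypothetical, and the
attribute paths are passed in by the caller because the thread only names
/sys/devices/system/cpu/cpuX/online; the eject attribute is an assumption.

```c
#include <stdio.h>

/* Write a short string to a sysfs-style attribute; 0 on success, -1 on error. */
int write_attr(const char *path, const char *value)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fputs(value, f) >= 0;
    /* Buffered writes may only fail at flush time, so check fclose() too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Offline every listed "online" attribute first.  Only if all of them
 * succeed is the eject request written.  Returns 0 on success, -1 if any
 * step failed (in which case no eject request is sent).
 */
int offline_then_eject(const char *const *online_attrs, int n,
                       const char *eject_attr)
{
    for (int i = 0; i < n; i++)
        if (write_attr(online_attrs[i], "0") != 0)
            return -1;
    return write_attr(eject_attr, "1");
}
```

A caller would pass, for example, the "online" files of the CPU cores in the
package to be removed, followed by whatever attribute ends up carrying the
eject request.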

Eject requests from eject buttons and things like that may just fail if
some components involved that should be offline are online.  The fact that
we might be able to offline them synchronously if we tried doesn't matter,
pretty much as it doesn't matter for hot-swappable disks.

You'd probably never try to hot-remove a disk before unmounting filesystems
mounted from it or failing it as a RAID component and nobody sane wants the
kernel to do things like that automatically when the user presses the eject
button.  In my opinion we should treat memory eject, or CPU package eject, or
PCI host bridge eject in exactly the same way: Don't eject if it is not
prepared for ejecting in the first place.

And if you think about it, that makes things *massively* simpler, because now
the kernel doesn't need to worry about all of those "synchronous removal"
scenarios that very well may involve every single device in the system and
the whole problem is nicely split into several separate "implement
offline/online" problems that are subsystem-specific and a single
"eject if everything relevant is offline" problem which is kind of trivial.
Plus the one of exposing information to user space, which is separate too.
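
To illustrate why the "eject if everything relevant is offline" part can be
small, here is a sketch only.  struct hp_dev and hp_subtree_ejectable() are
hypothetical stand-ins, not the kernel's struct device or any existing API,
and the locking discussed earlier in the thread is deliberately left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device node; not the kernel's struct device. */
struct hp_dev {
    bool no_eject;              /* set by "online", cleared by "offline" */
    struct hp_dev *child;       /* first child, or NULL */
    struct hp_dev *sibling;     /* next sibling, or NULL */
};

/* True if dev and everything below it has no_eject clear. */
bool hp_subtree_ejectable(const struct hp_dev *dev)
{
    if (!dev)
        return true;
    if (dev->no_eject)
        return false;
    /* Only the children of dev are walked, not dev's own siblings. */
    for (const struct hp_dev *c = dev->child; c; c = c->sibling)
        if (!hp_subtree_ejectable(c))
            return false;
    return true;
}
```

In a real implementation the walk would have to run under a lock so that
nothing in the subtree can go online between the check and the eject.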

Now, each of them can be worked on separately, *tested* separately and
debugged separately if need be and it is much easier to isolate failures
and so on.

> In addition, I am not sure if the "no_eject" flag in acpi_device is
> really necessary here since the user program will inform the kernel if
> all devices are off-line.  Also, the kernel will likely need to expose
> the device info to the user program to tell which devices need to be
> off-lined.  At that time, the kernel already knows if there is any
> on-line device in the scope.

Well, that depends on what "the kernel" means and how it knows that.  Surely
the "online" components have to be marked somehow so that it is easy to check
if they are in the scope in the subsystem-independent way, so why don't we use
something like the no_eject flag for that?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 23:52                             ` Rafael J. Wysocki
@ 2013-02-05  0:04                               ` Greg KH
  2013-02-05  1:02                                 ` Rafael J. Wysocki
  2013-02-05 11:11                                 ` Rafael J. Wysocki
  2013-02-05  0:55                               ` Toshi Kani
  1 sibling, 2 replies; 83+ messages in thread
From: Greg KH @ 2013-02-05  0:04 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Tue, Feb 05, 2013 at 12:52:30AM +0100, Rafael J. Wysocki wrote:
> You'd probably never try to hot-remove a disk before unmounting filesystems
> mounted from it or failing it as a RAID component and nobody sane wants the
> kernel to do things like that automatically when the user presses the eject
> button.  In my opinion we should treat memory eject, or CPU package eject, or
> PCI host bridge eject in exactly the same way: Don't eject if it is not
> prepared for ejecting in the first place.

Bad example, we have disks hot-removed all the time without any
filesystems being unmounted, and have supported this since the 2.2 days
(although we didn't get it "right" until 2.6.)

PCI Host bridge eject is the same as PCI eject today, the user asks us
to do it, and we can not fail it from happening.  We also can have them
removed without us being told about it in the first place, and can
properly clean up from it all.

> And if you think about it, that makes things *massively* simpler, because now
> the kernel doesn't need to worry about all of those "synchronous removal"
> scenarios that very well may involve every single device in the system and
> the whole problem is nicely split into several separate "implement
> offline/online" problems that are subsystem-specific and a single
> "eject if everything relevant is offline" problem which is kind of trivial.
> Plus the one of exposing information to user space, which is separate too.
> 
> Now, each of them can be worked on separately, *tested* separately and
> debugged separately if need be and it is much easier to isolate failures
> and so on.

So you are agreeing with me in that we can not fail hot removing any
device, nice :)

greg k-h

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-04 23:52                             ` Rafael J. Wysocki
  2013-02-05  0:04                               ` Greg KH
@ 2013-02-05  0:55                               ` Toshi Kani
  1 sibling, 0 replies; 83+ messages in thread
From: Toshi Kani @ 2013-02-05  0:55 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Greg KH, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Tue, 2013-02-05 at 00:52 +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 03:13:29 PM Toshi Kani wrote:
> > On Mon, 2013-02-04 at 21:07 +0100, Rafael J. Wysocki wrote:
> > > On Monday, February 04, 2013 06:33:52 AM Greg KH wrote:
> > > > On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> > > > > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > > > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > > > we *want* to do the removal in the first place.
> > > > > > > > 
> > > > > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > > > > may want to eject that package, but you don't want to kill the system this
> > > > > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > > > > doesn't work out.
> > > > > > > 
> > > > > > > It seems to me that we could handle that with the help of a new flag, say
> > > > > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > > > > 
> > > > > > I think this will always be racy, or at worst, slow things down on
> > > > > > normal device operations as you will always be having to grab this flag
> > > > > > whenever you want to do something new.
> > > > > 
> > > > > I don't see why this particular scheme should be racy, at least I don't see any
> > > > > > obvious races in it (although I'm not that good at race detection in general,
> > > > > admittedly).
> > > > > 
> > > > > Also, I don't expect that flag to be used for everything, just for things known
> > > > > to seriously break if forcible eject is done.  That may be not precise enough,
> > > > > so that's a matter of defining its purpose more precisely.
> > > > > 
> > > > > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > > > > > in struct acpi_device and provide an interface for the layers above ACPI to
> > > > > manipulate it) but then devices without ACPI namespace objects won't be
> > > > > covered.  That may not be a big deal, though.
> > > > > 
> > > > > So say dev is about to be used for something incompatible with ejecting, so to
> > > > > speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> > > > > has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> > > > > platform_lock_eject(dev) would need to be checked to see if the device is not
> > > > > gone.  If it returns success (0), one would do something to the device and
> > > > > call platform_no_eject(dev) and then platform_unlock_eject(dev).
> > > > 
> > > > How does a device "know" it is doing something that is incompatible with
> > > > ejecting?  That's a non-trivial task from what I can tell.
> > > 
> > > I agree that this is complicated in general.  But.
> > > 
> > > There are devices known to have software "offline" and "online" operations
> > > such that after the "offline" the given device is guaranteed to be not used
> > > until "online".  We have that for CPU cores, for example, and user space can
> > > do it via /sys/devices/system/cpu/cpuX/online .  So, why don't we make the
> > > "online" set the no_eject flag (under the lock as appropriate) and the
> > > "offline" clear it?  And why don't we define such "online" and "offline" for
> > > all of the other "system" stuff, like memory, PCI host bridges etc. and make it
> > > behave analogously?
> > > 
> > > Then, it is quite simple to say which devices should use the no_eject flag:
> > > devices that have "online" and "offline" exported to user space.  And guess
> > > who's responsible for "offlining" all of those things before trying to eject
> > > them: user space is.  From the kernel's point of view it is all clear.  Hands
> > > clean. :-)
> > > 
> > > Now, there's a different problem how to expose all of the relevant information
> > > to user space so that it knows what to "offline" for the specific eject
> > > operation to succeed, but that's kind of separate and worth addressing
> > > anyway.
> > 
> > So, the idea is to run a user space program that off-lines all relevant
> > devices before trimming ACPI devices.  Is that right?  That sounds like
> > an idea worth considering.  This basically moves the "sequencer"
> > part into user space instead of the kernel space in my proposal.  I
> > agree that how to expose all of the relevant info to user space is an
> > issue.  Also, we will need to make sure that the user program always
> > runs per a kernel request and then informs a result back to the kernel,
> > so that the kernel can do the rest of an operation and inform a result
> > to FW with _OST or _EJ0.  This loop has to close.  I think it is going
> > to be more complicated than the kernel-only approach.
> 
> I actually didn't think about that.  The point is that trying to offline
> everything *synchronously* may just be pointless, because it may be
> offlined upfront, before the eject is even requested.  So the sequence
> would be to first offline things that we'll want to eject from user space
> and then to send the eject request (e.g. via sysfs too).
> 
> Eject requests from eject buttons and things like that may just fail if
> some components involved that should be offline are online.  The fact that
> we might be able to offline them synchronously if we tried doesn't matter,
> pretty much as it doesn't matter for hot-swappable disks.
> 
> You'd probably never try to hot-remove a disk before unmounting filesystems
> mounted from it or failing it as a RAID component and nobody sane wants the
> kernel to do things like that automatically when the user presses the eject
> button.  In my opinion we should treat memory eject, or CPU package eject, or
> PCI host bridge eject in exactly the same way: Don't eject if it is not
> prepared for ejecting in the first place.
> 
> And if you think about it, that makes things *massively* simpler, because now
> the kernel doesn't need to worry about all of those "synchronous removal"
> scenarios that very well may involve every single device in the system and
> the whole problem is nicely split into several separate "implement
> offline/online" problems that are subsystem-specific and a single
> "eject if everything relevant is offline" problem which is kind of trivial.
> Plus the one of exposing information to user space, which is separate too.

Oh, I see.  Yes, it certainly makes things much simpler.  It will put a
burden on the user, but that could be solved with proper tools.  I
totally agree that I/Os should be removed beforehand.  For CPUs and
memory, asking a user to find the right set of devices to off-line
would be a bad TCE, but this could be addressed with proper tools.
I think we need to check whether a memory block (a unit of sysfs memory
online/offline) and an ACPI memory object actually correspond nicely.
But at a high level, this sounds like a workable plan.


> Now, each of them can be worked on separately, *tested* separately and
> debugged separately if need be and it is much easier to isolate failures
> and so on.

Right, but that is also the case with "synchronous removal" as long as we
have the sysfs online interface.  The difference is that this approach only
supports the sysfs interface for off-lining.


> > In addition, I am not sure if the "no_eject" flag in acpi_device is
> > really necessary here since the user program will inform the kernel if
> > all devices are off-line.  Also, the kernel will likely need to expose
> > the device info to the user program to tell which devices need to be
> > off-lined.  At that time, the kernel already knows if there is any
> > on-line device in the scope.
> 
> Well, that depends on what "the kernel" means and how it knows that.  Surely
> the "online" components have to be marked somehow so that it is easy to check
> if they are in the scope in the subsystem-independent way, so why don't we use
> something like the no_eject flag for that?

Yes, I see your point.  My previous comment assumed that the kernel
would have to obtain device info and tell a user program to off-line
them.  In such a case, I thought we would have to walk through the actual
device tree and see online/offline info anyway.  But, since we are not
doing anything like that, having the flag in acpi_device seems to be a
reasonable way to avoid dealing with the actual device tree.


Thanks,
-Toshi


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-05  0:04                               ` Greg KH
@ 2013-02-05  1:02                                 ` Rafael J. Wysocki
  2013-02-05 11:11                                 ` Rafael J. Wysocki
  1 sibling, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-05  1:02 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 04:04:47 PM Greg KH wrote:
> On Tue, Feb 05, 2013 at 12:52:30AM +0100, Rafael J. Wysocki wrote:
> > You'd probably never try to hot-remove a disk before unmounting filesystems
> > mounted from it or failing it as a RAID component and nobody sane wants the
> > kernel to do things like that automatically when the user presses the eject
> > button.  In my opinion we should treat memory eject, or CPU package eject, or
> > PCI host bridge eject in exactly the same way: Don't eject if it is not
> > prepared for ejecting in the first place.
> 
> Bad example, we have disks hot-removed all the time without any
> filesystems being unmounted, and have supported this since the 2.2 days
> (although we didn't get it "right" until 2.6.)

Well, that wasn't my point.

My point was that we have tools for unmounting filesystems from disks that
the user wants to hot-remove and the user is supposed to use those tools
before hot-removing the disks.  At least I wouldn't recommend anyone to
do otherwise. :-)

Now, for memory hot-removal we don't have anything like that, as far as I
can tell, so my point was why don't we add memory "offline" that can be
done and tested separately from hot-removal and use that before we go and
hot-remove stuff?  And analogously for PCI host bridges etc.?

[Now, there's a question if an "eject" button on the system case, if there is
one, should *always* cause the eject to happen even though things are not
"offline".  My opinion is that not necessarily, because users may not be aware
that they are doing something wrong.

Quite analogously, does the power button always cause the system to shut down?
No.  So why the heck should an eject button always cause an eject to happen?
I see no reason.

That said, the most straightforward approach may be simply to let user space
disable eject events for specific devices when it wants and only enable them
when it knows that the given devices are ready for removal.

But I'm digressing.]

> PCI Host bridge eject is the same as PCI eject today, the user asks us
> to do it, and we can not fail it from happening.  We also can have them
> removed without us being told about it in the first place, and can
> properly clean up from it all.

Well, are you sure we'll always clean up?  I kind of have my doubts. :-)

> > And if you think about it, that makes things *massively* simpler, because now
> > the kernel doesn't need to worry about all of those "synchronous removal"
> > scenarios that very well may involve every single device in the system and
> > the whole problem is nicely split into several separate "implement
> > offline/online" problems that are subsystem-specific and a single
> > "eject if everything relevant is offline" problem which is kind of trivial.
> > Plus the one of exposing information to user space, which is separate too.
> > 
> > Now, each of them can be worked on separately, *tested* separately and
> > debugged separately if need be and it is much easier to isolate failures
> > and so on.
> 
> So you are agreeing with me in that we can not fail hot removing any
> device, nice :)

That depends on how you define hot-removing.  If you regard the "offline"
as a separate operation that can be carried out independently and hot-remove
as the last step causing the device to actually go away, then I agree that
it can't fail.  The "offline" itself, however, is a different matter (pretty
much like unmounting a file system).
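
The split can be sketched as follows, purely as an illustration: struct
hp_unit and the hp_*() functions are hypothetical names, and the "users"
counter stands in for whatever a subsystem uses to tell that a device is
busy.  The step that can fail is the offline; the final removal returns void.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state; names are illustrative only. */
struct hp_unit {
    int  users;     /* active users, e.g. pages in use or tasks pinned */
    bool online;
    bool present;
};

/* "offline" may fail: refuse while the unit is still in use. */
int hp_offline(struct hp_unit *u)
{
    if (u->users > 0)
        return -EBUSY;
    u->online = false;
    return 0;
}

/* Final step: the device goes away.  Nothing here can fail. */
void hp_remove(struct hp_unit *u)
{
    u->present = false;
}

/* Eject request: rejected unless the unit was offlined beforehand. */
int hp_request_eject(struct hp_unit *u)
{
    if (u->online)
        return -EBUSY;
    hp_remove(u);
    return 0;
}
```

With this shape, a rejected eject leaves the device untouched, which is the
same outcome as a failed unmount before pulling a disk.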

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-05  0:04                               ` Greg KH
  2013-02-05  1:02                                 ` Rafael J. Wysocki
@ 2013-02-05 11:11                                 ` Rafael J. Wysocki
  2013-02-05 18:39                                   ` Greg KH
  1 sibling, 1 reply; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-05 11:11 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Monday, February 04, 2013 04:04:47 PM Greg KH wrote:
> On Tue, Feb 05, 2013 at 12:52:30AM +0100, Rafael J. Wysocki wrote:
> > You'd probably never try to hot-remove a disk before unmounting filesystems
> > mounted from it or failing it as a RAID component and nobody sane wants the
> > kernel to do things like that automatically when the user presses the eject
> > button.  In my opinion we should treat memory eject, or CPU package eject, or
> > PCI host bridge eject in exactly the same way: Don't eject if it is not
> > prepared for ejecting in the first place.
> 
> Bad example, we have disks hot-removed all the time without any
> filesystems being unmounted, and have supported this since the 2.2 days
> (although we didn't get it "right" until 2.6.)

I actually don't think it is really bad, because it exposes the problem nicely.

Namely, there are two arguments that can be made here.  The first one is the
usability argument: Users should always be allowed to do what they want,
because it is annoying if software pretends to know better
what to do than the user (it is a convenience argument too, because usually
it's *easier* to allow users to do what they want).  The second one is the
data integrity argument: Operations that may lead to data loss should never
be carried out, because it is disappointing to lose valuable
stuff by a stupid mistake if software allows that mistake to be made (that also
may be costly in terms of real money).

You seem to believe that we should always follow the usability argument, while
Toshi seems to be thinking that (at least in the case of the "system" devices),
the data integrity argument is more important.  They are both valid arguments,
however, and they are in conflict, so this is a matter of balance.

You're saying that in the case of disks we always follow the usability argument
entirely.  I'm fine with that, although I suspect that some people may not be
considering this as the right balance.

Toshi seems to be thinking that for the hotplug of memory/CPUs/host bridges we
should always follow the data integrity argument entirely, because the users of
that feature value their data so much that they pretty much don't care about
usability.  That very well may be the case, so I'm fine with that too, although
I'm sure there are people who'll argue that this is not the right balance
either.

Now, the point is that we *can* do what Toshi is arguing for and that doesn't
seem to be overly complicated, so my question is: Why don't we do that, at
least to start with?  If it turns out eventually that the users care about
usability too, after all, we can add a switch to adjust things more to their
liking.  Still, we can very well do that later.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-05 11:11                                 ` Rafael J. Wysocki
@ 2013-02-05 18:39                                   ` Greg KH
  2013-02-05 21:13                                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 83+ messages in thread
From: Greg KH @ 2013-02-05 18:39 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Tue, Feb 05, 2013 at 12:11:17PM +0100, Rafael J. Wysocki wrote:
> On Monday, February 04, 2013 04:04:47 PM Greg KH wrote:
> > On Tue, Feb 05, 2013 at 12:52:30AM +0100, Rafael J. Wysocki wrote:
> > > You'd probably never try to hot-remove a disk before unmounting filesystems
> > > mounted from it or failing it as a RAID component and nobody sane wants the
> > > kernel to do things like that automatically when the user presses the eject
> > > button.  In my opinion we should treat memory eject, or CPU package eject, or
> > > PCI host bridge eject in exactly the same way: Don't eject if it is not
> > > prepared for ejecting in the first place.
> > 
> > Bad example, we have disks hot-removed all the time without any
> > filesystems being unmounted, and have supported this since the 2.2 days
> > (although we didn't get it "right" until 2.6.)
> 
> I actually don't think it is really bad, because it exposes the problem nicely.
> 
> Namely, there are two arguments that can be made here.  The first one is the
> usability argument: Users should always be allowed to do what they want,
> because it is [explicit content] annoying if software pretends to know better
> what to do than the user (it is a convenience argument too, because usually
> it's *easier* to allow users to do what they want).  The second one is the
> data integrity argument: Operations that may lead to data loss should never
> be carried out, because it is [explicit content] disappointing to lose valuable
> stuff by a stupid mistake if software allows that mistake to be made (that also
> may be costly in terms of real money).
> 
> You seem to believe that we should always follow the usability argument, while
> Toshi seems to be thinking that (at least in the case of the "system" devices),
> the data integrity argument is more important.  They are both valid arguments,
> however, and they are in conflict, so this is a matter of balance.
> 
> You're saying that in the case of disks we always follow the usability argument
> entirely.  I'm fine with that, although I suspect that some people may not be
> considering this as the right balance.
> 
> Toshi seems to be thinking that for the hotplug of memory/CPUs/host bridges we
> should always follow the data integrity argument entirely, because the users of
> that feature value their data so much that they pretty much don't care about
> usability.  That very well may be the case, so I'm fine with that too, although
> I'm sure there are people who'll argue that this is not the right balance
> either.
> 
> Now, the point is that we *can* do what Toshi is arguing for and that doesn't
> seem to be overly complicated, so my question is: Why don't we do that, at
> least to start with?  If it turns out eventually that the users care about
> usability too, after all, we can add a switch to adjust things more to their
> liking.  Still, we can very well do that later.

Ok, I'd much rather deal with reviewing actual implementations than
talking about theory at this point in time, so let's see what you all
can come up with next and I'll be glad to review it.

thanks,

greg k-h


* Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
  2013-02-05 18:39                                   ` Greg KH
@ 2013-02-05 21:13                                     ` Rafael J. Wysocki
  0 siblings, 0 replies; 83+ messages in thread
From: Rafael J. Wysocki @ 2013-02-05 21:13 UTC (permalink / raw)
  To: Greg KH
  Cc: Toshi Kani, lenb, akpm, linux-acpi, linux-kernel, linux-mm,
	linuxppc-dev, linux-s390, bhelgaas, isimatu.yasuaki, jiang.liu,
	wency, guohanjun, yinghai, srivatsa.bhat

On Tuesday, February 05, 2013 10:39:48 AM Greg KH wrote:
> On Tue, Feb 05, 2013 at 12:11:17PM +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 04:04:47 PM Greg KH wrote:
> > > On Tue, Feb 05, 2013 at 12:52:30AM +0100, Rafael J. Wysocki wrote:
> > > > You'd probably never try to hot-remove a disk before unmounting filesystems
> > > > mounted from it or failing it as a RAID component and nobody sane wants the
> > > > kernel to do things like that automatically when the user presses the eject
> > > > button.  In my opinion we should treat memory eject, or CPU package eject, or
> > > > PCI host bridge eject in exactly the same way: Don't eject if it is not
> > > > prepared for ejecting in the first place.
> > > 
> > > Bad example, we have disks hot-removed all the time without any
> > > filesystems being unmounted, and have supported this since the 2.2 days
> > > (although we didn't get it "right" until 2.6.)
> > 
> > I actually don't think it is really bad, because it exposes the problem nicely.
> > 
> > Namely, there are two arguments that can be made here.  The first one is the
> > usability argument: Users should always be allowed to do what they want,
> > because it is [explicit content] annoying if software pretends to know better
> > what to do than the user (it is a convenience argument too, because usually
> > it's *easier* to allow users to do what they want).  The second one is the
> > data integrity argument: Operations that may lead to data loss should never
> > be carried out, because it is [explicit content] disappointing to lose valuable
> > stuff by a stupid mistake if software allows that mistake to be made (that also
> > may be costly in terms of real money).
> > 
> > You seem to believe that we should always follow the usability argument, while
> > Toshi seems to be thinking that (at least in the case of the "system" devices),
> > the data integrity argument is more important.  They are both valid arguments,
> > however, and they are in conflict, so this is a matter of balance.
> > 
> > You're saying that in the case of disks we always follow the usability argument
> > entirely.  I'm fine with that, although I suspect that some people may not be
> > considering this as the right balance.
> > 
> > Toshi seems to be thinking that for the hotplug of memory/CPUs/host bridges we
> > should always follow the data integrity argument entirely, because the users of
> > that feature value their data so much that they pretty much don't care about
> > usability.  That very well may be the case, so I'm fine with that too, although
> > I'm sure there are people who'll argue that this is not the right balance
> > either.
> > 
> > Now, the point is that we *can* do what Toshi is arguing for and that doesn't
> > seem to be overly complicated, so my question is: Why don't we do that, at
> > least to start with?  If it turns out eventually that the users care about
> > usability too, after all, we can add a switch to adjust things more to their
> > liking.  Still, we can very well do that later.
> 
> Ok, I'd much rather deal with reviewing actual implementations than
> talking about theory at this point in time, so let's see what you all
> can come up with next and I'll be glad to review it.

Sure, thanks a lot for your comments so far!

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.



Thread overview: 83+ messages
2013-01-10 23:40 [RFC PATCH v2 00/12] System device hot-plug framework Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Toshi Kani
2013-01-11 21:23   ` Rafael J. Wysocki
2013-01-14 15:33     ` Toshi Kani
2013-01-14 18:48       ` Rafael J. Wysocki
2013-01-14 19:02         ` Toshi Kani
2013-01-30  4:48           ` Greg KH
2013-01-31  1:15             ` Toshi Kani
2013-01-31  5:24               ` Greg KH
2013-01-31 14:42                 ` Toshi Kani
2013-01-30  4:53   ` Greg KH
2013-01-31  1:46     ` Toshi Kani
2013-01-30  4:58   ` Greg KH
2013-01-31  2:57     ` Toshi Kani
2013-01-31 20:54       ` Rafael J. Wysocki
2013-02-01  1:32         ` Toshi Kani
2013-02-01  7:30           ` Greg KH
2013-02-01 20:40             ` Toshi Kani
2013-02-01 22:21               ` Rafael J. Wysocki
2013-02-01 23:12                 ` Toshi Kani
2013-02-02 15:01               ` Greg KH
2013-02-04  0:28                 ` Toshi Kani
2013-02-04 12:46                   ` Greg KH
2013-02-04 16:46                     ` Toshi Kani
2013-02-04 19:45                       ` Rafael J. Wysocki
2013-02-04 20:59                         ` Toshi Kani
2013-02-04 23:23                           ` Rafael J. Wysocki
2013-02-04 23:33                             ` Toshi Kani
2013-02-01  7:23         ` Greg KH
2013-02-01 22:12           ` Rafael J. Wysocki
2013-02-02 14:58             ` Greg KH
2013-02-02 20:15               ` Rafael J. Wysocki
2013-02-02 22:18                 ` [PATCH?] Move ACPI device nodes under /sys/firmware/acpi (was: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework) Rafael J. Wysocki
2013-02-04  1:24                   ` Greg KH
2013-02-04 12:34                     ` Rafael J. Wysocki
2013-02-03 20:44                 ` [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework Rafael J. Wysocki
2013-02-04 12:48                   ` Greg KH
2013-02-04 14:21                     ` Rafael J. Wysocki
2013-02-04 14:33                       ` Greg KH
2013-02-04 20:07                         ` Rafael J. Wysocki
2013-02-04 22:13                           ` Toshi Kani
2013-02-04 23:52                             ` Rafael J. Wysocki
2013-02-05  0:04                               ` Greg KH
2013-02-05  1:02                                 ` Rafael J. Wysocki
2013-02-05 11:11                                 ` Rafael J. Wysocki
2013-02-05 18:39                                   ` Greg KH
2013-02-05 21:13                                     ` Rafael J. Wysocki
2013-02-05  0:55                               ` Toshi Kani
2013-02-04 16:19                       ` Toshi Kani
2013-02-04 19:43                         ` Rafael J. Wysocki
2013-02-04  1:23                 ` Greg KH
2013-02-04 13:41                   ` Rafael J. Wysocki
2013-02-04 16:02                     ` Toshi Kani
2013-02-04 19:48                       ` Rafael J. Wysocki
2013-02-04 19:46                         ` Toshi Kani
2013-02-04 20:12                           ` Rafael J. Wysocki
2013-02-04 20:34                             ` Toshi Kani
2013-02-04 23:19                               ` Rafael J. Wysocki
2013-01-10 23:40 ` [RFC PATCH v2 02/12] ACPI: " Toshi Kani
2013-01-11 21:25   ` Rafael J. Wysocki
2013-01-14 15:53     ` Toshi Kani
2013-01-14 18:47       ` Rafael J. Wysocki
2013-01-14 18:42         ` Toshi Kani
2013-01-14 19:07           ` Rafael J. Wysocki
2013-01-14 19:21             ` Toshi Kani
2013-01-30  4:51               ` Greg KH
2013-01-31  1:38                 ` Toshi Kani
2013-01-14 19:21             ` Greg KH
2013-01-14 19:29               ` Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 03/12] drivers/base: Add " Toshi Kani
2013-01-30  4:54   ` Greg KH
2013-01-31  1:48     ` Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 04/12] cpu: Add cpu hotplug handlers Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 05/12] mm: Add memory " Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 06/12] ACPI: Add ACPI bus " Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 07/12] ACPI: Add ACPI resource hotplug handler Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 08/12] ACPI: Update processor driver for hotplug framework Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 09/12] ACPI: Update memory " Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 10/12] ACPI: Update container " Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 11/12] cpu: Update sysfs cpu/online " Toshi Kani
2013-01-10 23:40 ` [RFC PATCH v2 12/12] ACPI: Update sysfs eject " Toshi Kani
2013-01-17  0:50 ` [RFC PATCH v2 00/12] System device hot-plug framework Rafael J. Wysocki
2013-01-17 17:59   ` Toshi Kani
