* [PATCH 00/13] ACPI/IPMI: Fix several issues in the current codes
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
From: Lv Zheng @ 2013-07-23  8:08 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, stable, linux-acpi, openipmi-developer

This patchset tries to fix the following kernel bug:
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=46741
This is fixed by [PATCH 05].

The bug shows that an IPMI operation region may appear in a device that is
not under the IPMI system interface device's scope, so the ACPI IPMI
operation region handler must be installed from the root of the ACPI
namespace.

The original acpi_ipmi implementation has several issues that break the
test process.  This patchset also includes a redesign of the acpi_ipmi
module to make testing possible.
 
[PATCH 01-05] are bug-fix patches that can be applied to the kernels whose
              version is > 2.6.38.  This can be confirmed with:
              # git tag --contains e92b297c
[PATCH 06] is also a bug-fix patch.
           The drivers/acpi/osl.c part can be backported to the kernels
           whose version is > 2.6.14.  This can be confirmed with:
           # git tag --contains 4be44fcd
           The drivers/acpi/acpi_ipmi.c part can be applied on top of
           [PATCH 01-05].
[PATCH 07] is a tuning patch for acpi_ipmi.c.
[PATCH 08-10] are cleanup patches for acpi_ipmi.c.
[PATCH 11] is a cleanup patch not for acpi_ipmi.c.
[PATCH 12-13] are test patches.
              [PATCH 12] may be accepted by upstream kernel as a useful
                         facility to test the loading/unloading of the
                         modules.
              [PATCH 13] should not be merged into any published kernel as
                         it is a driver for a pseudo device with a PnP ID
                         that does not exist on real machines.

This patchset has been tested with a fake device that accesses IPMI
operation region fields on an IPMI-capable platform.  A stress test of
module(acpi_ipmi) load/unload has been performed on that platform.  No
races were found and the IPMI operation region handler is now functioning.
It is not possible to test module(ipmi_si) load/unload because the module
can't be unloaded due to its transfer flushing implementation.

Lv Zheng (13):
  ACPI/IPMI: Fix potential response buffer overflow
  ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler()
  ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user
  ACPI/IPMI: Fix issue caused by the per-device registration of the
    IPMI operation region handler
  ACPI/IPMI: Add reference counting for ACPI operation region handlers
  ACPI/IPMI: Add reference counting for ACPI IPMI transfers
  ACPI/IPMI: Cleanup several acpi_ipmi_device members
  ACPI/IPMI: Cleanup some initialization codes
  ACPI/IPMI: Cleanup some inclusion codes
  ACPI/IPMI: Cleanup some Kconfig codes
  Testing: Add module load/unload test suite
  ACPI/IPMI: Add IPMI operation region test device driver

 drivers/acpi/Kconfig                          |   71 +++-
 drivers/acpi/Makefile                         |    1 +
 drivers/acpi/acpi_ipmi.c                      |  513 +++++++++++++++----------
 drivers/acpi/ipmi_test.c                      |  254 ++++++++++++
 drivers/acpi/osl.c                            |  224 +++++++++++
 include/acpi/acpi_bus.h                       |    5 +
 tools/testing/module-unloading/endless_cat.sh |   32 ++
 tools/testing/module-unloading/endless_mod.sh |   81 ++++
 8 files changed, 977 insertions(+), 204 deletions(-)
 create mode 100644 drivers/acpi/ipmi_test.c
 create mode 100755 tools/testing/module-unloading/endless_cat.sh
 create mode 100755 tools/testing/module-unloading/endless_mod.sh

-- 
1.7.10

* [PATCH 01/13] ACPI/IPMI: Fix potential response buffer overflow
From: Lv Zheng @ 2013-07-23  8:08 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, stable, linux-acpi, openipmi-developer

This patch adds sanity checks on the message size to avoid a potential
buffer overflow.

The kernel IPMI message size is IPMI_MAX_MSG_LENGTH (272 bytes), while the
IPMI message size defined by the ACPI specification is 64 bytes.  The
original code does not handle this difference, which may cause a crash in
the response handling code.
This patch closes the gap and also merges rx_data/tx_data into a single
data/len pair, since they need not be separate.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
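Note (not part of the commit message): the length check described above can
be sketched in plain userspace C.  This is only an illustration under stated
assumptions, not the driver code: struct demo_msg and demo_store_response()
are hypothetical names, and only the 64-byte limit is taken from the patch.
Unlike the driver, which warns and still completes the transfer, the sketch
simply returns -EINVAL when the response does not fit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACPI_IPMI_MAX_MSG_LENGTH 64	/* limit used by the patch */

/* Hypothetical stand-in for the driver's message: one buffer, one length. */
struct demo_msg {
	uint8_t data[ACPI_IPMI_MAX_MSG_LENGTH];
	uint8_t rx_len;
	int msg_done;
};

/*
 * Store a response of src_len bytes in the message buffer.
 * Returns 0 on success, or -EINVAL if the response is larger than the
 * buffer, in which case the message is left untouched.
 */
static int demo_store_response(struct demo_msg *m, const uint8_t *src,
			       size_t src_len)
{
	assert(m != NULL && src != NULL);

	/* Reject oversized responses before touching the buffer. */
	if (src_len > ACPI_IPMI_MAX_MSG_LENGTH)
		return -EINVAL;

	m->rx_len = (uint8_t)src_len;
	memcpy(m->data, src, src_len);
	m->msg_done = 1;
	return 0;
}
```

The point of the sketch is the ordering: the length is validated against the
destination size first, and only the validated length is passed to memcpy().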
 drivers/acpi/acpi_ipmi.c |  100 ++++++++++++++++++++++++++++------------------
 1 file changed, 61 insertions(+), 39 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index f40acef..28e2b4c 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -51,6 +51,7 @@ MODULE_LICENSE("GPL");
 #define ACPI_IPMI_UNKNOWN		0x07
 /* the IPMI timeout is 5s */
 #define IPMI_TIMEOUT			(5 * HZ)
+#define ACPI_IPMI_MAX_MSG_LENGTH	64
 
 struct acpi_ipmi_device {
 	/* the device list attached to driver_data.ipmi_devices */
@@ -89,11 +90,9 @@ struct acpi_ipmi_msg {
 	struct completion tx_complete;
 	struct kernel_ipmi_msg tx_message;
 	int	msg_done;
-	/* tx data . And copy it from ACPI object buffer */
-	u8	tx_data[64];
-	int	tx_len;
-	u8	rx_data[64];
-	int	rx_len;
+	/* tx/rx data . And copy it from/to ACPI object buffer */
+	u8	data[ACPI_IPMI_MAX_MSG_LENGTH];
+	u8	rx_len;
 	struct acpi_ipmi_device *device;
 };
 
@@ -101,7 +100,7 @@ struct acpi_ipmi_msg {
 struct acpi_ipmi_buffer {
 	u8 status;
 	u8 length;
-	u8 data[64];
+	u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
 };
 
 static void ipmi_register_bmc(int iface, struct device *dev);
@@ -140,9 +139,9 @@ static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
 
 #define		IPMI_OP_RGN_NETFN(offset)	((offset >> 8) & 0xff)
 #define		IPMI_OP_RGN_CMD(offset)		(offset & 0xff)
-static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
-				acpi_physical_address address,
-				acpi_integer *value)
+static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
+				    acpi_physical_address address,
+				    acpi_integer *value)
 {
 	struct kernel_ipmi_msg *msg;
 	struct acpi_ipmi_buffer *buffer;
@@ -155,15 +154,21 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
 	 */
 	msg->netfn = IPMI_OP_RGN_NETFN(address);
 	msg->cmd = IPMI_OP_RGN_CMD(address);
-	msg->data = tx_msg->tx_data;
+	msg->data = tx_msg->data;
 	/*
 	 * value is the parameter passed by the IPMI opregion space handler.
 	 * It points to the IPMI request message buffer
 	 */
 	buffer = (struct acpi_ipmi_buffer *)value;
 	/* copy the tx message data */
+	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
+		dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
+			      "Unexpected request (msg len %d).\n",
+			      buffer->length);
+		return -EINVAL;
+	}
 	msg->data_len = buffer->length;
-	memcpy(tx_msg->tx_data, buffer->data, msg->data_len);
+	memcpy(tx_msg->data, buffer->data, msg->data_len);
 	/*
 	 * now the default type is SYSTEM_INTERFACE and channel type is BMC.
 	 * If the netfn is APP_REQUEST and the cmd is SEND_MESSAGE,
@@ -181,10 +186,12 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
 	device->curr_msgid++;
 	tx_msg->tx_msgid = device->curr_msgid;
 	mutex_unlock(&device->tx_msg_lock);
+
+	return 0;
 }
 
 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
-		acpi_integer *value, int rem_time)
+				      acpi_integer *value, int rem_time)
 {
 	struct acpi_ipmi_buffer *buffer;
 
@@ -206,13 +213,14 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 		buffer->status = ACPI_IPMI_UNKNOWN;
 		return;
 	}
+
 	/*
 	 * If the IPMI response message is obtained correctly, the status code
 	 * will be ACPI_IPMI_OK
 	 */
 	buffer->status = ACPI_IPMI_OK;
 	buffer->length = msg->rx_len;
-	memcpy(buffer->data, msg->rx_data, msg->rx_len);
+	memcpy(buffer->data, msg->data, msg->rx_len);
 }
 
 static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
@@ -244,12 +252,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
 
 	if (msg->user != ipmi_device->user_interface) {
-		dev_warn(&pnp_dev->dev, "Unexpected response is returned. "
-			"returned user %p, expected user %p\n",
-			msg->user, ipmi_device->user_interface);
-		ipmi_free_recv_msg(msg);
-		return;
+		dev_warn(&pnp_dev->dev,
+			 "Unexpected response is returned. returned user %p, expected user %p\n",
+			 msg->user, ipmi_device->user_interface);
+		goto out_msg;
 	}
+
 	mutex_lock(&ipmi_device->tx_msg_lock);
 	list_for_each_entry(tx_msg, &ipmi_device->tx_msg_list, head) {
 		if (msg->msgid == tx_msg->tx_msgid) {
@@ -257,24 +265,31 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 			break;
 		}
 	}
-
 	mutex_unlock(&ipmi_device->tx_msg_lock);
+
 	if (!msg_found) {
-		dev_warn(&pnp_dev->dev, "Unexpected response (msg id %ld) is "
-			"returned.\n", msg->msgid);
-		ipmi_free_recv_msg(msg);
-		return;
+		dev_warn(&pnp_dev->dev,
+			 "Unexpected response (msg id %ld) is returned.\n",
+			 msg->msgid);
+		goto out_msg;
 	}
 
-	if (msg->msg.data_len) {
-		/* copy the response data to Rx_data buffer */
-		memcpy(tx_msg->rx_data, msg->msg_data, msg->msg.data_len);
-		tx_msg->rx_len = msg->msg.data_len;
-		tx_msg->msg_done = 1;
+	/* copy the response data to Rx_data buffer */
+	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
+		dev_WARN_ONCE(&pnp_dev->dev, true,
+			      "Unexpected response (msg len %d).\n",
+			      msg->msg.data_len);
+		goto out_comp;
 	}
+	tx_msg->rx_len = msg->msg.data_len;
+	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
+	tx_msg->msg_done = 1;
+
+out_comp:
 	complete(&tx_msg->tx_complete);
+out_msg:
 	ipmi_free_recv_msg(msg);
-};
+}
 
 static void ipmi_register_bmc(int iface, struct device *dev)
 {
@@ -353,6 +368,7 @@ static void ipmi_bmc_gone(int iface)
 	}
 	mutex_unlock(&driver_data.ipmi_lock);
 }
+
 /* --------------------------------------------------------------------------
  *			Address Space Management
  * -------------------------------------------------------------------------- */
@@ -371,13 +387,14 @@ static void ipmi_bmc_gone(int iface)
 
 static acpi_status
 acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
-		      u32 bits, acpi_integer *value,
-		      void *handler_context, void *region_context)
+			u32 bits, acpi_integer *value,
+			void *handler_context, void *region_context)
 {
 	struct acpi_ipmi_msg *tx_msg;
 	struct acpi_ipmi_device *ipmi_device = handler_context;
 	int err, rem_time;
 	acpi_status status;
+
 	/*
 	 * IPMI opregion message.
 	 * IPMI message is firstly written to the BMC and system software
@@ -394,28 +411,33 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	if (!tx_msg)
 		return AE_NO_MEMORY;
 
-	acpi_format_ipmi_msg(tx_msg, address, value);
+	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
+		status = AE_TYPE;
+		goto out_msg;
+	}
+
 	mutex_lock(&ipmi_device->tx_msg_lock);
 	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
 	mutex_unlock(&ipmi_device->tx_msg_lock);
 	err = ipmi_request_settime(ipmi_device->user_interface,
-					&tx_msg->addr,
-					tx_msg->tx_msgid,
-					&tx_msg->tx_message,
-					NULL, 0, 0, 0);
+				   &tx_msg->addr,
+				   tx_msg->tx_msgid,
+				   &tx_msg->tx_message,
+				   NULL, 0, 0, 0);
 	if (err) {
 		status = AE_ERROR;
-		goto end_label;
+		goto out_list;
 	}
 	rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
-					IPMI_TIMEOUT);
+					       IPMI_TIMEOUT);
 	acpi_format_ipmi_response(tx_msg, value, rem_time);
 	status = AE_OK;
 
-end_label:
+out_list:
 	mutex_lock(&ipmi_device->tx_msg_lock);
 	list_del(&tx_msg->head);
 	mutex_unlock(&ipmi_device->tx_msg_lock);
+out_msg:
 	kfree(tx_msg);
 	return status;
 }
-- 
1.7.10


* [PATCH 02/13] ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler()
From: Lv Zheng @ 2013-07-23  8:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, stable, linux-acpi, openipmi-developer

This patch is a quick fix for the issue revealed by the test results:
ipmi_msg_handler() is invoked in atomic context, where taking a mutex
(which may sleep) is not allowed.

BUG: scheduling while atomic: kipmi0/18933/0x10000100
Modules linked in: ipmi_si acpi_ipmi ...
CPU: 3 PID: 18933 Comm: kipmi0 Tainted: G       AW    3.10.0-rc7+ #2
Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.0027.070120100606 07/01/2010
 ffff8838245eea00 ffff88103fc63c98 ffffffff814c4a1e ffff88103fc63ca8
 ffffffff814bfbab ffff88103fc63d28 ffffffff814c73e0 ffff88103933cbd4
 0000000000000096 ffff88103fc63ce8 ffff88102f618000 ffff881035c01fd8
Call Trace:
 <IRQ>  [<ffffffff814c4a1e>] dump_stack+0x19/0x1b
 [<ffffffff814bfbab>] __schedule_bug+0x46/0x54
 [<ffffffff814c73e0>] __schedule+0x83/0x59c
 [<ffffffff81058853>] __cond_resched+0x22/0x2d
 [<ffffffff814c794b>] _cond_resched+0x14/0x1d
 [<ffffffff814c6d82>] mutex_lock+0x11/0x32
 [<ffffffff8101e1e9>] ? __default_send_IPI_dest_field.constprop.0+0x53/0x58
 [<ffffffffa09e3f9c>] ipmi_msg_handler+0x23/0x166 [ipmi_si]
 [<ffffffff812bf6e4>] deliver_response+0x55/0x5a
 [<ffffffff812c0fd4>] handle_new_recv_msgs+0xb67/0xc65
 [<ffffffff81007ad1>] ? read_tsc+0x9/0x19
 [<ffffffff814c8620>] ? _raw_spin_lock_irq+0xa/0xc
 [<ffffffffa09e1128>] ipmi_thread+0x5c/0x146 [ipmi_si]
 ...

Known issues:
- Replacing tx_msg_lock with a spinlock is not performance friendly.
  The current solution works but does not give the best performance,
  because atomic context should run as fast as possible.  Given that not
  many IPMI messages are created by ACPI, the performance of the current
  solution may be acceptable.  It could be improved by linking
  ipmi_recv_msg into an RX message queue and processing it in another
  context.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
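Note (not part of the commit message): as a rough userspace analogy of why a
spinlock is usable where a mutex is not, the sketch below guards a message
list with a busy-wait lock built on C11 atomic_flag.  The lock never asks
the scheduler to sleep, which is the property the handler needs.  All demo_*
names are hypothetical, and the kernel's spinlock_t and
spin_lock_irqsave() do considerably more (for example disabling local
interrupts), so this is only a sketch of the idea.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal busy-wait lock; an assumption for illustration only. */
struct demo_spinlock {
	atomic_flag locked;
};

static void demo_spin_lock(struct demo_spinlock *l)
{
	/* Spin until the flag was previously clear; never sleeps. */
	while (atomic_flag_test_and_set_explicit(&l->locked,
						 memory_order_acquire))
		;
}

static void demo_spin_unlock(struct demo_spinlock *l)
{
	atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical pending-request entry, keyed by message id. */
struct demo_tx_msg {
	long msgid;
	struct demo_tx_msg *next;
};

struct demo_device {
	struct demo_spinlock lock;
	struct demo_tx_msg *head;
	long curr_msgid;
};

/* Assign the next message id and queue the request under the lock. */
static long demo_queue(struct demo_device *d, struct demo_tx_msg *m)
{
	demo_spin_lock(&d->lock);
	m->msgid = ++d->curr_msgid;
	m->next = d->head;
	d->head = m;
	demo_spin_unlock(&d->lock);
	return m->msgid;
}

/* Look up a pending request by id; returns NULL if none matches. */
static struct demo_tx_msg *demo_find(struct demo_device *d, long msgid)
{
	struct demo_tx_msg *m;

	demo_spin_lock(&d->lock);
	for (m = d->head; m != NULL; m = m->next)
		if (m->msgid == msgid)
			break;
	demo_spin_unlock(&d->lock);
	return m;
}
```

The critical sections are kept to a few pointer operations, which matches
the concern in the known-issues note: work done while spinning should be as
short as possible.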
 drivers/acpi/acpi_ipmi.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 28e2b4c..b37c189 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -39,6 +39,7 @@
 #include <linux/ipmi.h>
 #include <linux/device.h>
 #include <linux/pnp.h>
+#include <linux/spinlock.h>
 
 MODULE_AUTHOR("Zhao Yakui");
 MODULE_DESCRIPTION("ACPI IPMI Opregion driver");
@@ -58,7 +59,7 @@ struct acpi_ipmi_device {
 	struct list_head head;
 	/* the IPMI request message list */
 	struct list_head tx_msg_list;
-	struct mutex	tx_msg_lock;
+	spinlock_t	tx_msg_lock;
 	acpi_handle handle;
 	struct pnp_dev *pnp_dev;
 	ipmi_user_t	user_interface;
@@ -146,6 +147,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 	struct kernel_ipmi_msg *msg;
 	struct acpi_ipmi_buffer *buffer;
 	struct acpi_ipmi_device *device;
+	unsigned long flags;
 
 	msg = &tx_msg->tx_message;
 	/*
@@ -182,10 +184,10 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 
 	/* Get the msgid */
 	device = tx_msg->device;
-	mutex_lock(&device->tx_msg_lock);
+	spin_lock_irqsave(&device->tx_msg_lock, flags);
 	device->curr_msgid++;
 	tx_msg->tx_msgid = device->curr_msgid;
-	mutex_unlock(&device->tx_msg_lock);
+	spin_unlock_irqrestore(&device->tx_msg_lock, flags);
 
 	return 0;
 }
@@ -250,6 +252,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	int msg_found = 0;
 	struct acpi_ipmi_msg *tx_msg;
 	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
+	unsigned long flags;
 
 	if (msg->user != ipmi_device->user_interface) {
 		dev_warn(&pnp_dev->dev,
@@ -258,14 +261,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 		goto out_msg;
 	}
 
-	mutex_lock(&ipmi_device->tx_msg_lock);
+	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_for_each_entry(tx_msg, &ipmi_device->tx_msg_list, head) {
 		if (msg->msgid == tx_msg->tx_msgid) {
 			msg_found = 1;
 			break;
 		}
 	}
-	mutex_unlock(&ipmi_device->tx_msg_lock);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 
 	if (!msg_found) {
 		dev_warn(&pnp_dev->dev,
@@ -394,6 +397,7 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	struct acpi_ipmi_device *ipmi_device = handler_context;
 	int err, rem_time;
 	acpi_status status;
+	unsigned long flags;
 
 	/*
 	 * IPMI opregion message.
@@ -416,9 +420,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 		goto out_msg;
 	}
 
-	mutex_lock(&ipmi_device->tx_msg_lock);
+	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
-	mutex_unlock(&ipmi_device->tx_msg_lock);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 	err = ipmi_request_settime(ipmi_device->user_interface,
 				   &tx_msg->addr,
 				   tx_msg->tx_msgid,
@@ -434,9 +438,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	status = AE_OK;
 
 out_list:
-	mutex_lock(&ipmi_device->tx_msg_lock);
+	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_del(&tx_msg->head);
-	mutex_unlock(&ipmi_device->tx_msg_lock);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 out_msg:
 	kfree(tx_msg);
 	return status;
@@ -479,7 +483,7 @@ static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)
 
 	INIT_LIST_HEAD(&ipmi_device->head);
 
-	mutex_init(&ipmi_device->tx_msg_lock);
+	spin_lock_init(&ipmi_device->tx_msg_lock);
 	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
 	ipmi_install_space_handler(ipmi_device);
 
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 02/13] ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler()
@ 2013-07-23  8:09     ` Lv Zheng
  0 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, stable, linux-acpi, openipmi-developer

This patch quick fixes the issues indicated by the test results that
ipmi_msg_handler() is invoked in atomic context.

BUG: scheduling while atomic: kipmi0/18933/0x10000100
Modules linked in: ipmi_si acpi_ipmi ...
CPU: 3 PID: 18933 Comm: kipmi0 Tainted: G       AW    3.10.0-rc7+ #2
Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.0027.070120100606 07/01/2010
 ffff8838245eea00 ffff88103fc63c98 ffffffff814c4a1e ffff88103fc63ca8
 ffffffff814bfbab ffff88103fc63d28 ffffffff814c73e0 ffff88103933cbd4
 0000000000000096 ffff88103fc63ce8 ffff88102f618000 ffff881035c01fd8
Call Trace:
 <IRQ>  [<ffffffff814c4a1e>] dump_stack+0x19/0x1b
 [<ffffffff814bfbab>] __schedule_bug+0x46/0x54
 [<ffffffff814c73e0>] __schedule+0x83/0x59c
 [<ffffffff81058853>] __cond_resched+0x22/0x2d
 [<ffffffff814c794b>] _cond_resched+0x14/0x1d
 [<ffffffff814c6d82>] mutex_lock+0x11/0x32
 [<ffffffff8101e1e9>] ? __default_send_IPI_dest_field.constprop.0+0x53/0x58
 [<ffffffffa09e3f9c>] ipmi_msg_handler+0x23/0x166 [ipmi_si]
 [<ffffffff812bf6e4>] deliver_response+0x55/0x5a
 [<ffffffff812c0fd4>] handle_new_recv_msgs+0xb67/0xc65
 [<ffffffff81007ad1>] ? read_tsc+0x9/0x19
 [<ffffffff814c8620>] ? _raw_spin_lock_irq+0xa/0xc
 [<ffffffffa09e1128>] ipmi_thread+0x5c/0x146 [ipmi_si]
 ...

Known issues:
- Replacing tx_msg_lock with a spinlock is not performance friendly.
  The current solution works but does not have the best performance,
  because atomic context should run as fast as possible.  Given that ACPI
  does not create many IPMI messages, the performance of the current
  solution may be OK.  It could be improved by linking ipmi_recv_msg into
  an RX message queue and processing it in another context.
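
As an illustration of the RX message queue idea in the known issue above,
here is a minimal userspace sketch, not kernel code.  All names (rx_msg,
rx_queue, rx_enqueue, rx_drain, sum_msgid) are hypothetical, and a pthread
mutex stands in for the kernel spinlock.  The callback side only links the
message; the processing happens later, outside the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a received message; not the kernel struct. */
struct rx_msg {
	long msgid;
	struct rx_msg *next;
};

struct rx_queue {
	pthread_mutex_t lock;	/* stands in for a spinlock; pointer updates only */
	struct rx_msg *head;
	struct rx_msg *tail;
};

static void rx_queue_init(struct rx_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = q->tail = NULL;
}

/* Fast path: what the callback in "atomic context" would do. */
static void rx_enqueue(struct rx_queue *q, struct rx_msg *m)
{
	assert(m != NULL);
	m->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = m;
	else
		q->head = m;
	q->tail = m;
	pthread_mutex_unlock(&q->lock);
}

/* Slow path: detach the whole list, then process it without the lock. */
static int rx_drain(struct rx_queue *q,
		    void (*handle)(struct rx_msg *, void *), void *ctx)
{
	struct rx_msg *m, *next;
	int handled = 0;

	pthread_mutex_lock(&q->lock);
	m = q->head;
	q->head = q->tail = NULL;
	pthread_mutex_unlock(&q->lock);

	for (; m; m = next) {
		next = m->next;	/* read before the handler may reuse m */
		handle(m, ctx);
		handled++;
	}
	return handled;
}

/* Example handler: accumulate message ids into a long pointed to by ctx. */
static void sum_msgid(struct rx_msg *m, void *ctx)
{
	*(long *)ctx += m->msgid;
}
```

The design point is that the lock is held only for a few pointer updates,
so the time spent in the callback does not grow with the cost of handling
a message.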

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 28e2b4c..b37c189 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -39,6 +39,7 @@
 #include <linux/ipmi.h>
 #include <linux/device.h>
 #include <linux/pnp.h>
+#include <linux/spinlock.h>
 
 MODULE_AUTHOR("Zhao Yakui");
 MODULE_DESCRIPTION("ACPI IPMI Opregion driver");
@@ -58,7 +59,7 @@ struct acpi_ipmi_device {
 	struct list_head head;
 	/* the IPMI request message list */
 	struct list_head tx_msg_list;
-	struct mutex	tx_msg_lock;
+	spinlock_t	tx_msg_lock;
 	acpi_handle handle;
 	struct pnp_dev *pnp_dev;
 	ipmi_user_t	user_interface;
@@ -146,6 +147,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 	struct kernel_ipmi_msg *msg;
 	struct acpi_ipmi_buffer *buffer;
 	struct acpi_ipmi_device *device;
+	unsigned long flags;
 
 	msg = &tx_msg->tx_message;
 	/*
@@ -182,10 +184,10 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 
 	/* Get the msgid */
 	device = tx_msg->device;
-	mutex_lock(&device->tx_msg_lock);
+	spin_lock_irqsave(&device->tx_msg_lock, flags);
 	device->curr_msgid++;
 	tx_msg->tx_msgid = device->curr_msgid;
-	mutex_unlock(&device->tx_msg_lock);
+	spin_unlock_irqrestore(&device->tx_msg_lock, flags);
 
 	return 0;
 }
@@ -250,6 +252,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	int msg_found = 0;
 	struct acpi_ipmi_msg *tx_msg;
 	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
+	unsigned long flags;
 
 	if (msg->user != ipmi_device->user_interface) {
 		dev_warn(&pnp_dev->dev,
@@ -258,14 +261,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 		goto out_msg;
 	}
 
-	mutex_lock(&ipmi_device->tx_msg_lock);
+	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_for_each_entry(tx_msg, &ipmi_device->tx_msg_list, head) {
 		if (msg->msgid == tx_msg->tx_msgid) {
 			msg_found = 1;
 			break;
 		}
 	}
-	mutex_unlock(&ipmi_device->tx_msg_lock);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 
 	if (!msg_found) {
 		dev_warn(&pnp_dev->dev,
@@ -394,6 +397,7 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	struct acpi_ipmi_device *ipmi_device = handler_context;
 	int err, rem_time;
 	acpi_status status;
+	unsigned long flags;
 
 	/*
 	 * IPMI opregion message.
@@ -416,9 +420,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 		goto out_msg;
 	}
 
-	mutex_lock(&ipmi_device->tx_msg_lock);
+	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
-	mutex_unlock(&ipmi_device->tx_msg_lock);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 	err = ipmi_request_settime(ipmi_device->user_interface,
 				   &tx_msg->addr,
 				   tx_msg->tx_msgid,
@@ -434,9 +438,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	status = AE_OK;
 
 out_list:
-	mutex_lock(&ipmi_device->tx_msg_lock);
+	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_del(&tx_msg->head);
-	mutex_unlock(&ipmi_device->tx_msg_lock);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 out_msg:
 	kfree(tx_msg);
 	return status;
@@ -479,7 +483,7 @@ static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)
 
 	INIT_LIST_HEAD(&ipmi_device->head);
 
-	mutex_init(&ipmi_device->tx_msg_lock);
+	spin_lock_init(&ipmi_device->tx_msg_lock);
 	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
 	ipmi_install_space_handler(ipmi_device);
 
-- 
1.7.10


* [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:09     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, stable, linux-acpi, openipmi-developer

This patch fixes races caused by unprotected ACPI IPMI transfers.

The following crashes may occur:
1. No tx_msg_lock is held while iterating tx_msg_list in
   ipmi_flush_tx_msg(), while an entry can be unlinked in parallel on
   failure in acpi_ipmi_space_handler() under protection of tx_msg_lock.
2. No lock is held while freeing tx_msg in acpi_ipmi_space_handler(),
   while it can be accessed in parallel in ipmi_flush_tx_msg() and
   ipmi_msg_handler().

This patch extends tx_msg_lock to protect all tx_msg accesses to solve
this issue.  tx_msg_lock is then always held around complete() and tx_msg
accesses.
smp_wmb() is called before setting the msg_done flag so that messages
completed due to flushing will not be handled as 'done' messages while
their contents are not valid.
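
The ordering rule in the last paragraph (payload first, flag second) can be
sketched in userspace C11 as follows.  This is an analogy under stated
assumptions, not the patch's code: a release store stands in for smp_wmb()
followed by the flag write, the reader uses a matching acquire load, and
the names slot, slot_init, slot_publish and slot_consume are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical message slot; field names only loosely mirror the patch. */
struct slot {
	unsigned char data[64];
	size_t rx_len;
	atomic_int msg_done;	/* 0 = payload not valid, 1 = payload valid */
};

static void slot_init(struct slot *s)
{
	memset(s->data, 0, sizeof(s->data));
	s->rx_len = 0;
	atomic_init(&s->msg_done, 0);
}

/* Writer: fill the payload first, then publish the flag (release). */
static void slot_publish(struct slot *s, const void *buf, size_t len)
{
	assert(s != NULL);
	if (len > sizeof(s->data))
		len = sizeof(s->data);
	memcpy(s->data, buf, len);
	s->rx_len = len;
	atomic_store_explicit(&s->msg_done, 1, memory_order_release);
}

/*
 * Reader: the acquire load pairs with the release store, so a reader that
 * observes msg_done == 1 also observes the payload written before it.
 * Returns the number of bytes copied, or -1 if nothing was published.
 */
static long slot_consume(struct slot *s, void *out, size_t cap)
{
	size_t len;

	if (!atomic_load_explicit(&s->msg_done, memory_order_acquire))
		return -1;
	len = s->rx_len < cap ? s->rx_len : cap;
	memcpy(out, s->data, len);
	return (long)len;
}
```

A message woken up by flushing never has its flag set, so a reader in this
sketch sees -1 and does not touch the payload.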

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index b37c189..527ee43 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 	struct acpi_ipmi_msg *tx_msg, *temp;
 	int count = HZ / 10;
 	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
+	unsigned long flags;
 
+	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
 	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
 		/* wake up the sleep thread on the Tx msg */
 		complete(&tx_msg->tx_complete);
 	}
+	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
 
 	/* wait for about 100ms to flush the tx message list */
 	while (count--) {
@@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 			break;
 		}
 	}
-	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 
 	if (!msg_found) {
 		dev_warn(&pnp_dev->dev,
 			 "Unexpected response (msg id %ld) is returned.\n",
 			 msg->msgid);
-		goto out_msg;
+		goto out_lock;
 	}
 
 	/* copy the response data to Rx_data buffer */
@@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	}
 	tx_msg->rx_len = msg->msg.data_len;
 	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
+	/* tx_msg content must be valid before setting msg_done flag */
+	smp_wmb();
 	tx_msg->msg_done = 1;
 
 out_comp:
 	complete(&tx_msg->tx_complete);
+out_lock:
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 out_msg:
 	ipmi_free_recv_msg(msg);
 }
-- 
1.7.10


* [PATCH 04/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:09     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, stable, linux-acpi, openipmi-developer

This patch uses reference counting to fix the race caused by the
unprotected ACPI IPMI user.

The acpi_ipmi_device->user_interface check in acpi_ipmi_space_handler()
can happen before user_interface is set to NULL, and the code after the
check can run after user_interface has become NULL, so an on-going
acpi_ipmi_space_handler() can still pass an invalid
acpi_ipmi_device->user_interface to ipmi_request_settime().  Such a race
condition is not allowed by the IPMI layer's API design, as a crash will
happen in ipmi_request_settime().
In the IPMI layer, smi_gone()/new_smi() callbacks are protected by
smi_watchers_mutex, thus their invocations are serialized.  But as a new
smi can re-use the freed intf_num, the callback implementation must not
use intf_num as a means of identification, or it must ensure that all
references to the previous smi are dropped before exiting the smi_gone()
callback.  In the case of the acpi_ipmi module, this means
ipmi_flush_tx_msg() must ensure all on-going IPMI transfers are completed
before it returns.

This patch follows the ipmi_devintf.c design:
1. ipmi_destroy_user() is invoked after the reference count of
   acpi_ipmi_device drops to 0.  This matches the IPMI layer's API calling
   rule on ipmi_destroy_user() and ipmi_request_settime().
2. The reference count of acpi_ipmi_device dropping to 1 means that all
   tx_msg related to this acpi_ipmi_device are freed.  This is used to
   implement the new flushing mechanism.  Note that complete() must be
   retried so that an on-going tx_msg, which already holds a reference to
   acpi_ipmi_device but is only about to be added to tx_msg_list, won't
   block flushing.  This matches the IPMI layer's callback rule on
   smi_gone()/new_smi() serialization.
3. ipmi_flush_tx_msg() is performed after deleting acpi_ipmi_device from
   the list so that no new tx_msg can be created after entering the
   flushing process.
4. The flushing of tx_msg is also moved out of ipmi_lock in this patch.

The forthcoming IPMI operation region handler installation changes also
require acpi_ipmi_device to be handled in the reference counting style.
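
The counting rules in items 1 and 2 can be sketched in userspace C11 as
follows.  This is only an illustration of the get/put pattern, assuming
hypothetical names (dev, dev_alloc, dev_get, dev_put, dev_is_flushed) and
C11 atomics in place of the kernel's atomic_t helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical device object; only the counting pattern is the point. */
struct dev {
	atomic_int refcnt;
	int *released;	/* set to 1 by dev_release(), for demonstration */
};

static struct dev *dev_alloc(int *released)
{
	struct dev *d = malloc(sizeof(*d));

	if (!d)
		return NULL;
	atomic_init(&d->refcnt, 1);	/* the owner's (list's) reference */
	d->released = released;
	return d;
}

/* Take one more reference, e.g. for an in-flight transfer. */
static struct dev *dev_get(struct dev *d)
{
	if (d)
		atomic_fetch_add(&d->refcnt, 1);
	return d;
}

/* Teardown runs exactly once, after the count has reached 0 (item 1). */
static void dev_release(struct dev *d)
{
	assert(atomic_load(&d->refcnt) == 0);
	*d->released = 1;
	free(d);
}

/* Drop one reference; the last put tears the object down. */
static void dev_put(struct dev *d)
{
	/* atomic_fetch_sub returns the previous value */
	if (d && atomic_fetch_sub(&d->refcnt, 1) == 1)
		dev_release(d);
}

/* Flush condition (item 2): only the caller's reference remains. */
static int dev_is_flushed(struct dev *d)
{
	return atomic_load(&d->refcnt) == 1;
}
```

A flusher in this sketch would poll dev_is_flushed() while waking up
waiters, and call dev_put() only once it returns true, so that teardown
cannot race with a transfer that still holds a reference.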

Authorship is also updated due to this design change.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |  249 +++++++++++++++++++++++++++-------------------
 1 file changed, 149 insertions(+), 100 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 527ee43..cbf25e0 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -1,8 +1,9 @@
 /*
  *  acpi_ipmi.c - ACPI IPMI opregion
  *
- *  Copyright (C) 2010 Intel Corporation
- *  Copyright (C) 2010 Zhao Yakui <yakui.zhao@intel.com>
+ *  Copyright (C) 2010, 2013 Intel Corporation
+ *    Author: Zhao Yakui <yakui.zhao@intel.com>
+ *            Lv Zheng <lv.zheng@intel.com>
  *
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  *
@@ -67,6 +68,7 @@ struct acpi_ipmi_device {
 	long curr_msgid;
 	unsigned long flags;
 	struct ipmi_smi_info smi_data;
+	atomic_t refcnt;
 };
 
 struct ipmi_driver_data {
@@ -107,8 +109,8 @@ struct acpi_ipmi_buffer {
 static void ipmi_register_bmc(int iface, struct device *dev);
 static void ipmi_bmc_gone(int iface);
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
-static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device);
-static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device);
+static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
+static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);
 
 static struct ipmi_driver_data driver_data = {
 	.ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
@@ -122,6 +124,80 @@ static struct ipmi_driver_data driver_data = {
 	},
 };
 
+static struct acpi_ipmi_device *
+ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
+{
+	struct acpi_ipmi_device *ipmi_device;
+	int err;
+	ipmi_user_t user;
+
+	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
+	if (!ipmi_device)
+		return NULL;
+
+	atomic_set(&ipmi_device->refcnt, 1);
+	INIT_LIST_HEAD(&ipmi_device->head);
+	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
+	spin_lock_init(&ipmi_device->tx_msg_lock);
+
+	ipmi_device->handle = handle;
+	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
+	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
+	ipmi_device->ipmi_ifnum = iface;
+
+	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
+			       ipmi_device, &user);
+	if (err) {
+		put_device(smi_data->dev);
+		kfree(ipmi_device);
+		return NULL;
+	}
+	ipmi_device->user_interface = user;
+	ipmi_install_space_handler(ipmi_device);
+
+	return ipmi_device;
+}
+
+static struct acpi_ipmi_device *
+acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
+{
+	if (ipmi_device)
+		atomic_inc(&ipmi_device->refcnt);
+	return ipmi_device;
+}
+
+static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
+{
+	ipmi_remove_space_handler(ipmi_device);
+	ipmi_destroy_user(ipmi_device->user_interface);
+	put_device(ipmi_device->smi_data.dev);
+	kfree(ipmi_device);
+}
+
+static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
+{
+	if (ipmi_device && atomic_dec_and_test(&ipmi_device->refcnt))
+		ipmi_dev_release(ipmi_device);
+}
+
+static struct acpi_ipmi_device *acpi_ipmi_get_targeted_smi(int iface)
+{
+	int dev_found = 0;
+	struct acpi_ipmi_device *ipmi_device;
+
+	mutex_lock(&driver_data.ipmi_lock);
+	list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
+		if (ipmi_device->ipmi_ifnum == iface) {
+			dev_found = 1;
+			acpi_ipmi_dev_get(ipmi_device);
+			break;
+		}
+	}
+	mutex_unlock(&driver_data.ipmi_lock);
+
+	return dev_found ? ipmi_device : NULL;
+}
+
 static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
 {
 	struct acpi_ipmi_msg *ipmi_msg;
@@ -228,25 +304,24 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 {
 	struct acpi_ipmi_msg *tx_msg, *temp;
-	int count = HZ / 10;
-	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
 	unsigned long flags;
 
-	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
-	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
-		/* wake up the sleep thread on the Tx msg */
-		complete(&tx_msg->tx_complete);
-	}
-	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
-
-	/* wait for about 100ms to flush the tx message list */
-	while (count--) {
-		if (list_empty(&ipmi->tx_msg_list))
-			break;
-		schedule_timeout(1);
+	/*
+	 * NOTE: Synchronous Flushing
+	 * Wait until refcnt drops to 1 - no other users except this
+	 * context.  This function should always be called before
+	 * acpi_ipmi_device destruction.
+	 */
+	while (atomic_read(&ipmi->refcnt) > 1) {
+		spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
+		list_for_each_entry_safe(tx_msg, temp,
+					 &ipmi->tx_msg_list, head) {
+			/* wake up the sleep thread on the Tx msg */
+			complete(&tx_msg->tx_complete);
+		}
+		spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+		schedule_timeout_uninterruptible(msecs_to_jiffies(1));
 	}
-	if (!list_empty(&ipmi->tx_msg_list))
-		dev_warn(&pnp_dev->dev, "tx msg list is not NULL\n");
 }
 
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
@@ -304,22 +379,26 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 {
 	struct acpi_ipmi_device *ipmi_device, *temp;
 	struct pnp_dev *pnp_dev;
-	ipmi_user_t		user;
 	int err;
 	struct ipmi_smi_info smi_data;
 	acpi_handle handle;
 
 	err = ipmi_get_smi_info(iface, &smi_data);
-
 	if (err)
 		return;
 
-	if (smi_data.addr_src != SI_ACPI) {
-		put_device(smi_data.dev);
-		return;
-	}
-
+	if (smi_data.addr_src != SI_ACPI)
+		goto err_ref;
 	handle = smi_data.addr_info.acpi_info.acpi_handle;
+	if (!handle)
+		goto err_ref;
+	pnp_dev = to_pnp_dev(smi_data.dev);
+
+	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
+	if (!ipmi_device) {
+		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
+		goto err_ref;
+	}
 
 	mutex_lock(&driver_data.ipmi_lock);
 	list_for_each_entry(temp, &driver_data.ipmi_devices, head) {
@@ -328,54 +407,42 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 		 * to the device list, don't add it again.
 		 */
 		if (temp->handle == handle)
-			goto out;
+			goto err_lock;
 	}
 
-	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
-
-	if (!ipmi_device)
-		goto out;
-
-	pnp_dev = to_pnp_dev(smi_data.dev);
-	ipmi_device->handle = handle;
-	ipmi_device->pnp_dev = pnp_dev;
-
-	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
-					ipmi_device, &user);
-	if (err) {
-		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
-		kfree(ipmi_device);
-		goto out;
-	}
-	acpi_add_ipmi_device(ipmi_device);
-	ipmi_device->user_interface = user;
-	ipmi_device->ipmi_ifnum = iface;
+	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
 	mutex_unlock(&driver_data.ipmi_lock);
-	memcpy(&ipmi_device->smi_data, &smi_data, sizeof(struct ipmi_smi_info));
+	put_device(smi_data.dev);
 	return;
 
-out:
+err_lock:
 	mutex_unlock(&driver_data.ipmi_lock);
+	ipmi_dev_release(ipmi_device);
+err_ref:
 	put_device(smi_data.dev);
 	return;
 }
 
 static void ipmi_bmc_gone(int iface)
 {
-	struct acpi_ipmi_device *ipmi_device, *temp;
+	int dev_found = 0;
+	struct acpi_ipmi_device *ipmi_device;
 
 	mutex_lock(&driver_data.ipmi_lock);
-	list_for_each_entry_safe(ipmi_device, temp,
-				&driver_data.ipmi_devices, head) {
-		if (ipmi_device->ipmi_ifnum != iface)
-			continue;
-
-		acpi_remove_ipmi_device(ipmi_device);
-		put_device(ipmi_device->smi_data.dev);
-		kfree(ipmi_device);
-		break;
+	list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
+		if (ipmi_device->ipmi_ifnum == iface) {
+			dev_found = 1;
+			break;
+		}
 	}
+	if (dev_found)
+		list_del(&ipmi_device->head);
 	mutex_unlock(&driver_data.ipmi_lock);
+
+	if (dev_found) {
+		ipmi_flush_tx_msg(ipmi_device);
+		acpi_ipmi_dev_put(ipmi_device);
+	}
 }
 
 /* --------------------------------------------------------------------------
@@ -400,7 +467,8 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 			void *handler_context, void *region_context)
 {
 	struct acpi_ipmi_msg *tx_msg;
-	struct acpi_ipmi_device *ipmi_device = handler_context;
+	int iface = (long)handler_context;
+	struct acpi_ipmi_device *ipmi_device;
 	int err, rem_time;
 	acpi_status status;
 	unsigned long flags;
@@ -414,12 +482,15 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	if ((function & ACPI_IO_MASK) == ACPI_READ)
 		return AE_TYPE;
 
-	if (!ipmi_device->user_interface)
+	ipmi_device = acpi_ipmi_get_targeted_smi(iface);
+	if (!ipmi_device)
 		return AE_NOT_EXIST;
 
 	tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
-	if (!tx_msg)
-		return AE_NO_MEMORY;
+	if (!tx_msg) {
+		status = AE_NO_MEMORY;
+		goto out_ref;
+	}
 
 	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
 		status = AE_TYPE;
@@ -449,6 +520,8 @@ out_list:
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 out_msg:
 	kfree(tx_msg);
+out_ref:
+	acpi_ipmi_dev_put(ipmi_device);
 	return status;
 }
 
@@ -473,7 +546,7 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
 	status = acpi_install_address_space_handler(ipmi->handle,
 						    ACPI_ADR_SPACE_IPMI,
 						    &acpi_ipmi_space_handler,
-						    NULL, ipmi);
+						    NULL, (void *)((long)ipmi->ipmi_ifnum));
 	if (ACPI_FAILURE(status)) {
 		struct pnp_dev *pnp_dev = ipmi->pnp_dev;
 		dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
@@ -484,36 +557,6 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
 	return 0;
 }
 
-static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)
-{
-
-	INIT_LIST_HEAD(&ipmi_device->head);
-
-	spin_lock_init(&ipmi_device->tx_msg_lock);
-	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
-	ipmi_install_space_handler(ipmi_device);
-
-	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
-}
-
-static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device)
-{
-	/*
-	 * If the IPMI user interface is created, it should be
-	 * destroyed.
-	 */
-	if (ipmi_device->user_interface) {
-		ipmi_destroy_user(ipmi_device->user_interface);
-		ipmi_device->user_interface = NULL;
-	}
-	/* flush the Tx_msg list */
-	if (!list_empty(&ipmi_device->tx_msg_list))
-		ipmi_flush_tx_msg(ipmi_device);
-
-	list_del(&ipmi_device->head);
-	ipmi_remove_space_handler(ipmi_device);
-}
-
 static int __init acpi_ipmi_init(void)
 {
 	int result = 0;
@@ -530,7 +573,7 @@ static int __init acpi_ipmi_init(void)
 
 static void __exit acpi_ipmi_exit(void)
 {
-	struct acpi_ipmi_device *ipmi_device, *temp;
+	struct acpi_ipmi_device *ipmi_device;
 
 	if (acpi_disabled)
 		return;
@@ -544,11 +587,17 @@ static void __exit acpi_ipmi_exit(void)
 	 * handler and free it.
 	 */
 	mutex_lock(&driver_data.ipmi_lock);
-	list_for_each_entry_safe(ipmi_device, temp,
-				&driver_data.ipmi_devices, head) {
-		acpi_remove_ipmi_device(ipmi_device);
-		put_device(ipmi_device->smi_data.dev);
-		kfree(ipmi_device);
+	while (!list_empty(&driver_data.ipmi_devices)) {
+		ipmi_device = list_first_entry(&driver_data.ipmi_devices,
+					       struct acpi_ipmi_device,
+					       head);
+		list_del(&ipmi_device->head);
+		mutex_unlock(&driver_data.ipmi_lock);
+
+		ipmi_flush_tx_msg(ipmi_device);
+		acpi_ipmi_dev_put(ipmi_device);
+
+		mutex_lock(&driver_data.ipmi_lock);
 	}
 	mutex_unlock(&driver_data.ipmi_lock);
 }
-- 
1.7.10

* [PATCH 04/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user
@ 2013-07-23  8:09     ` Lv Zheng
  0 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, stable, linux-acpi, openipmi-developer

This patch uses reference counting to fix the race caused by the
unprotected ACPI IPMI user.

The acpi_ipmi_device->user_interface check in acpi_ipmi_space_handler()
can happen before user_interface is set to NULL, and the code after the
check can run after user_interface has become NULL, so an on-going
acpi_ipmi_space_handler() can still pass an invalid
acpi_ipmi_device->user_interface to ipmi_request_settime().  Such a race
condition is not allowed by the IPMI layer's API design, as a crash will
happen in ipmi_request_settime().
In the IPMI layer, smi_gone()/new_smi() callbacks are protected by
smi_watchers_mutex, thus their invocations are serialized.  But as a new
smi can re-use the freed intf_num, the callback implementation must not
use intf_num as a means of identification, or it must ensure that all
references to the previous smi are dropped before exiting the smi_gone()
callback.  In the case of the acpi_ipmi module, this means
ipmi_flush_tx_msg() must ensure all on-going IPMI transfers are completed
before it returns.

This patch follows the ipmi_devintf.c design:
1. ipmi_destroy_user() is invoked after the reference count of
   acpi_ipmi_device drops to 0.  This matches the IPMI layer's API calling
   rule on ipmi_destroy_user() and ipmi_request_settime().
2. The reference count of acpi_ipmi_device dropping to 1 means that all
   tx_msg related to this acpi_ipmi_device are freed.  This is used to
   implement the new flushing mechanism.  Note that complete() must be
   retried so that an on-going tx_msg, which already holds a reference to
   acpi_ipmi_device but is only about to be added to tx_msg_list, won't
   block flushing.  This matches the IPMI layer's callback rule on
   smi_gone()/new_smi() serialization.
3. ipmi_flush_tx_msg() is performed after deleting acpi_ipmi_device from
   the list so that no new tx_msg can be created after entering the
   flushing process.
4. The flushing of tx_msg is also moved out of ipmi_lock in this patch.

The forthcoming IPMI operation region handler installation changes also
require acpi_ipmi_device to be handled in the reference counting style.

Authorship is also updated due to this design change.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |  249 +++++++++++++++++++++++++++-------------------
 1 file changed, 149 insertions(+), 100 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 527ee43..cbf25e0 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -1,8 +1,9 @@
 /*
  *  acpi_ipmi.c - ACPI IPMI opregion
  *
- *  Copyright (C) 2010 Intel Corporation
- *  Copyright (C) 2010 Zhao Yakui <yakui.zhao@intel.com>
+ *  Copyright (C) 2010, 2013 Intel Corporation
+ *    Author: Zhao Yakui <yakui.zhao@intel.com>
+ *            Lv Zheng <lv.zheng@intel.com>
  *
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  *
@@ -67,6 +68,7 @@ struct acpi_ipmi_device {
 	long curr_msgid;
 	unsigned long flags;
 	struct ipmi_smi_info smi_data;
+	atomic_t refcnt;
 };
 
 struct ipmi_driver_data {
@@ -107,8 +109,8 @@ struct acpi_ipmi_buffer {
 static void ipmi_register_bmc(int iface, struct device *dev);
 static void ipmi_bmc_gone(int iface);
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
-static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device);
-static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device);
+static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
+static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);
 
 static struct ipmi_driver_data driver_data = {
 	.ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
@@ -122,6 +124,80 @@ static struct ipmi_driver_data driver_data = {
 	},
 };
 
+static struct acpi_ipmi_device *
+ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
+{
+	struct acpi_ipmi_device *ipmi_device;
+	int err;
+	ipmi_user_t user;
+
+	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
+	if (!ipmi_device)
+		return NULL;
+
+	atomic_set(&ipmi_device->refcnt, 1);
+	INIT_LIST_HEAD(&ipmi_device->head);
+	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
+	spin_lock_init(&ipmi_device->tx_msg_lock);
+
+	ipmi_device->handle = handle;
+	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
+	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
+	ipmi_device->ipmi_ifnum = iface;
+
+	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
+			       ipmi_device, &user);
+	if (err) {
+		put_device(smi_data->dev);
+		kfree(ipmi_device);
+		return NULL;
+	}
+	ipmi_device->user_interface = user;
+	ipmi_install_space_handler(ipmi_device);
+
+	return ipmi_device;
+}
+
+static struct acpi_ipmi_device *
+acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
+{
+	if (ipmi_device)
+		atomic_inc(&ipmi_device->refcnt);
+	return ipmi_device;
+}
+
+static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
+{
+	ipmi_remove_space_handler(ipmi_device);
+	ipmi_destroy_user(ipmi_device->user_interface);
+	put_device(ipmi_device->smi_data.dev);
+	kfree(ipmi_device);
+}
+
+static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
+{
+	if (ipmi_device && atomic_dec_and_test(&ipmi_device->refcnt))
+		ipmi_dev_release(ipmi_device);
+}
+
+static struct acpi_ipmi_device *acpi_ipmi_get_targeted_smi(int iface)
+{
+	int dev_found = 0;
+	struct acpi_ipmi_device *ipmi_device;
+
+	mutex_lock(&driver_data.ipmi_lock);
+	list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
+		if (ipmi_device->ipmi_ifnum == iface) {
+			dev_found = 1;
+			acpi_ipmi_dev_get(ipmi_device);
+			break;
+		}
+	}
+	mutex_unlock(&driver_data.ipmi_lock);
+
+	return dev_found ? ipmi_device : NULL;
+}
+
 static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
 {
 	struct acpi_ipmi_msg *ipmi_msg;
@@ -228,25 +304,24 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 {
 	struct acpi_ipmi_msg *tx_msg, *temp;
-	int count = HZ / 10;
-	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
 	unsigned long flags;
 
-	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
-	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
-		/* wake up the sleep thread on the Tx msg */
-		complete(&tx_msg->tx_complete);
-	}
-	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
-
-	/* wait for about 100ms to flush the tx message list */
-	while (count--) {
-		if (list_empty(&ipmi->tx_msg_list))
-			break;
-		schedule_timeout(1);
+	/*
+	 * NOTE: Synchronous Flushing
+	 * Wait until refcnt drops to 1 - no other users except this
+	 * context.  This function should always be called before
+	 * acpi_ipmi_device destruction.
+	 */
+	while (atomic_read(&ipmi->refcnt) > 1) {
+		spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
+		list_for_each_entry_safe(tx_msg, temp,
+					 &ipmi->tx_msg_list, head) {
+			/* wake up the sleep thread on the Tx msg */
+			complete(&tx_msg->tx_complete);
+		}
+		spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+		schedule_timeout_uninterruptible(msecs_to_jiffies(1));
 	}
-	if (!list_empty(&ipmi->tx_msg_list))
-		dev_warn(&pnp_dev->dev, "tx msg list is not NULL\n");
 }
 
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
@@ -304,22 +379,26 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 {
 	struct acpi_ipmi_device *ipmi_device, *temp;
 	struct pnp_dev *pnp_dev;
-	ipmi_user_t		user;
 	int err;
 	struct ipmi_smi_info smi_data;
 	acpi_handle handle;
 
 	err = ipmi_get_smi_info(iface, &smi_data);
-
 	if (err)
 		return;
 
-	if (smi_data.addr_src != SI_ACPI) {
-		put_device(smi_data.dev);
-		return;
-	}
-
+	if (smi_data.addr_src != SI_ACPI)
+		goto err_ref;
 	handle = smi_data.addr_info.acpi_info.acpi_handle;
+	if (!handle)
+		goto err_ref;
+	pnp_dev = to_pnp_dev(smi_data.dev);
+
+	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
+	if (!ipmi_device) {
+		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
+		goto err_ref;
+	}
 
 	mutex_lock(&driver_data.ipmi_lock);
 	list_for_each_entry(temp, &driver_data.ipmi_devices, head) {
@@ -328,54 +407,42 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 		 * to the device list, don't add it again.
 		 */
 		if (temp->handle == handle)
-			goto out;
+			goto err_lock;
 	}
 
-	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
-
-	if (!ipmi_device)
-		goto out;
-
-	pnp_dev = to_pnp_dev(smi_data.dev);
-	ipmi_device->handle = handle;
-	ipmi_device->pnp_dev = pnp_dev;
-
-	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
-					ipmi_device, &user);
-	if (err) {
-		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
-		kfree(ipmi_device);
-		goto out;
-	}
-	acpi_add_ipmi_device(ipmi_device);
-	ipmi_device->user_interface = user;
-	ipmi_device->ipmi_ifnum = iface;
+	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
 	mutex_unlock(&driver_data.ipmi_lock);
-	memcpy(&ipmi_device->smi_data, &smi_data, sizeof(struct ipmi_smi_info));
+	put_device(smi_data.dev);
 	return;
 
-out:
+err_lock:
 	mutex_unlock(&driver_data.ipmi_lock);
+	ipmi_dev_release(ipmi_device);
+err_ref:
 	put_device(smi_data.dev);
 	return;
 }
 
 static void ipmi_bmc_gone(int iface)
 {
-	struct acpi_ipmi_device *ipmi_device, *temp;
+	int dev_found = 0;
+	struct acpi_ipmi_device *ipmi_device;
 
 	mutex_lock(&driver_data.ipmi_lock);
-	list_for_each_entry_safe(ipmi_device, temp,
-				&driver_data.ipmi_devices, head) {
-		if (ipmi_device->ipmi_ifnum != iface)
-			continue;
-
-		acpi_remove_ipmi_device(ipmi_device);
-		put_device(ipmi_device->smi_data.dev);
-		kfree(ipmi_device);
-		break;
+	list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
+		if (ipmi_device->ipmi_ifnum == iface) {
+			dev_found = 1;
+			break;
+		}
 	}
+	if (dev_found)
+		list_del(&ipmi_device->head);
 	mutex_unlock(&driver_data.ipmi_lock);
+
+	if (dev_found) {
+		ipmi_flush_tx_msg(ipmi_device);
+		acpi_ipmi_dev_put(ipmi_device);
+	}
 }
 
 /* --------------------------------------------------------------------------
@@ -400,7 +467,8 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 			void *handler_context, void *region_context)
 {
 	struct acpi_ipmi_msg *tx_msg;
-	struct acpi_ipmi_device *ipmi_device = handler_context;
+	int iface = (long)handler_context;
+	struct acpi_ipmi_device *ipmi_device;
 	int err, rem_time;
 	acpi_status status;
 	unsigned long flags;
@@ -414,12 +482,15 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	if ((function & ACPI_IO_MASK) == ACPI_READ)
 		return AE_TYPE;
 
-	if (!ipmi_device->user_interface)
+	ipmi_device = acpi_ipmi_get_targeted_smi(iface);
+	if (!ipmi_device)
 		return AE_NOT_EXIST;
 
 	tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
-	if (!tx_msg)
-		return AE_NO_MEMORY;
+	if (!tx_msg) {
+		status = AE_NO_MEMORY;
+		goto out_ref;
+	}
 
 	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
 		status = AE_TYPE;
@@ -449,6 +520,8 @@ out_list:
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 out_msg:
 	kfree(tx_msg);
+out_ref:
+	acpi_ipmi_dev_put(ipmi_device);
 	return status;
 }
 
@@ -473,7 +546,7 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
 	status = acpi_install_address_space_handler(ipmi->handle,
 						    ACPI_ADR_SPACE_IPMI,
 						    &acpi_ipmi_space_handler,
-						    NULL, ipmi);
+						    NULL, (void *)((long)ipmi->ipmi_ifnum));
 	if (ACPI_FAILURE(status)) {
 		struct pnp_dev *pnp_dev = ipmi->pnp_dev;
 		dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
@@ -484,36 +557,6 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
 	return 0;
 }
 
-static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)
-{
-
-	INIT_LIST_HEAD(&ipmi_device->head);
-
-	spin_lock_init(&ipmi_device->tx_msg_lock);
-	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
-	ipmi_install_space_handler(ipmi_device);
-
-	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
-}
-
-static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device)
-{
-	/*
-	 * If the IPMI user interface is created, it should be
-	 * destroyed.
-	 */
-	if (ipmi_device->user_interface) {
-		ipmi_destroy_user(ipmi_device->user_interface);
-		ipmi_device->user_interface = NULL;
-	}
-	/* flush the Tx_msg list */
-	if (!list_empty(&ipmi_device->tx_msg_list))
-		ipmi_flush_tx_msg(ipmi_device);
-
-	list_del(&ipmi_device->head);
-	ipmi_remove_space_handler(ipmi_device);
-}
-
 static int __init acpi_ipmi_init(void)
 {
 	int result = 0;
@@ -530,7 +573,7 @@ static int __init acpi_ipmi_init(void)
 
 static void __exit acpi_ipmi_exit(void)
 {
-	struct acpi_ipmi_device *ipmi_device, *temp;
+	struct acpi_ipmi_device *ipmi_device;
 
 	if (acpi_disabled)
 		return;
@@ -544,11 +587,17 @@ static void __exit acpi_ipmi_exit(void)
 	 * handler and free it.
 	 */
 	mutex_lock(&driver_data.ipmi_lock);
-	list_for_each_entry_safe(ipmi_device, temp,
-				&driver_data.ipmi_devices, head) {
-		acpi_remove_ipmi_device(ipmi_device);
-		put_device(ipmi_device->smi_data.dev);
-		kfree(ipmi_device);
+	while (!list_empty(&driver_data.ipmi_devices)) {
+		ipmi_device = list_first_entry(&driver_data.ipmi_devices,
+					       struct acpi_ipmi_device,
+					       head);
+		list_del(&ipmi_device->head);
+		mutex_unlock(&driver_data.ipmi_lock);
+
+		ipmi_flush_tx_msg(ipmi_device);
+		acpi_ipmi_dev_put(ipmi_device);
+
+		mutex_lock(&driver_data.ipmi_lock);
 	}
 	mutex_unlock(&driver_data.ipmi_lock);
 }
-- 
1.7.10



* [PATCH 05/13] ACPI/IPMI: Fix issue caused by the per-device registration of the IPMI operation region handler
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:09     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, stable, linux-acpi, openipmi-developer

On a real machine, the IPMI OperationRegions (in the ACPI000D device, the
ACPI power meter) are found not to be defined under the IPMI system
interface device (the IPI0001 device with KCS type returned from its _IFT
control method) in the ACPI namespace:
  Device (PMI0)
  {
      Name (_HID, "ACPI000D")  // _HID: Hardware ID
      OperationRegion (SYSI, IPMI, 0x0600, 0x0100)
      Field (SYSI, BufferAcc, Lock, Preserve)
      {
          AccessAs (BufferAcc, 0x01),
          Offset (0x58),
          SCMD,   8,
          GCMD,   8
      }

      OperationRegion (POWR, IPMI, 0x3000, 0x0100)
      Field (POWR, BufferAcc, Lock, Preserve)
      {
          AccessAs (BufferAcc, 0x01),
          Offset (0xB3),
          GPMM,   8
      }
  }

  Device (PCI0)
  {
      Device (ISA)
      {
          Device (NIPM)
          {
              Name (_HID, EisaId ("IPI0001"))  // _HID: Hardware ID
              Method (_IFT, 0, NotSerialized)  // _IFT: IPMI Interface Type
              {
                  Return (0x01)
              }
          }
      }
  }
The current ACPI_IPMI code registers the IPMI operation region handler on
a per-device basis, so for the above namespace the handler is registered
only under the scope of \_SB.PCI0.ISA.NIPM.  Thus when an IPMI operation
region field of \PMI0 is accessed, the following errors are reported on
such a platform:
  ACPI Error: No handlers for Region [IPMI]
  ACPI Error: Region IPMI(7) has no handler
The solution is to install the IPMI operation region handler from the root
node so that every object that defines an IPMI OperationRegion gets an
address space handler registered.
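
As an illustration only (not part of this patch), the scoping problem above
can be modelled in user space as a prefix match on namespace paths.
handler_covers() and the REGN region name are hypothetical; the real ACPICA
lookup walks namespace nodes rather than comparing strings:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy model: a handler installed at install_scope serves a region only
 * if the region's path lies at or below that scope.  "\\" is the root.
 */
static bool handler_covers(const char *install_scope, const char *region_path)
{
	size_t n;

	assert(install_scope != NULL && region_path != NULL);

	/* A root-level handler covers every object in the namespace. */
	if (strcmp(install_scope, "\\") == 0)
		return true;

	n = strlen(install_scope);
	if (strncmp(install_scope, region_path, n) != 0)
		return false;

	/* Require a full path-segment match, not just a string prefix. */
	return region_path[n] == '\0' || region_path[n] == '.';
}
```

In this model a handler installed at \_SB.PCI0.ISA.NIPM does not cover
\PMI0.SYSI, while a handler installed at the root does.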

When an IPMI operation region field is accessed, the Network Function
(0x06 for SYSI and 0x30 for POWR) and the Command (SCMD, GCMD, GPMM) are
passed to the operation region handler, but no system interface is
specified by the BIOS.  The patch selects one system interface by
monitoring the system interface notifications.  IPMI messages passed from
the ACPI code are sent to this selected global IPMI system interface.
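
A minimal user-space sketch of this selection policy, assuming interfaces
are identified only by their interface number.  The smi_table type and the
smi_register()/smi_gone() helpers are hypothetical stand-ins for the
watcher callbacks; locking and reference counting are left out:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IFACES	8
#define NO_IFACE	(-1)

struct smi_table {
	int ifnums[MAX_IFACES];	/* kept in registration order */
	size_t count;
	int selected;		/* NO_IFACE when nothing is selected */
};

static void smi_table_init(struct smi_table *t)
{
	assert(t != NULL);
	t->count = 0;
	t->selected = NO_IFACE;
}

/* Track a new interface; the first one with an ACPI handle is selected. */
static bool smi_register(struct smi_table *t, int ifnum, bool has_acpi_handle)
{
	if (!has_acpi_handle || t->count == MAX_IFACES)
		return false;

	t->ifnums[t->count++] = ifnum;
	if (t->selected == NO_IFACE)
		t->selected = ifnum;
	return true;
}

/* Drop an interface; fall back to the oldest remaining one if needed. */
static void smi_gone(struct smi_table *t, int ifnum)
{
	size_t i, j;

	for (i = 0; i < t->count; i++) {
		if (t->ifnums[i] != ifnum)
			continue;
		for (j = i + 1; j < t->count; j++)
			t->ifnums[j - 1] = t->ifnums[j];
		t->count--;
		if (t->selected == ifnum)
			t->selected = NO_IFACE;
		break;
	}

	if (t->selected == NO_IFACE && t->count > 0)
		t->selected = t->ifnums[0];
}
```

Registering interfaces 1 and 2 selects 1; once 1 goes away the selection
falls back to 2, and it becomes empty when no interface is left.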

Known issues:
- How to select the IPMI system interface:
  Currently, ACPI_IPMI always selects the first registered interface that
  has an ACPI handle set (i.e., one defined in the ACPI namespace).  It is
  hard to determine the right selection when multiple IPMI system
  interfaces are defined in the ACPI namespace.
  According to the IPMI specification:
  A BMC device may make available multiple system interfaces, but only one
  management controller is allowed to be the 'active' BMC that provides BMC
  functionality for the system (in case of a 'partitioned' system, there
  can be only one active BMC per partition).  Only the system interface(s)
  for the active BMC are allowed to respond to the 'Get Device Id' command.
  According to the ipmi_si design:
  The ipmi_si registration notifications can only happen after a
  successful "Get Device ID" command.
  Thus it should be OK for non-partitioned systems to do such selection.
  But we do not have much knowledge of 'partitioned' systems.
- Lack of smi_gone()/new_smi() testability:
  It is not possible to do a module (ipmi_si) load/unload test, and I can't
  find any platform with multiple IPMI system interfaces available for
  testing.  There might be issues in the untested code path.

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=46741
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |  111 +++++++++++++++++++++++-----------------------
 1 file changed, 55 insertions(+), 56 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index cbf25e0..5f8f495 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -46,7 +46,8 @@ MODULE_AUTHOR("Zhao Yakui");
 MODULE_DESCRIPTION("ACPI IPMI Opregion driver");
 MODULE_LICENSE("GPL");
 
-#define IPMI_FLAGS_HANDLER_INSTALL	0
+#undef PREFIX
+#define PREFIX				"ACPI: IPMI: "
 
 #define ACPI_IPMI_OK			0
 #define ACPI_IPMI_TIMEOUT		0x10
@@ -66,7 +67,6 @@ struct acpi_ipmi_device {
 	ipmi_user_t	user_interface;
 	int ipmi_ifnum; /* IPMI interface number */
 	long curr_msgid;
-	unsigned long flags;
 	struct ipmi_smi_info smi_data;
 	atomic_t refcnt;
 };
@@ -76,6 +76,14 @@ struct ipmi_driver_data {
 	struct ipmi_smi_watcher	bmc_events;
 	struct ipmi_user_hndl	ipmi_hndlrs;
 	struct mutex		ipmi_lock;
+	/*
+	 * NOTE: IPMI System Interface Selection
+	 * There is no system interface specified by the IPMI operation
+	 * region access.  We try to select one system interface with ACPI
+	 * handle set.  IPMI messages passed from the ACPI codes are sent
+	 * to this selected global IPMI system interface.
+	 */
+	struct acpi_ipmi_device *selected_smi;
 };
 
 struct acpi_ipmi_msg {
@@ -109,8 +117,6 @@ struct acpi_ipmi_buffer {
 static void ipmi_register_bmc(int iface, struct device *dev);
 static void ipmi_bmc_gone(int iface);
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
-static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
-static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);
 
 static struct ipmi_driver_data driver_data = {
 	.ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
@@ -153,7 +159,6 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
 		return NULL;
 	}
 	ipmi_device->user_interface = user;
-	ipmi_install_space_handler(ipmi_device);
 
 	return ipmi_device;
 }
@@ -168,7 +173,6 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
 
 static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
 {
-	ipmi_remove_space_handler(ipmi_device);
 	ipmi_destroy_user(ipmi_device->user_interface);
 	put_device(ipmi_device->smi_data.dev);
 	kfree(ipmi_device);
@@ -180,22 +184,15 @@ static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
 		ipmi_dev_release(ipmi_device);
 }
 
-static struct acpi_ipmi_device *acpi_ipmi_get_targeted_smi(int iface)
+static struct acpi_ipmi_device *acpi_ipmi_get_selected_smi(void)
 {
-	int dev_found = 0;
 	struct acpi_ipmi_device *ipmi_device;
 
 	mutex_lock(&driver_data.ipmi_lock);
-	list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
-		if (ipmi_device->ipmi_ifnum == iface) {
-			dev_found = 1;
-			acpi_ipmi_dev_get(ipmi_device);
-			break;
-		}
-	}
+	ipmi_device = acpi_ipmi_dev_get(driver_data.selected_smi);
 	mutex_unlock(&driver_data.ipmi_lock);
 
-	return dev_found ? ipmi_device : NULL;
+	return ipmi_device;
 }
 
 static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
@@ -410,6 +407,9 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 			goto err_lock;
 	}
 
+	if (!driver_data.selected_smi)
+		driver_data.selected_smi = acpi_ipmi_dev_get(ipmi_device);
+
 	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
 	mutex_unlock(&driver_data.ipmi_lock);
 	put_device(smi_data.dev);
@@ -426,22 +426,34 @@ err_ref:
 static void ipmi_bmc_gone(int iface)
 {
 	int dev_found = 0;
-	struct acpi_ipmi_device *ipmi_device;
+	struct acpi_ipmi_device *ipmi_gone, *ipmi_new;
 
 	mutex_lock(&driver_data.ipmi_lock);
-	list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
-		if (ipmi_device->ipmi_ifnum == iface) {
+	list_for_each_entry(ipmi_gone, &driver_data.ipmi_devices, head) {
+		if (ipmi_gone->ipmi_ifnum == iface) {
 			dev_found = 1;
 			break;
 		}
 	}
-	if (dev_found)
-		list_del(&ipmi_device->head);
+	if (dev_found) {
+		list_del(&ipmi_gone->head);
+		if (driver_data.selected_smi == ipmi_gone) {
+			acpi_ipmi_dev_put(ipmi_gone);
+			driver_data.selected_smi = NULL;
+		}
+	}
+	if (!driver_data.selected_smi &&
+	    !list_empty(&driver_data.ipmi_devices)) {
+		ipmi_new = list_first_entry(&driver_data.ipmi_devices,
+					    struct acpi_ipmi_device,
+					    head);
+		driver_data.selected_smi = acpi_ipmi_dev_get(ipmi_new);
+	}
 	mutex_unlock(&driver_data.ipmi_lock);
 
 	if (dev_found) {
-		ipmi_flush_tx_msg(ipmi_device);
-		acpi_ipmi_dev_put(ipmi_device);
+		ipmi_flush_tx_msg(ipmi_gone);
+		acpi_ipmi_dev_put(ipmi_gone);
 	}
 }
 
@@ -467,7 +479,6 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 			void *handler_context, void *region_context)
 {
 	struct acpi_ipmi_msg *tx_msg;
-	int iface = (long)handler_context;
 	struct acpi_ipmi_device *ipmi_device;
 	int err, rem_time;
 	acpi_status status;
@@ -482,7 +493,7 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	if ((function & ACPI_IO_MASK) == ACPI_READ)
 		return AE_TYPE;
 
-	ipmi_device = acpi_ipmi_get_targeted_smi(iface);
+	ipmi_device = acpi_ipmi_get_selected_smi();
 	if (!ipmi_device)
 		return AE_NOT_EXIST;
 
@@ -525,48 +536,28 @@ out_ref:
 	return status;
 }
 
-static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi)
-{
-	if (!test_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags))
-		return;
-
-	acpi_remove_address_space_handler(ipmi->handle,
-				ACPI_ADR_SPACE_IPMI, &acpi_ipmi_space_handler);
-
-	clear_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags);
-}
-
-static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
+static int __init acpi_ipmi_init(void)
 {
+	int result = 0;
 	acpi_status status;
 
-	if (test_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags))
-		return 0;
+	if (acpi_disabled)
+		return result;
 
-	status = acpi_install_address_space_handler(ipmi->handle,
+	mutex_init(&driver_data.ipmi_lock);
+
+	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
 						    ACPI_ADR_SPACE_IPMI,
 						    &acpi_ipmi_space_handler,
-						    NULL, (void *)((long)ipmi->ipmi_ifnum));
+						    NULL, NULL);
 	if (ACPI_FAILURE(status)) {
-		struct pnp_dev *pnp_dev = ipmi->pnp_dev;
-		dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
-			"handle\n");
+		pr_warn("Can't register IPMI opregion space handle\n");
 		return -EINVAL;
 	}
-	set_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags);
-	return 0;
-}
-
-static int __init acpi_ipmi_init(void)
-{
-	int result = 0;
-
-	if (acpi_disabled)
-		return result;
-
-	mutex_init(&driver_data.ipmi_lock);
 
 	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
+	if (result)
+		pr_err("Can't register IPMI system interface watcher\n");
 
 	return result;
 }
@@ -592,6 +583,10 @@ static void __exit acpi_ipmi_exit(void)
 					       struct acpi_ipmi_device,
 					       head);
 		list_del(&ipmi_device->head);
+		if (ipmi_device == driver_data.selected_smi) {
+			acpi_ipmi_dev_put(driver_data.selected_smi);
+			driver_data.selected_smi = NULL;
+		}
 		mutex_unlock(&driver_data.ipmi_lock);
 
 		ipmi_flush_tx_msg(ipmi_device);
@@ -600,6 +595,10 @@ static void __exit acpi_ipmi_exit(void)
 		mutex_lock(&driver_data.ipmi_lock);
 	}
 	mutex_unlock(&driver_data.ipmi_lock);
+
+	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
+					  ACPI_ADR_SPACE_IPMI,
+					  &acpi_ipmi_space_handler);
 }
 
 module_init(acpi_ipmi_init);
-- 
1.7.10



* [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:09     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, stable, linux-acpi, openipmi-developer

This patch adds reference counting for ACPI operation region handlers to fix
races caused by the ACPICA address space callback invocations.

The ACPICA address space callback invocation is not suitable for the Linux
CONFIG_MODULES=y execution environment.  This patch tries to protect the
address space callbacks by invoking them in a module-safe environment.
The IPMI address space handler is also upgraded in this patch.
acpi_unregister_region() is designed to meet the following
requirements:
1. It acts as a barrier for operation region callbacks - no callback will
   happen after acpi_unregister_region().
2. acpi_unregister_region() is safe to be called in module->exit()
   functions.
Using reference counting rather than module referencing allows
such benefits to be achieved even when acpi_unregister_region() is called
in environments other than module->exit().
The include/acpi/acpi_bus.h header file should contain the declarations
that have references to some ACPICA defined types.
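
The barrier requirement can be sketched in user space with C11 atomics.
This is only a model of the idea, not the osl.c implementation: struct
region, region_invoke() and the other names are hypothetical, and a real
implementation would sleep rather than spin:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*region_handler_t)(int value);

struct region {
	region_handler_t handler;
	atomic_bool unregistering;
	atomic_int refcnt;		/* callbacks currently running */
};

/* Example handler, used only for illustration. */
static int double_it(int value)
{
	return value * 2;
}

static void region_register(struct region *r, region_handler_t h)
{
	assert(r != NULL && h != NULL);
	r->handler = h;
	atomic_init(&r->unregistering, false);
	atomic_init(&r->refcnt, 0);
}

/* Run the handler unless unregistration has already started. */
static bool region_invoke(struct region *r, int value, int *out)
{
	bool called = false;

	/* Take the reference before checking the flag. */
	atomic_fetch_add(&r->refcnt, 1);
	if (!atomic_load(&r->unregistering)) {
		*out = r->handler(value);
		called = true;
	}
	atomic_fetch_sub(&r->refcnt, 1);
	return called;
}

/* Barrier: once this returns, no callback is running or can start. */
static void region_unregister(struct region *r)
{
	atomic_store(&r->unregistering, true);
	while (atomic_load(&r->refcnt) > 0)
		;	/* wait for in-flight callbacks to drain */
}
```

Because the invoker increments refcnt before testing the flag, and the
unregister path sets the flag before reading refcnt, a callback is either
refused or waited for.  For example, region_invoke() with double_it
succeeds before region_unregister() and is refused afterwards.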

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   16 ++--
 drivers/acpi/osl.c       |  224 ++++++++++++++++++++++++++++++++++++++++++++++
 include/acpi/acpi_bus.h  |    5 ++
 3 files changed, 235 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 5f8f495..2a09156 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -539,20 +539,18 @@ out_ref:
 static int __init acpi_ipmi_init(void)
 {
 	int result = 0;
-	acpi_status status;
 
 	if (acpi_disabled)
 		return result;
 
 	mutex_init(&driver_data.ipmi_lock);
 
-	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
-						    ACPI_ADR_SPACE_IPMI,
-						    &acpi_ipmi_space_handler,
-						    NULL, NULL);
-	if (ACPI_FAILURE(status)) {
+	result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
+				      &acpi_ipmi_space_handler,
+				      NULL, NULL);
+	if (result) {
 		pr_warn("Can't register IPMI opregion space handle\n");
-		return -EINVAL;
+		return result;
 	}
 
 	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
@@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
 	}
 	mutex_unlock(&driver_data.ipmi_lock);
 
-	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
-					  ACPI_ADR_SPACE_IPMI,
-					  &acpi_ipmi_space_handler);
+	acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
 }
 
 module_init(acpi_ipmi_init);
diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
index 6ab2c35..8398e51 100644
--- a/drivers/acpi/osl.c
+++ b/drivers/acpi/osl.c
@@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
 static struct workqueue_struct *kacpi_notify_wq;
 static struct workqueue_struct *kacpi_hotplug_wq;
 
+struct acpi_region {
+	unsigned long flags;
+#define ACPI_REGION_DEFAULT		0x01
+#define ACPI_REGION_INSTALLED		0x02
+#define ACPI_REGION_REGISTERED		0x04
+#define ACPI_REGION_UNREGISTERING	0x08
+#define ACPI_REGION_INSTALLING		0x10
+	/*
+	 * NOTE: Upgrading All Region Handlers
+	 * This flag is only used during the period where not all of the
+	 * region handlers are upgraded to the new interfaces.
+	 */
+#define ACPI_REGION_MANAGED		0x80
+	acpi_adr_space_handler handler;
+	acpi_adr_space_setup setup;
+	void *context;
+	/* Invoking references */
+	atomic_t refcnt;
+};
+
+static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS] = {
+	[ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
+		.flags = ACPI_REGION_DEFAULT,
+	},
+	[ACPI_ADR_SPACE_SYSTEM_IO] = {
+		.flags = ACPI_REGION_DEFAULT,
+	},
+	[ACPI_ADR_SPACE_PCI_CONFIG] = {
+		.flags = ACPI_REGION_DEFAULT,
+	},
+	[ACPI_ADR_SPACE_IPMI] = {
+		.flags = ACPI_REGION_MANAGED,
+	},
+};
+static DEFINE_MUTEX(acpi_mutex_region);
+
 /*
  * This list of permanent mappings is for memory that may be accessed from
  * interrupt context, where we can't do the ioremap().
@@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle, u32 type, void *context,
 		kfree(hp_work);
 }
 EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
+
+static bool acpi_region_managed(struct acpi_region *rgn)
+{
+	/*
+	 * NOTE: Default and Managed
+	 * We only need to avoid region management on the regions managed
+	 * by ACPICA (ACPI_REGION_DEFAULT).  Currently, we need additional
+	 * check as many operation region handlers are not upgraded, so
+	 * only those known to be safe are managed (ACPI_REGION_MANAGED).
+	 */
+	return !(rgn->flags & ACPI_REGION_DEFAULT) &&
+	       (rgn->flags & ACPI_REGION_MANAGED);
+}
+
+static bool acpi_region_callable(struct acpi_region *rgn)
+{
+	return (rgn->flags & ACPI_REGION_REGISTERED) &&
+	       !(rgn->flags & ACPI_REGION_UNREGISTERING);
+}
+
+static acpi_status
+acpi_region_default_handler(u32 function,
+			    acpi_physical_address address,
+			    u32 bit_width, u64 *value,
+			    void *handler_context, void *region_context)
+{
+	acpi_adr_space_handler handler;
+	struct acpi_region *rgn = (struct acpi_region *)handler_context;
+	void *context;
+	acpi_status status = AE_NOT_EXIST;
+
+	mutex_lock(&acpi_mutex_region);
+	if (!acpi_region_callable(rgn) || !rgn->handler) {
+		mutex_unlock(&acpi_mutex_region);
+		return status;
+	}
+
+	atomic_inc(&rgn->refcnt);
+	handler = rgn->handler;
+	context = rgn->context;
+	mutex_unlock(&acpi_mutex_region);
+
+	status = handler(function, address, bit_width, value, context,
+			 region_context);
+	atomic_dec(&rgn->refcnt);
+
+	return status;
+}
+
+static acpi_status
+acpi_region_default_setup(acpi_handle handle, u32 function,
+			  void *handler_context, void **region_context)
+{
+	acpi_adr_space_setup setup;
+	struct acpi_region *rgn = (struct acpi_region *)handler_context;
+	void *context;
+	acpi_status status = AE_OK;
+
+	mutex_lock(&acpi_mutex_region);
+	if (!acpi_region_callable(rgn) || !rgn->setup) {
+		mutex_unlock(&acpi_mutex_region);
+		return status;
+	}
+
+	atomic_inc(&rgn->refcnt);
+	setup = rgn->setup;
+	context = rgn->context;
+	mutex_unlock(&acpi_mutex_region);
+
+	status = setup(handle, function, context, region_context);
+	atomic_dec(&rgn->refcnt);
+
+	return status;
+}
+
+static int __acpi_install_region(struct acpi_region *rgn,
+				 acpi_adr_space_type space_id)
+{
+	int res = 0;
+	acpi_status status;
+	int installing = 0;
+
+	mutex_lock(&acpi_mutex_region);
+	if (rgn->flags & ACPI_REGION_INSTALLED)
+		goto out_lock;
+	if (rgn->flags & ACPI_REGION_INSTALLING) {
+		res = -EBUSY;
+		goto out_lock;
+	}
+
+	installing = 1;
+	rgn->flags |= ACPI_REGION_INSTALLING;
+	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT, space_id,
+						    acpi_region_default_handler,
+						    acpi_region_default_setup,
+						    rgn);
+	rgn->flags &= ~ACPI_REGION_INSTALLING;
+	if (ACPI_FAILURE(status))
+		res = -EINVAL;
+	else
+		rgn->flags |= ACPI_REGION_INSTALLED;
+
+out_lock:
+	mutex_unlock(&acpi_mutex_region);
+	if (installing) {
+		if (res)
+			pr_err("Failed to install region %d\n", space_id);
+		else
+			pr_info("Region %d installed\n", space_id);
+	}
+	return res;
+}
+
+int acpi_register_region(acpi_adr_space_type space_id,
+			 acpi_adr_space_handler handler,
+			 acpi_adr_space_setup setup, void *context)
+{
+	int res;
+	struct acpi_region *rgn;
+
+	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
+		return -EINVAL;
+
+	rgn = &acpi_regions[space_id];
+	if (!acpi_region_managed(rgn))
+		return -EINVAL;
+
+	res = __acpi_install_region(rgn, space_id);
+	if (res)
+		return res;
+
+	mutex_lock(&acpi_mutex_region);
+	if (rgn->flags & ACPI_REGION_REGISTERED) {
+		mutex_unlock(&acpi_mutex_region);
+		return -EBUSY;
+	}
+
+	rgn->handler = handler;
+	rgn->setup = setup;
+	rgn->context = context;
+	rgn->flags |= ACPI_REGION_REGISTERED;
+	atomic_set(&rgn->refcnt, 1);
+	mutex_unlock(&acpi_mutex_region);
+
+	pr_info("Region %d registered\n", space_id);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(acpi_register_region);
+
+void acpi_unregister_region(acpi_adr_space_type space_id)
+{
+	struct acpi_region *rgn;
+
+	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
+		return;
+
+	rgn = &acpi_regions[space_id];
+	if (!acpi_region_managed(rgn))
+		return;
+
+	mutex_lock(&acpi_mutex_region);
+	if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
+		mutex_unlock(&acpi_mutex_region);
+		return;
+	}
+	if (rgn->flags & ACPI_REGION_UNREGISTERING) {
+		mutex_unlock(&acpi_mutex_region);
+		return;
+	}
+
+	rgn->flags |= ACPI_REGION_UNREGISTERING;
+	rgn->handler = NULL;
+	rgn->setup = NULL;
+	rgn->context = NULL;
+	mutex_unlock(&acpi_mutex_region);
+
+	while (atomic_read(&rgn->refcnt) > 1)
+		schedule_timeout_uninterruptible(usecs_to_jiffies(5));
+	atomic_dec(&rgn->refcnt);
+
+	mutex_lock(&acpi_mutex_region);
+	rgn->flags &= ~(ACPI_REGION_REGISTERED | ACPI_REGION_UNREGISTERING);
+	mutex_unlock(&acpi_mutex_region);
+
+	pr_info("Region %d unregistered\n", space_id);
+}
+EXPORT_SYMBOL_GPL(acpi_unregister_region);
diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
index a2c2fbb..15fad0d 100644
--- a/include/acpi/acpi_bus.h
+++ b/include/acpi/acpi_bus.h
@@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void *bus) { return 0; }
 
 #endif				/* CONFIG_ACPI */
 
+int acpi_register_region(acpi_adr_space_type space_id,
+			 acpi_adr_space_handler handler,
+			 acpi_adr_space_setup setup, void *context);
+void acpi_unregister_region(acpi_adr_space_type space_id);
+
 #endif /*__ACPI_BUS_H__*/
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 07/13] ACPI/IPMI: Add reference counting for ACPI IPMI transfers
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:09     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, linux-acpi, openipmi-developer

This patch adds reference counting for ACPI IPMI transfers to tune the
locking granularity of tx_msg_lock.

The acpi_ipmi_msg handling is re-designed using reference counting.
1. tx_msg is always unlinked before complete(), so that:
   1.1. it is safe to call complete() outside of tx_msg_lock;
   1.2. complete() can only happen once, thus smp_wmb() is not required.
2. The reference count of tx_msg is increased before calling
   ipmi_request_settime(), and a tx_msg_lock protected
   ipmi_cancel_tx_msg() is introduced, so that complete() can happen in
   parallel with tx_msg unlinking in the failure cases.
3. tx_msg holds a reference to acpi_ipmi_device so that it can be flushed
   and freed in contexts other than acpi_ipmi_space_handler().

The lockdep_chains output shows that all acpi_ipmi locks are leaf locks
after the tuning:
1. ipmi_lock is always leaf:
   irq_context: 0
   [ffffffff81a943f8] smi_watchers_mutex
   [ffffffffa06eca60] driver_data.ipmi_lock
   irq_context: 0
   [ffffffff82767b40] &buffer->mutex
   [ffffffffa00a6678] s_active#103
   [ffffffffa06eca60] driver_data.ipmi_lock
2. without this patch applied, the lock used by complete() is acquired
   while tx_msg_lock is held:
   irq_context: 0
   [ffffffff82767b40] &buffer->mutex
   [ffffffffa00a6678] s_active#103
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   [ffffffffa06eccf0] &x->wait#25
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   [ffffffffa06eccf0] &x->wait#25
   [ffffffff81e36620] &p->pi_lock
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   [ffffffffa06eccf0] &x->wait#25
   [ffffffff81e36620] &p->pi_lock
   [ffffffff81e5d0a8] &rq->lock
3. with this patch applied, tx_msg_lock is always leaf:
   irq_context: 0
   [ffffffff82767b40] &buffer->mutex
   [ffffffffa00a66d8] s_active#107
   [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
   irq_context: 1
   [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
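The "unlink before complete()" rule above can be illustrated with a small
userspace sketch.  It is not the driver code: it assumes pthreads and C11
atomics, a singly linked list instead of list_head, and a plain counter in
place of a struct completion.  The names struct msg, msg_queue(),
msg_unlink(), msg_finish() and the released counter are hypothetical.

```c
/* Userspace sketch only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

struct msg {
	struct msg *next;
	atomic_int refcnt;
	int completions;	/* stands in for complete() */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct msg *list_head;
static int released;		/* counts freed messages, for illustration */

static struct msg *msg_alloc(void)
{
	struct msg *m = calloc(1, sizeof(*m));

	if (m)
		atomic_init(&m->refcnt, 1);	/* the caller's reference */
	return m;
}

static struct msg *msg_get(struct msg *m)
{
	if (m)
		atomic_fetch_add(&m->refcnt, 1);
	return m;
}

static void msg_put(struct msg *m)
{
	/* atomic_fetch_sub() returns the old value; 1 means last reference. */
	if (m && atomic_fetch_sub(&m->refcnt, 1) == 1) {
		released++;
		free(m);
	}
}

/* The list holds its own reference to every queued message. */
static void msg_queue(struct msg *m)
{
	msg_get(m);
	pthread_mutex_lock(&list_lock);
	m->next = list_head;
	list_head = m;
	pthread_mutex_unlock(&list_lock);
}

/* Returns true only for the single caller that removed m from the list. */
static bool msg_unlink(struct msg *m)
{
	struct msg **pp;
	bool found = false;

	pthread_mutex_lock(&list_lock);
	for (pp = &list_head; *pp; pp = &(*pp)->next) {
		if (*pp == m) {
			*pp = m->next;
			m->next = NULL;
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
	return found;
}

/* Completes m at most once, outside the list lock. */
static void msg_finish(struct msg *m)
{
	if (!msg_unlink(m))
		return;		/* another path already unlinked it */
	m->completions++;
	msg_put(m);		/* drop the list's reference */
}
```

Because only the path that wins the unlink may complete the message and
drop the list's reference, the list lock never needs to be held across the
completion, which is what keeps it a leaf lock.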
 drivers/acpi/acpi_ipmi.c |  107 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 77 insertions(+), 30 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 2a09156..0ee1ea6 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -105,6 +105,7 @@ struct acpi_ipmi_msg {
 	u8	data[ACPI_IPMI_MAX_MSG_LENGTH];
 	u8	rx_len;
 	struct acpi_ipmi_device *device;
+	atomic_t	refcnt;
 };
 
 /* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
@@ -195,22 +196,47 @@ static struct acpi_ipmi_device *acpi_ipmi_get_selected_smi(void)
 	return ipmi_device;
 }
 
-static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
+static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
 {
+	struct acpi_ipmi_device *ipmi;
 	struct acpi_ipmi_msg *ipmi_msg;
-	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
 
+	ipmi = acpi_ipmi_get_selected_smi();
+	if (!ipmi)
+		return NULL;
 	ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
-	if (!ipmi_msg)	{
-		dev_warn(&pnp_dev->dev, "Can't allocate memory for ipmi_msg\n");
+	if (!ipmi_msg) {
+		acpi_ipmi_dev_put(ipmi);
 		return NULL;
 	}
+	atomic_set(&ipmi_msg->refcnt, 1);
 	init_completion(&ipmi_msg->tx_complete);
 	INIT_LIST_HEAD(&ipmi_msg->head);
 	ipmi_msg->device = ipmi;
+
 	return ipmi_msg;
 }
 
+static struct acpi_ipmi_msg *
+acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
+{
+	if (tx_msg)
+		atomic_inc(&tx_msg->refcnt);
+	return tx_msg;
+}
+
+static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
+{
+	acpi_ipmi_dev_put(tx_msg->device);
+	kfree(tx_msg);
+}
+
+static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
+{
+	if (tx_msg && atomic_dec_and_test(&tx_msg->refcnt))
+		ipmi_msg_release(tx_msg);
+}
+
 #define		IPMI_OP_RGN_NETFN(offset)	((offset >> 8) & 0xff)
 #define		IPMI_OP_RGN_CMD(offset)		(offset & 0xff)
 static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
@@ -300,7 +326,7 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 
 static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 {
-	struct acpi_ipmi_msg *tx_msg, *temp;
+	struct acpi_ipmi_msg *tx_msg;
 	unsigned long flags;
 
 	/*
@@ -311,16 +337,46 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 	 */
 	while (atomic_read(&ipmi->refcnt) > 1) {
 		spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
-		list_for_each_entry_safe(tx_msg, temp,
-					 &ipmi->tx_msg_list, head) {
+		while (!list_empty(&ipmi->tx_msg_list)) {
+			tx_msg = list_first_entry(&ipmi->tx_msg_list,
+						  struct acpi_ipmi_msg,
+						  head);
+			list_del(&tx_msg->head);
+			spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
 			/* wake up the sleep thread on the Tx msg */
 			complete(&tx_msg->tx_complete);
+			acpi_ipmi_msg_put(tx_msg);
+			spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
 		}
 		spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
 		schedule_timeout_uninterruptible(msecs_to_jiffies(1));
 	}
 }
 
+static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
+			       struct acpi_ipmi_msg *msg)
+{
+	struct acpi_ipmi_msg *tx_msg;
+	int msg_found = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
+	list_for_each_entry(tx_msg, &ipmi->tx_msg_list, head) {
+		if (msg == tx_msg) {
+			msg_found = 1;
+			break;
+		}
+	}
+	if (msg_found)
+		list_del(&tx_msg->head);
+	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
+	if (msg_found)
+		acpi_ipmi_msg_put(tx_msg);
+}
+
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 {
 	struct acpi_ipmi_device *ipmi_device = user_msg_data;
@@ -343,12 +399,15 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 			break;
 		}
 	}
+	if (msg_found)
+		list_del(&tx_msg->head);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 
 	if (!msg_found) {
 		dev_warn(&pnp_dev->dev,
 			 "Unexpected response (msg id %ld) is returned.\n",
 			 msg->msgid);
-		goto out_lock;
+		goto out_msg;
 	}
 
 	/* copy the response data to Rx_data buffer */
@@ -360,14 +419,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	}
 	tx_msg->rx_len = msg->msg.data_len;
 	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
-	/* tx_msg content must be valid before setting msg_done flag */
-	smp_wmb();
 	tx_msg->msg_done = 1;
 
 out_comp:
 	complete(&tx_msg->tx_complete);
-out_lock:
-	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
+	acpi_ipmi_msg_put(tx_msg);
 out_msg:
 	ipmi_free_recv_msg(msg);
 }
@@ -493,21 +549,17 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	if ((function & ACPI_IO_MASK) == ACPI_READ)
 		return AE_TYPE;
 
-	ipmi_device = acpi_ipmi_get_selected_smi();
-	if (!ipmi_device)
+	tx_msg = ipmi_msg_alloc();
+	if (!tx_msg)
 		return AE_NOT_EXIST;
-
-	tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
-	if (!tx_msg) {
-		status = AE_NO_MEMORY;
-		goto out_ref;
-	}
+	ipmi_device = tx_msg->device;
 
 	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
-		status = AE_TYPE;
-		goto out_msg;
+		ipmi_msg_release(tx_msg);
+		return AE_TYPE;
 	}
 
+	acpi_ipmi_msg_get(tx_msg);
 	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
@@ -518,21 +570,16 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 				   NULL, 0, 0, 0);
 	if (err) {
 		status = AE_ERROR;
-		goto out_list;
+		goto out_msg;
 	}
 	rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
 					       IPMI_TIMEOUT);
 	acpi_format_ipmi_response(tx_msg, value, rem_time);
 	status = AE_OK;
 
-out_list:
-	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
-	list_del(&tx_msg->head);
-	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 out_msg:
-	kfree(tx_msg);
-out_ref:
-	acpi_ipmi_dev_put(ipmi_device);
+	ipmi_cancel_tx_msg(ipmi_device, tx_msg);
+	acpi_ipmi_msg_put(tx_msg);
 	return status;
 }
 
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH 07/13] ACPI/IPMI: Add reference counting for ACPI IPMI transfers
@ 2013-07-23  8:09     ` Lv Zheng
  0 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:09 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, linux-acpi, openipmi-developer

This patch adds reference counting for ACPI IPMI transfers to tune the
locking granularity of tx_msg_lock.

The acpi_ipmi_msg handling is re-designed using reference counting.
1. tx_msg is always unlinked before complete(), so that:
   1.1. it is safe to call complete() outside of tx_msg_lock;
   1.2. complete() can only happen once, thus smp_wmb() is not required.
2. The reference count of tx_msg is increased before calling
   ipmi_request_settime(), and a tx_msg_lock protected
   ipmi_cancel_tx_msg() is introduced, so that complete() can happen in
   parallel with tx_msg unlinking in the failure cases.
3. tx_msg holds a reference to acpi_ipmi_device so that it can be flushed
   and freed in contexts other than acpi_ipmi_space_handler().

The lockdep_chains output shows that all acpi_ipmi locks are leaf locks
after the tuning:
1. ipmi_lock is always leaf:
   irq_context: 0
   [ffffffff81a943f8] smi_watchers_mutex
   [ffffffffa06eca60] driver_data.ipmi_lock
   irq_context: 0
   [ffffffff82767b40] &buffer->mutex
   [ffffffffa00a6678] s_active#103
   [ffffffffa06eca60] driver_data.ipmi_lock
2. without this patch applied, the lock used by complete() is acquired
   while tx_msg_lock is held:
   irq_context: 0
   [ffffffff82767b40] &buffer->mutex
   [ffffffffa00a6678] s_active#103
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   [ffffffffa06eccf0] &x->wait#25
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   [ffffffffa06eccf0] &x->wait#25
   [ffffffff81e36620] &p->pi_lock
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   [ffffffffa06eccf0] &x->wait#25
   [ffffffff81e36620] &p->pi_lock
   [ffffffff81e5d0a8] &rq->lock
3. with this patch applied, tx_msg_lock is always leaf:
   irq_context: 0
   [ffffffff82767b40] &buffer->mutex
   [ffffffffa00a66d8] s_active#107
   [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
   irq_context: 1
   [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |  107 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 77 insertions(+), 30 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 2a09156..0ee1ea6 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -105,6 +105,7 @@ struct acpi_ipmi_msg {
 	u8	data[ACPI_IPMI_MAX_MSG_LENGTH];
 	u8	rx_len;
 	struct acpi_ipmi_device *device;
+	atomic_t	refcnt;
 };
 
 /* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
@@ -195,22 +196,47 @@ static struct acpi_ipmi_device *acpi_ipmi_get_selected_smi(void)
 	return ipmi_device;
 }
 
-static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
+static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
 {
+	struct acpi_ipmi_device *ipmi;
 	struct acpi_ipmi_msg *ipmi_msg;
-	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
 
+	ipmi = acpi_ipmi_get_selected_smi();
+	if (!ipmi)
+		return NULL;
 	ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
-	if (!ipmi_msg)	{
-		dev_warn(&pnp_dev->dev, "Can't allocate memory for ipmi_msg\n");
+	if (!ipmi_msg) {
+		acpi_ipmi_dev_put(ipmi);
 		return NULL;
 	}
+	atomic_set(&ipmi_msg->refcnt, 1);
 	init_completion(&ipmi_msg->tx_complete);
 	INIT_LIST_HEAD(&ipmi_msg->head);
 	ipmi_msg->device = ipmi;
+
 	return ipmi_msg;
 }
 
+static struct acpi_ipmi_msg *
+acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
+{
+	if (tx_msg)
+		atomic_inc(&tx_msg->refcnt);
+	return tx_msg;
+}
+
+static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
+{
+	acpi_ipmi_dev_put(tx_msg->device);
+	kfree(tx_msg);
+}
+
+static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
+{
+	if (tx_msg && atomic_dec_and_test(&tx_msg->refcnt))
+		ipmi_msg_release(tx_msg);
+}
+
 #define		IPMI_OP_RGN_NETFN(offset)	((offset >> 8) & 0xff)
 #define		IPMI_OP_RGN_CMD(offset)		(offset & 0xff)
 static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
@@ -300,7 +326,7 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 
 static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 {
-	struct acpi_ipmi_msg *tx_msg, *temp;
+	struct acpi_ipmi_msg *tx_msg;
 	unsigned long flags;
 
 	/*
@@ -311,16 +337,46 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 	 */
 	while (atomic_read(&ipmi->refcnt) > 1) {
 		spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
-		list_for_each_entry_safe(tx_msg, temp,
-					 &ipmi->tx_msg_list, head) {
+		while (!list_empty(&ipmi->tx_msg_list)) {
+			tx_msg = list_first_entry(&ipmi->tx_msg_list,
+						  struct acpi_ipmi_msg,
+						  head);
+			list_del(&tx_msg->head);
+			spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
 			/* wake up the sleep thread on the Tx msg */
 			complete(&tx_msg->tx_complete);
+			acpi_ipmi_msg_put(tx_msg);
+			spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
 		}
 		spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
 		schedule_timeout_uninterruptible(msecs_to_jiffies(1));
 	}
 }
 
+static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
+			       struct acpi_ipmi_msg *msg)
+{
+	struct acpi_ipmi_msg *tx_msg;
+	int msg_found = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
+	list_for_each_entry(tx_msg, &ipmi->tx_msg_list, head) {
+		if (msg == tx_msg) {
+			msg_found = 1;
+			break;
+		}
+	}
+	if (msg_found)
+		list_del(&tx_msg->head);
+	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
+	if (msg_found)
+		acpi_ipmi_msg_put(tx_msg);
+}
+
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 {
 	struct acpi_ipmi_device *ipmi_device = user_msg_data;
@@ -343,12 +399,15 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 			break;
 		}
 	}
+	if (msg_found)
+		list_del(&tx_msg->head);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 
 	if (!msg_found) {
 		dev_warn(&pnp_dev->dev,
 			 "Unexpected response (msg id %ld) is returned.\n",
 			 msg->msgid);
-		goto out_lock;
+		goto out_msg;
 	}
 
 	/* copy the response data to Rx_data buffer */
@@ -360,14 +419,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	}
 	tx_msg->rx_len = msg->msg.data_len;
 	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
-	/* tx_msg content must be valid before setting msg_done flag */
-	smp_wmb();
 	tx_msg->msg_done = 1;
 
 out_comp:
 	complete(&tx_msg->tx_complete);
-out_lock:
-	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
+	acpi_ipmi_msg_put(tx_msg);
 out_msg:
 	ipmi_free_recv_msg(msg);
 }
@@ -493,21 +549,17 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	if ((function & ACPI_IO_MASK) == ACPI_READ)
 		return AE_TYPE;
 
-	ipmi_device = acpi_ipmi_get_selected_smi();
-	if (!ipmi_device)
+	tx_msg = ipmi_msg_alloc();
+	if (!tx_msg)
 		return AE_NOT_EXIST;
-
-	tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
-	if (!tx_msg) {
-		status = AE_NO_MEMORY;
-		goto out_ref;
-	}
+	ipmi_device = tx_msg->device;
 
 	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
-		status = AE_TYPE;
-		goto out_msg;
+		ipmi_msg_release(tx_msg);
+		return AE_TYPE;
 	}
 
+	acpi_ipmi_msg_get(tx_msg);
 	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
@@ -518,21 +570,16 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 				   NULL, 0, 0, 0);
 	if (err) {
 		status = AE_ERROR;
-		goto out_list;
+		goto out_msg;
 	}
 	rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
 					       IPMI_TIMEOUT);
 	acpi_format_ipmi_response(tx_msg, value, rem_time);
 	status = AE_OK;
 
-out_list:
-	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
-	list_del(&tx_msg->head);
-	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 out_msg:
-	kfree(tx_msg);
-out_ref:
-	acpi_ipmi_dev_put(ipmi_device);
+	ipmi_cancel_tx_msg(ipmi_device, tx_msg);
+	acpi_ipmi_msg_put(tx_msg);
 	return status;
 }
 
-- 
1.7.10



* [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:10     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:10 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, linux-acpi, openipmi-developer

This is a trivial patch that:
1. Deletes the smi_data member of acpi_ipmi_device, which is not actually
   used.
2. Replaces the pnp_dev member of acpi_ipmi_device with a struct device
   pointer, as it is only used by dev_warn() and dev_WARN_ONCE()
   invocations.
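
As a rough illustration of item 2, the user-space sketch below contrasts a
holder that keeps a bus-specific wrapper with one that keeps the embedded
generic device directly.  All demo_* names are hypothetical stand-ins and are
not the kernel's struct pnp_dev or struct device.

```c
/* Hypothetical stand-in for a generic device. */
struct demo_device {
	const char *name;
};

/* Hypothetical stand-in for a bus-specific wrapper around it. */
struct demo_pnp_dev {
	int number;
	struct demo_device dev;	/* generic device embedded in the wrapper */
};

/* Before: the holder keeps the wrapper but only ever touches ->dev. */
struct demo_holder_old {
	struct demo_pnp_dev *pnp_dev;
};

/* After: the holder keeps the generic device pointer directly. */
struct demo_holder_new {
	struct demo_device *dev;
};

/* Name used for logging, reached through the wrapper. */
const char *demo_old_log_name(const struct demo_holder_old *h)
{
	return h->pnp_dev->dev.name;
}

/* Same name, reached without knowing the wrapper type. */
const char *demo_new_log_name(const struct demo_holder_new *h)
{
	return h->dev->name;
}
```

Both helpers return the same string.  The second form simply no longer
depends on the wrapper type.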

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 0ee1ea6..7f93ffd 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -63,11 +63,10 @@ struct acpi_ipmi_device {
 	struct list_head tx_msg_list;
 	spinlock_t	tx_msg_lock;
 	acpi_handle handle;
-	struct pnp_dev *pnp_dev;
+	struct device *dev;
 	ipmi_user_t	user_interface;
 	int ipmi_ifnum; /* IPMI interface number */
 	long curr_msgid;
-	struct ipmi_smi_info smi_data;
 	atomic_t refcnt;
 };
 
@@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = {
 };
 
 static struct acpi_ipmi_device *
-ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
+ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)
 {
 	struct acpi_ipmi_device *ipmi_device;
 	int err;
@@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
 	spin_lock_init(&ipmi_device->tx_msg_lock);
 
 	ipmi_device->handle = handle;
-	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
-	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
+	ipmi_device->dev = get_device(pdev);
 	ipmi_device->ipmi_ifnum = iface;
 
 	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
 			       ipmi_device, &user);
 	if (err) {
-		put_device(smi_data->dev);
+		put_device(pdev);
 		kfree(ipmi_device);
 		return NULL;
 	}
@@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
 static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
 {
 	ipmi_destroy_user(ipmi_device->user_interface);
-	put_device(ipmi_device->smi_data.dev);
+	put_device(ipmi_device->dev);
 	kfree(ipmi_device);
 }
 
@@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 	buffer = (struct acpi_ipmi_buffer *)value;
 	/* copy the tx message data */
 	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
-		dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
+		dev_WARN_ONCE(tx_msg->device->dev, true,
 			      "Unexpected request (msg len %d).\n",
 			      buffer->length);
 		return -EINVAL;
@@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	struct acpi_ipmi_device *ipmi_device = user_msg_data;
 	int msg_found = 0;
 	struct acpi_ipmi_msg *tx_msg;
-	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
+	struct device *dev = ipmi_device->dev;
 	unsigned long flags;
 
 	if (msg->user != ipmi_device->user_interface) {
-		dev_warn(&pnp_dev->dev,
+		dev_warn(dev,
 			 "Unexpected response is returned. returned user %p, expected user %p\n",
 			 msg->user, ipmi_device->user_interface);
 		goto out_msg;
@@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 
 	if (!msg_found) {
-		dev_warn(&pnp_dev->dev,
+		dev_warn(dev,
 			 "Unexpected response (msg id %ld) is returned.\n",
 			 msg->msgid);
 		goto out_msg;
@@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 
 	/* copy the response data to Rx_data buffer */
 	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
-		dev_WARN_ONCE(&pnp_dev->dev, true,
+		dev_WARN_ONCE(dev, true,
 			      "Unexpected response (msg len %d).\n",
 			      msg->msg.data_len);
 		goto out_comp;
@@ -431,7 +429,7 @@ out_msg:
 static void ipmi_register_bmc(int iface, struct device *dev)
 {
 	struct acpi_ipmi_device *ipmi_device, *temp;
-	struct pnp_dev *pnp_dev;
+	struct device *pdev;
 	int err;
 	struct ipmi_smi_info smi_data;
 	acpi_handle handle;
@@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 	handle = smi_data.addr_info.acpi_info.acpi_handle;
 	if (!handle)
 		goto err_ref;
-	pnp_dev = to_pnp_dev(smi_data.dev);
+	pdev = smi_data.dev;
 
-	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
+	ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
 	if (!ipmi_device) {
-		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
+		dev_warn(pdev, "Can't create IPMI user interface\n");
 		goto err_ref;
 	}
 
-- 
1.7.10


* [PATCH 09/13] ACPI/IPMI: Cleanup some initialization codes
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:10     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:10 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, linux-acpi, openipmi-developer

This is a trivial patch:
1. Changes dynamic mutex initialization to static initialization.
2. Removes the initialization of the result variable in acpi_ipmi_init(),
   as it is not needed.
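
For readers less familiar with item 1, here is a user-space analogue using
POSIX threads.  PTHREAD_MUTEX_INITIALIZER plays the role that
__MUTEX_INITIALIZER() plays in the kernel.  The demo_* names are hypothetical
and this is only a sketch of the idea, not the driver code.

```c
#include <pthread.h>

/* Hypothetical user-space stand-in for the driver_data structure. */
struct demo_driver_data {
	int users;
	pthread_mutex_t lock;
};

/*
 * Statically initialized: the lock is usable before any init function
 * runs, so no separate pthread_mutex_init() call is required.
 */
static struct demo_driver_data demo_data = {
	.users = 0,
	.lock = PTHREAD_MUTEX_INITIALIZER,
};

/* Take the lock, bump the counter, and return the new value. */
int demo_add_user(void)
{
	int users;

	pthread_mutex_lock(&demo_data.lock);
	users = ++demo_data.users;
	pthread_mutex_unlock(&demo_data.lock);
	return users;
}
```

With the static initializer there is no window in which the lock exists but
has not been set up yet, which is presumably why the init-time call can go.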

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 7f93ffd..2d31003 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -128,6 +128,7 @@ static struct ipmi_driver_data driver_data = {
 	.ipmi_hndlrs = {
 		.ipmi_recv_hndl = ipmi_msg_handler,
 	},
+	.ipmi_lock = __MUTEX_INITIALIZER(driver_data.ipmi_lock)
 };
 
 static struct acpi_ipmi_device *
@@ -583,12 +584,10 @@ out_msg:
 
 static int __init acpi_ipmi_init(void)
 {
-	int result = 0;
+	int result;
 
 	if (acpi_disabled)
-		return result;
-
-	mutex_init(&driver_data.ipmi_lock);
+		return 0;
 
 	result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
 				      &acpi_ipmi_space_handler,
-- 
1.7.10


* [PATCH 10/13] ACPI/IPMI: Cleanup some inclusion codes
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:10     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:10 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, linux-acpi, openipmi-developer

This is a trivial patch:
1. Deletes several useless header inclusions.
2. Kernel code should always include <linux/acpi.h>, where many
   conditional declarations are handled, instead of <acpi/acpi_bus.h> or
   <acpi/acpi_drivers.h>.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 2d31003..e56b1f8 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -24,22 +24,9 @@
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  */
 
-#include <linux/kernel.h>
 #include <linux/module.h>
-#include <linux/init.h>
-#include <linux/types.h>
-#include <linux/delay.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/interrupt.h>
-#include <linux/list.h>
-#include <linux/spinlock.h>
-#include <linux/io.h>
-#include <acpi/acpi_bus.h>
-#include <acpi/acpi_drivers.h>
+#include <linux/acpi.h>
 #include <linux/ipmi.h>
-#include <linux/device.h>
-#include <linux/pnp.h>
 #include <linux/spinlock.h>
 
 MODULE_AUTHOR("Zhao Yakui");
-- 
1.7.10


* [PATCH 11/13] ACPI/IPMI: Cleanup some Kconfig codes
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:10     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:10 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, linux-acpi, openipmi-developer

This is a trivial patch:
1. Deletes the duplicate IPMI_HANDLER dependency, as "IPMI_SI" is already
   enclosed by "if IPMI_HANDLER".

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/Kconfig |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 3278a21..d129869 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -181,9 +181,10 @@ config ACPI_PROCESSOR
 
 	  To compile this driver as a module, choose M here:
 	  the module will be called processor.
+
 config ACPI_IPMI
 	tristate "IPMI"
-	depends on IPMI_SI && IPMI_HANDLER
+	depends on IPMI_SI
 	default n
 	help
 	  This driver enables the ACPI to access the BMC controller. And it
-- 
1.7.10


* [PATCH 12/13] Testing: Add module load/unload test suite
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:10     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:10 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, linux-acpi, openipmi-developer

This patch contains two test scripts for module load/unload race testing.

Use the test suite as follows:
1. Run several instances of endless_cat.sh to access the sysfs files that
   are exported by the module.
2. Run endless_mod.sh to load and unload the module repeatedly to see
   whether any races occur.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
---
 tools/testing/module-unloading/endless_cat.sh |   32 ++++++++++
 tools/testing/module-unloading/endless_mod.sh |   81 +++++++++++++++++++++++++
 2 files changed, 113 insertions(+)
 create mode 100755 tools/testing/module-unloading/endless_cat.sh
 create mode 100755 tools/testing/module-unloading/endless_mod.sh

diff --git a/tools/testing/module-unloading/endless_cat.sh b/tools/testing/module-unloading/endless_cat.sh
new file mode 100755
index 0000000..72e035c
--- /dev/null
+++ b/tools/testing/module-unloading/endless_cat.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+fatal() {
+	echo $1
+	exit 1
+}
+
+usage() {
+	echo "Usage: `basename $0` <file>"
+	echo "Where:"
+	echo " file: file to cat"
+}
+
+fatal_usage() {
+	usage
+	exit 1
+}
+
+if [ "x$1" = "x" ]; then
+	echo "Missing <file> parameter."
+	fatal_usage
+fi
+if [ ! -f "$1" ]; then
+	echo "$1 is not an accessible file."
+	fatal_usage
+fi
+
+while :
+do
+	cat "$1"
+	echo "-----------------------------------"
+done
diff --git a/tools/testing/module-unloading/endless_mod.sh b/tools/testing/module-unloading/endless_mod.sh
new file mode 100755
index 0000000..359b0c0
--- /dev/null
+++ b/tools/testing/module-unloading/endless_mod.sh
@@ -0,0 +1,81 @@
+#!/bin/sh
+
+fatal() {
+	echo $1
+	exit 1
+}
+
+usage() {
+	echo "Usage: `basename $0` [-t second] <module>"
+	echo "Where:"
+	echo " second: seconds to sleep between module actions"
+	echo " module: name of module to test"
+}
+
+fatal_usage() {
+	usage
+	exit 1
+}
+
+SLEEPSEC=10
+
+while getopts "t:" opt
+do
+	case $opt in
+	t) SLEEPSEC=$OPTARG;;
+	?) echo "Invalid argument $opt"
+	   fatal_usage;;
+	esac
+done
+shift $(($OPTIND - 1))
+
+if [ "x$1" = "x" ]; then
+	echo "Missing <module> parameter."
+	fatal_usage
+fi
+
+find_module() {
+	curr_modules=`lsmod | cut -d " " -f1`
+
+	for m in $curr_modules; do
+		if [ "x$m" = "x$1" ]; then
+			return 0
+		fi
+	done
+	return 1
+}
+
+remove_module() {
+	while :
+	do
+		find_module $1
+		if [ $? -eq 0 ]; then
+			rmmod $1
+			echo "Removing $1 ..."
+		else
+			break
+		fi
+	done
+}
+
+insert_module() {
+	while :
+	do
+		find_module $1
+		if [ ! $? -eq 0 ]; then
+			modprobe $1
+			echo "Inserting $1 ..."
+		else
+			break
+		fi
+	done
+}
+
+while :
+do
+	echo "-----------------------------------"
+	insert_module $1
+	sleep $SLEEPSEC
+	remove_module $1
+	sleep $SLEEPSEC
+done
-- 
1.7.10


* [PATCH 13/13] ACPI/IPMI: Add IPMI operation region test device driver
  2013-07-23  8:08   ` Lv Zheng
@ 2013-07-23  8:10     ` Lv Zheng
  -1 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-07-23  8:10 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-kernel, linux-acpi, openipmi-developer

This patch is for testing purposes only and should not be merged into any
public Linux kernel repository.

This patch contains a driver for a fake test device that accesses IPMI
operation region fields.
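
The GDIM method in the Kconfig help text of this patch splits the "Get Device
ID" response buffer at fixed byte offsets (STAT at 0x00 through AFRI at
0x0E).  The user-space sketch below decodes a buffer using those same
offsets.  The demo_* names are hypothetical, and little-endian byte order for
the 16-bit and 32-bit fields is an assumption based on how ACPI buffer fields
are normally laid out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded form of the "Get Device ID" response buffer. */
struct demo_device_id {
	uint8_t status;			/* STAT, offset 0x00 */
	uint8_t length;			/* LENG, offset 0x01 */
	uint8_t completion_code;	/* CMPC, offset 0x02 */
	uint8_t device_id;		/* DID,  offset 0x03 */
	uint8_t device_rev;		/* DREV, offset 0x04 */
	uint16_t firmware_rev;		/* FREV, offset 0x05, 16 bits */
	uint8_t srev;			/* SREV, offset 0x07 */
	uint8_t ads;			/* ADS,  offset 0x08 */
	uint8_t vendor_id[3];		/* VID0..VID2, offset 0x09 */
	uint8_t product_id[2];		/* PID0..PID1, offset 0x0C */
	uint32_t aux_firm_rev_info;	/* AFRI, offset 0x0E, 32 bits */
};

#define DEMO_DEVICE_ID_MIN_LEN	0x12	/* last AFRI byte sits at 0x11 */

/* Decode buf into out; returns 0 on success, -1 on bad arguments. */
int demo_parse_device_id(const uint8_t *buf, size_t len,
			 struct demo_device_id *out)
{
	if (!buf || !out || len < DEMO_DEVICE_ID_MIN_LEN)
		return -1;

	out->status = buf[0x00];
	out->length = buf[0x01];
	out->completion_code = buf[0x02];
	out->device_id = buf[0x03];
	out->device_rev = buf[0x04];
	out->firmware_rev = (uint16_t)(buf[0x05] | (buf[0x06] << 8));
	out->srev = buf[0x07];
	out->ads = buf[0x08];
	memcpy(out->vendor_id, &buf[0x09], sizeof(out->vendor_id));
	memcpy(out->product_id, &buf[0x0c], sizeof(out->product_id));
	out->aux_firm_rev_info = (uint32_t)buf[0x0e] |
				 ((uint32_t)buf[0x0f] << 8) |
				 ((uint32_t)buf[0x10] << 16) |
				 ((uint32_t)buf[0x11] << 24);
	return 0;
}
```

A caller would first check status and completion_code for zero, as GDIM does,
before trusting the remaining fields.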

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
---
 drivers/acpi/Kconfig     |   68 +++++++++++++
 drivers/acpi/Makefile    |    1 +
 drivers/acpi/ipmi_test.c |  254 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 323 insertions(+)
 create mode 100644 drivers/acpi/ipmi_test.c

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index d129869..e3dd3fd 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -377,6 +377,74 @@ config ACPI_BGRT
 	  data from the firmware boot splash. It will appear under
 	  /sys/firmware/acpi/bgrt/ .
 
+config ACPI_IPMI_TEST
+	tristate "IPMI operation region tester"
+	help
+	  This is a test driver written for the following fake ACPI namespace device:
+	    Device (PMIT)
+	    {
+	        Name (_HID, "ZETA0000")  // _HID: Hardware ID
+	        Name (_STA, 0x0F)  // _STA: Status
+	        OperationRegion (SYSI, IPMI, 0x0600, 0x0100)
+	        Field (SYSI, BufferAcc, Lock, Preserve)
+	        {
+	            AccessAs (BufferAcc, 0x01),
+	            Offset (0x01),
+	            GDIC,   8,	// Get Device ID Command
+	        }
+	        Method (GDIM, 0, NotSerialized)  // GDIM: Get Device ID Method
+	        {
+	            Name (GDIR, Package (0x08)
+	            {
+	                0x00,
+	                0x00,
+	                0x0000,
+	                0x00,
+	                0x00,
+	                Buffer (0x03) {0x00, 0x00, 0x00},
+	                Buffer (0x02) {0x00, 0x00},
+	                0x00000000
+	            })
+	            Name (BUFF, Buffer (0x42) {})
+	            CreateByteField (BUFF, 0x00, STAT)
+	            CreateByteField (BUFF, 0x01, LENG)
+	            CreateByteField (BUFF, 0x02, CMPC)
+	            CreateByteField (BUFF, 0x03, DID)
+	            CreateByteField (BUFF, 0x04, DREV)
+	            CreateWordField (BUFF, 0x05, FREV)
+	            CreateByteField (BUFF, 0x07, SREV)
+	            CreateByteField (BUFF, 0x08, ADS)
+	            CreateByteField (BUFF, 0x09, VID0)
+	            CreateByteField (BUFF, 0x0A, VID1)
+	            CreateByteField (BUFF, 0x0B, VID2)
+	            CreateByteField (BUFF, 0x0C, PID0)
+	            CreateByteField (BUFF, 0x0D, PID1)
+	            CreateDWordField (BUFF, 0x0E, AFRI)
+	            Store (0x00, LENG)
+	            Store (Store (BUFF, GDIC), BUFF)
+	            If (LAnd (LEqual (STAT, 0x00), LEqual (CMPC, 0x00)))
+	            {
+	                Name (VBUF, Buffer (0x03) { 0x00, 0x00, 0x00 })
+	                Name (PBUF, Buffer (0x02) { 0x00, 0x00 })
+	                Store (DID, Index (GDIR, 0x00))
+	                Store (DREV, Index (GDIR, 0x01))
+	                Store (FREV, Index (GDIR, 0x02))
+	                Store (SREV, Index (GDIR, 0x03))
+	                Store (ADS, Index (GDIR, 0x04))
+	                Store (VID0, Index (VBUF, 0x00))
+	                Store (VID1, Index (VBUF, 0x01))
+	                Store (VID2, Index (VBUF, 0x02))
+	                Store (VBUF, Index (GDIR, 0x05))
+	                Store (PID0, Index (PBUF, 0x00))
+	                Store (PID1, Index (PBUF, 0x01))
+	                Store (PBUF, Index (GDIR, 0x06))
+	                Store (AFRI, Index (GDIR, 0x07))
+	            }
+	            Return (GDIR)
+	        }
+	    }
+	  It is for validation purposes and only calls the "Get Device ID" command.
+
 source "drivers/acpi/apei/Kconfig"
 
 endif	# ACPI
diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
index 81dbeb8..1476623 100644
--- a/drivers/acpi/Makefile
+++ b/drivers/acpi/Makefile
@@ -74,6 +74,7 @@ obj-$(CONFIG_ACPI_EC_DEBUGFS)	+= ec_sys.o
 obj-$(CONFIG_ACPI_CUSTOM_METHOD)+= custom_method.o
 obj-$(CONFIG_ACPI_BGRT)		+= bgrt.o
 obj-$(CONFIG_ACPI_I2C)		+= acpi_i2c.o
+obj-$(CONFIG_ACPI_IPMI_TEST)	+= ipmi_test.o
 
 # processor has its own "processor." module_param namespace
 processor-y			:= processor_driver.o processor_throttling.o
diff --git a/drivers/acpi/ipmi_test.c b/drivers/acpi/ipmi_test.c
new file mode 100644
index 0000000..5d144e4
--- /dev/null
+++ b/drivers/acpi/ipmi_test.c
@@ -0,0 +1,254 @@
+/*
+ * An IPMI operation region tester driver
+ *
+ * Copyright (C) 2013 Intel Corporation
+ *   Author: Lv Zheng <lv.zheng@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the
+ * Free Software Foundation; either version 2 of the License, or (at your
+ * option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, write to the Free Software Foundation, Inc.,
+ * 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+
+#include <linux/module.h>
+#include <linux/acpi.h>
+
+#define ACPI_IPMI_TEST_NAME		"ipmi_test"
+ACPI_MODULE_NAME(ACPI_IPMI_TEST_NAME);
+
+#define ACPI_IPMI_TEST_DEVICE		"IPMI Test"
+#define ACPI_IPMI_TEST_CLASS		"ipmi_tester"
+
+static const struct acpi_device_id acpi_ipmi_test_ids[] = {
+	{"ZETA0000", 0},
+	{"", 0},
+};
+MODULE_DEVICE_TABLE(acpi, acpi_ipmi_test_ids);
+
+struct acpi_ipmi_device_id {
+	u64	device_id;
+	u64	device_rev;
+	u64	firmware_rev;
+	u64	ipmi_version;
+	u64	additional_dev_support;
+	u8	*vendor_id;
+	u8	*product_id;
+	u64	aux_firm_rev_info;
+	u8	extra_buf[5];
+} __packed;
+
+struct acpi_ipmi_tester {
+	struct acpi_device	*adev;
+	acpi_bus_id		name;
+	struct acpi_ipmi_device_id	device_id;
+	int			registered_group;
+};
+
+#define ipmi_err(tester, fmt, ...)	\
+	dev_err(&(tester)->adev->dev, fmt, ##__VA_ARGS__)
+#define ipmi_info(tester, fmt, ...)	\
+	dev_info(&(tester)->adev->dev, fmt, ##__VA_ARGS__)
+#define IPMI_ACPI_HANDLE(tester)	((tester)->adev->handle)
+
+static int acpi_ipmi_update_device_id(struct acpi_ipmi_tester *tester)
+{
+	int res = 0;
+	acpi_status status;
+	struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER, NULL };
+	struct acpi_buffer format = { sizeof("NNNNNBBN"), "NNNNNBBN" };
+	struct acpi_buffer device_id = { 0, NULL };
+	union acpi_object *did;
+
+	status = acpi_evaluate_object(IPMI_ACPI_HANDLE(tester), "GDIM", NULL,
+				      &buffer);
+	if (ACPI_FAILURE(status) || !buffer.pointer) {
+		ipmi_err(tester, "Evaluating GDIM, status - %d\n", status);
+		return -ENODEV;
+	}
+
+	did = buffer.pointer;
+	if (did->type != ACPI_TYPE_PACKAGE || did->package.count != 8) {
+		ipmi_err(tester, "Invalid GDIM data, type - %d, count - %d\n",
+			 did->type, did->package.count);
+		res = -EFAULT;
+		goto err_buf;
+	}
+
+	device_id.length = sizeof(struct acpi_ipmi_device_id);
+	device_id.pointer = &tester->device_id;
+
+	status = acpi_extract_package(did, &format, &device_id);
+	if (ACPI_FAILURE(status)) {
+		ipmi_err(tester, "Invalid GDIM data\n");
+		res = -EFAULT;
+		goto err_buf;
+	}
+
+err_buf:
+	kfree(buffer.pointer);
+	return res;
+}
+
+static ssize_t show_device_id(struct device *device,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct acpi_device *adev = to_acpi_device(device);
+	struct acpi_ipmi_tester *tester = adev->driver_data;
+
+	acpi_ipmi_update_device_id(tester);
+	return sprintf(buf, "%llu\n", tester->device_id.device_id);
+}
+
+static ssize_t show_device_rev(struct device *device,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	struct acpi_device *adev = to_acpi_device(device);
+	struct acpi_ipmi_tester *tester = adev->driver_data;
+
+	acpi_ipmi_update_device_id(tester);
+	return sprintf(buf, "%llu\n", tester->device_id.device_rev);
+}
+
+static ssize_t show_firmware_rev(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct acpi_device *adev = to_acpi_device(device);
+	struct acpi_ipmi_tester *tester = adev->driver_data;
+
+	acpi_ipmi_update_device_id(tester);
+	return sprintf(buf, "%llu\n", tester->device_id.firmware_rev);
+}
+
+static ssize_t show_ipmi_version(struct device *device,
+				 struct device_attribute *attr,
+				 char *buf)
+{
+	struct acpi_device *adev = to_acpi_device(device);
+	struct acpi_ipmi_tester *tester = adev->driver_data;
+
+	acpi_ipmi_update_device_id(tester);
+	return sprintf(buf, "%llu\n", tester->device_id.ipmi_version);
+}
+
+static ssize_t show_vendor_id(struct device *device,
+			      struct device_attribute *attr,
+			      char *buf)
+{
+	struct acpi_device *adev = to_acpi_device(device);
+	struct acpi_ipmi_tester *tester = adev->driver_data;
+
+	acpi_ipmi_update_device_id(tester);
+	return sprintf(buf, "%02x %02x %02x\n",
+		       tester->device_id.vendor_id[0],
+		       tester->device_id.vendor_id[1],
+		       tester->device_id.vendor_id[2]);
+}
+
+static ssize_t show_product_id(struct device *device,
+			       struct device_attribute *attr,
+			       char *buf)
+{
+	struct acpi_device *adev = to_acpi_device(device);
+	struct acpi_ipmi_tester *tester = adev->driver_data;
+
+	acpi_ipmi_update_device_id(tester);
+	return sprintf(buf, "%02x %02x\n",
+		       tester->device_id.product_id[0],
+		       tester->device_id.product_id[1]);
+}
+
+static DEVICE_ATTR(device_id, S_IRUGO, show_device_id, NULL);
+static DEVICE_ATTR(device_rev, S_IRUGO, show_device_rev, NULL);
+static DEVICE_ATTR(firmware_rev, S_IRUGO, show_firmware_rev, NULL);
+static DEVICE_ATTR(ipmi_version, S_IRUGO, show_ipmi_version, NULL);
+static DEVICE_ATTR(vendor_id, S_IRUGO, show_vendor_id, NULL);
+static DEVICE_ATTR(product_id, S_IRUGO, show_product_id, NULL);
+
+static struct attribute *acpi_ipmi_test_attrs[] = {
+	&dev_attr_device_id.attr,
+	&dev_attr_device_rev.attr,
+	&dev_attr_firmware_rev.attr,
+	&dev_attr_ipmi_version.attr,
+	&dev_attr_vendor_id.attr,
+	&dev_attr_product_id.attr,
+	NULL,
+};
+
+static struct attribute_group acpi_ipmi_test_group = {
+	.attrs	= acpi_ipmi_test_attrs,
+};
+
+static int acpi_ipmi_test_add(struct acpi_device *device)
+{
+	struct acpi_ipmi_tester *tester;
+
+	if (!device)
+		return -EINVAL;
+
+	tester = kzalloc(sizeof(struct acpi_ipmi_tester), GFP_KERNEL);
+	if (!tester)
+		return -ENOMEM;
+
+	tester->adev = device;
+	strcpy(acpi_device_name(device), ACPI_IPMI_TEST_DEVICE);
+	strcpy(acpi_device_class(device), ACPI_IPMI_TEST_CLASS);
+	device->driver_data = tester;
+	if (sysfs_create_group(&device->dev.kobj, &acpi_ipmi_test_group) == 0)
+		tester->registered_group = 1;
+
+	return 0;
+}
+
+static int acpi_ipmi_test_remove(struct acpi_device *device)
+{
+	struct acpi_ipmi_tester *tester;
+
+	if (!device || !acpi_driver_data(device))
+		return -EINVAL;
+
+	tester = acpi_driver_data(device);
+
+	if (tester->registered_group)
+		sysfs_remove_group(&device->dev.kobj, &acpi_ipmi_test_group);
+
+	kfree(tester);
+	return 0;
+}
+
+static struct acpi_driver acpi_ipmi_test_driver = {
+	.name = "ipmi_test",
+	.class = ACPI_IPMI_TEST_CLASS,
+	.ids = acpi_ipmi_test_ids,
+	.ops = {
+		.add = acpi_ipmi_test_add,
+		.remove = acpi_ipmi_test_remove,
+	},
+};
+
+static int __init acpi_ipmi_test_init(void)
+{
+	return acpi_bus_register_driver(&acpi_ipmi_test_driver);
+}
+module_init(acpi_ipmi_test_init);
+
+static void __exit acpi_ipmi_test_exit(void)
+{
+	acpi_bus_unregister_driver(&acpi_ipmi_test_driver);
+}
+module_exit(acpi_ipmi_test_exit);
+
+MODULE_AUTHOR("Lv Zheng <lv.zheng@intel.com>");
+MODULE_DESCRIPTION("ACPI IPMI operation region tester driver");
+MODULE_LICENSE("GPL");
-- 
1.7.10
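
One thing worth noting about the show_*() handlers in the driver above: they call acpi_ipmi_update_device_id() and ignore its return value, and show_vendor_id()/show_product_id() then dereference pointers that stay NULL until a GDIM evaluation has succeeded. Below is a minimal user-space sketch of a checked variant. It is not part of the patch; struct fake_device_id and format_vendor_id() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the vendor_id part of struct acpi_ipmi_device_id. */
struct fake_device_id {
	const unsigned char *vendor_id;	/* NULL until an update has succeeded */
};

/*
 * Hypothetical helper: format the three vendor ID bytes only when the
 * preceding update succeeded and the pointer is set.
 * update_res is 0 on success and negative on failure.
 * Returns the number of characters written, or a negative value on error.
 */
static int format_vendor_id(char *buf, size_t size, int update_res,
			    const struct fake_device_id *id)
{
	if (update_res < 0)
		return update_res;	/* propagate the update failure */
	if (!id->vendor_id)
		return -1;		/* nothing has been extracted yet */
	return snprintf(buf, size, "%02x %02x %02x\n",
			(unsigned int)id->vendor_id[0],
			(unsigned int)id->vendor_id[1],
			(unsigned int)id->vendor_id[2]);
}
```

In a sysfs show callback the same idea would mean returning the error code to the reader instead of formatting stale or missing data.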

^ permalink raw reply related	[flat|nested] 99+ messages in thread

* Re: [PATCH 01/13] ACPI/IPMI: Fix potential response buffer overflow
  2013-07-23  8:08     ` Lv Zheng
  (?)
@ 2013-07-23 14:54     ` Greg KH
  2013-07-24  0:21         ` Zheng, Lv
  2013-07-24  0:44         ` Zheng, Lv
  -1 siblings, 2 replies; 99+ messages in thread
From: Greg KH @ 2013-07-23 14:54 UTC (permalink / raw)
  To: Lv Zheng
  Cc: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

On Tue, Jul 23, 2013 at 04:08:59PM +0800, Lv Zheng wrote:
> This patch enhances sanity checks on message size to avoid potential buffer
> overflow.
> 
> The kernel IPMI message size is IPMI_MAX_MSG_LENGTH(272 bytes) while the
> ACPI specification defined IPMI message size is 64 bytes.  The difference
> is not handled by the original codes.  This may cause crash in the response
> handling codes.
> This patch fixes this gap and also combines rx_data/tx_data to use single
> data/len pair since they need not be separated.
> 
> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> Reviewed-by: Huang Ying <ying.huang@intel.com>
> ---
>  drivers/acpi/acpi_ipmi.c |  100 ++++++++++++++++++++++++++++------------------
>  1 file changed, 61 insertions(+), 39 deletions(-)
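
The size mismatch described in the quoted changelog can be illustrated with a small user-space sketch. This is not the acpi_ipmi.c code: IPMI_MAX_MSG_LENGTH and the 64-byte ACPI limit come from the description above, while ACPI_IPMI_MAX_MSG_LENGTH and copy_response() are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IPMI_MAX_MSG_LENGTH      272	/* kernel IPMI message size, per the changelog */
#define ACPI_IPMI_MAX_MSG_LENGTH  64	/* ACPI-defined size; macro name is an assumption */

/*
 * Hypothetical helper: copy an IPMI response into a buffer sized for the
 * ACPI-defined message.  Returns the number of bytes copied, or -1 when
 * the response is larger than the destination can hold.
 */
static int copy_response(unsigned char *dst, const unsigned char *src,
			 size_t len)
{
	if (len > ACPI_IPMI_MAX_MSG_LENGTH)
		return -1;	/* reject instead of overflowing dst */
	memcpy(dst, src, len);
	return (int)len;
}
```

Whether an oversized response should be rejected or truncated is a design choice; the sketch rejects it so that the caller can report an error.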

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>

Same goes for the other patches you sent in this thread...

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 01/13] ACPI/IPMI: Fix potential response buffer overflow
  2013-07-23 14:54     ` Greg KH
@ 2013-07-24  0:21         ` Zheng, Lv
  2013-07-24  0:44         ` Zheng, Lv
  1 sibling, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-24  0:21 UTC (permalink / raw)
  To: Greg KH
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

> From: linux-acpi-owner@vger.kernel.org
> [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Greg KH
> Sent: Tuesday, July 23, 2013 10:54 PM
> 
> On Tue, Jul 23, 2013 at 04:08:59PM +0800, Lv Zheng wrote:
> > This patch enhances sanity checks on message size to avoid potential
> > buffer overflow.
> >
> > The kernel IPMI message size is IPMI_MAX_MSG_LENGTH(272 bytes) while
> > the ACPI specification defined IPMI message size is 64 bytes.  The
> > difference is not handled by the original codes.  This may cause crash
> > in the response handling codes.
> > This patch fixes this gap and also combines rx_data/tx_data to use
> > single data/len pair since they need not be separated.
> >
> > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > ---
> >  drivers/acpi/acpi_ipmi.c |  100
> > ++++++++++++++++++++++++++++------------------
> >  1 file changed, 61 insertions(+), 39 deletions(-)
> 
> <formletter>
> 
> This is not the correct way to submit patches for inclusion in the stable kernel
> tree.  Please read Documentation/stable_kernel_rules.txt
> for how to do this properly.
> 
> </formletter>
> 
> Same goes for the other patches you sent in this thread...

OK, I'll add prerequisites to each patch that is intended for the stable queue and re-send them (PATCH 01-06).

Thanks and best regards
-Lv

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 01/13] ACPI/IPMI: Fix potential response buffer overflow
  2013-07-23 14:54     ` Greg KH
@ 2013-07-24  0:44         ` Zheng, Lv
  2013-07-24  0:44         ` Zheng, Lv
  1 sibling, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-24  0:44 UTC (permalink / raw)
  To: Greg KH
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

> From: Zheng, Lv
> Sent: Wednesday, July 24, 2013 8:22 AM
> 
> > From: linux-acpi-owner@vger.kernel.org
> > [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Greg KH
> > Sent: Tuesday, July 23, 2013 10:54 PM
> >
> > On Tue, Jul 23, 2013 at 04:08:59PM +0800, Lv Zheng wrote:
> > > This patch enhances sanity checks on message size to avoid potential
> > > buffer overflow.
> > >
> > > The kernel IPMI message size is IPMI_MAX_MSG_LENGTH(272 bytes) while
> > > the ACPI specification defined IPMI message size is 64 bytes.  The
> > > difference is not handled by the original codes.  This may cause
> > > crash in the response handling codes.
> > > This patch fixes this gap and also combines rx_data/tx_data to use
> > > single data/len pair since they need not be separated.
> > >
> > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > ---
> > >  drivers/acpi/acpi_ipmi.c |  100
> > > ++++++++++++++++++++++++++++------------------
> > >  1 file changed, 61 insertions(+), 39 deletions(-)
> >
> > <formletter>
> >
> > This is not the correct way to submit patches for inclusion in the
> > stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
> > for how to do this properly.
> >
> > </formletter>
> >
> > Same goes for the other patches you sent in this thread...
> 
> OK, I'll add prerequisites for each that want to be accepted by the stable queue
> and re-send them (PATCH 01-06).

Maybe I shouldn't.
It looks like it is not possible to add commit ID prerequisites for a patch series that has not been accepted by mainline.
Since the patches haven't been merged into mainline, the commit IDs in this series are likely to change.
Please ignore [PATCH 01-06] that have been sent to the stable mailing list.
I'll just let the ACPI maintainers know which patches I think can go to the stable tree and let them make the decision after mainline acceptance.

Thanks and best regards
-Lv

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-23  8:09     ` Lv Zheng
  (?)
@ 2013-07-24 23:38     ` Rafael J. Wysocki
  2013-07-25  3:09         ` Zheng, Lv
  -1 siblings, 1 reply; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-24 23:38 UTC (permalink / raw)
  To: Lv Zheng
  Cc: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> This patch fixes races caused by unprotected ACPI IPMI transfers.
> 
> The following crashes may occur:
> 1. There is no tx_msg_lock held for iterating tx_msg_list in
>    ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
>    acpi_ipmi_space_handler() under protection of tx_msg_lock.
> 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
>    while it is accessed in parallel in ipmi_flush_tx_msg() and
>    ipmi_msg_handler().
> 
> This patch enhances tx_msg_lock to protect all tx_msg accesses to solve
> this issue.  Then tx_msg_lock is always held around complete() and tx_msg
> accesses.
> smp_wmb() is called before setting the msg_done flag so that messages
> completed due to flushing will not be handled as 'done' messages while
> their contents are not valid.
> 
> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> Cc: Zhao Yakui <yakui.zhao@intel.com>
> Reviewed-by: Huang Ying <ying.huang@intel.com>
> ---
>  drivers/acpi/acpi_ipmi.c |   10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> index b37c189..527ee43 100644
> --- a/drivers/acpi/acpi_ipmi.c
> +++ b/drivers/acpi/acpi_ipmi.c
> @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
>  	struct acpi_ipmi_msg *tx_msg, *temp;
>  	int count = HZ / 10;
>  	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> +	unsigned long flags;
>  
> +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
>  	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
>  		/* wake up the sleep thread on the Tx msg */
>  		complete(&tx_msg->tx_complete);
>  	}
> +	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
>  
>  	/* wait for about 100ms to flush the tx message list */
>  	while (count--) {
> @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>  			break;
>  		}
>  	}
> -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>  
>  	if (!msg_found) {
>  		dev_warn(&pnp_dev->dev,
>  			 "Unexpected response (msg id %ld) is returned.\n",
>  			 msg->msgid);
> -		goto out_msg;
> +		goto out_lock;
>  	}
>  
>  	/* copy the response data to Rx_data buffer */
> @@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>  	}
>  	tx_msg->rx_len = msg->msg.data_len;
>  	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> +	/* tx_msg content must be valid before setting msg_done flag */
> +	smp_wmb();

That's suspicious.

If you need the write barrier here, you'll most likely need a read barrier
somewhere else.  Where's that?

>  	tx_msg->msg_done = 1;
>  
>  out_comp:
>  	complete(&tx_msg->tx_complete);
> +out_lock:
> +	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>  out_msg:
>  	ipmi_free_recv_msg(msg);
>  }

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-24 23:38     ` Rafael J. Wysocki
@ 2013-07-25  3:09         ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-25  3:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Brown, Len, Corey Minyard, Wysocki, Rafael J, linux-kernel, Zhao,
	Yakui, linux-acpi, openipmi-developer

-stable according to the previous conversation.

> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> Sent: Thursday, July 25, 2013 7:38 AM
> 
> On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> > This patch fixes races caused by unprotected ACPI IPMI transfers.
> >
> > We can see the following crashes may occur:
> > 1. There is no tx_msg_lock held for iterating tx_msg_list in
> >    ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
> >    acpi_ipmi_space_handler() under protection of tx_msg_lock.
> > 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> >    while it is accessed in parallel in ipmi_flush_tx_msg() and
> >    ipmi_msg_handler().
> >
> > This patch enhances tx_msg_lock to protect all tx_msg accesses to
> > solve this issue.  Then tx_msg_lock is always held around complete()
> > and tx_msg accesses.
> > smp_wmb() is called before setting the msg_done flag so that messages
> > completed due to flushing will not be handled as 'done' messages while
> > their contents are not valid.
> >
> > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > Cc: Zhao Yakui <yakui.zhao@intel.com>
> > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > ---
> >  drivers/acpi/acpi_ipmi.c |   10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > index b37c189..527ee43 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> >  	struct acpi_ipmi_msg *tx_msg, *temp;
> >  	int count = HZ / 10;
> >  	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > +	unsigned long flags;
> >
> > +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> >  	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> >  		/* wake up the sleep thread on the Tx msg */
> >  		complete(&tx_msg->tx_complete);
> >  	}
> > +	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> >
> >  	/* wait for about 100ms to flush the tx message list */
> >  	while (count--) {
> > @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >  			break;
> >  		}
> >  	}
> > -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >
> >  	if (!msg_found) {
> >  		dev_warn(&pnp_dev->dev,
> >  			 "Unexpected response (msg id %ld) is returned.\n",
> >  			 msg->msgid);
> > -		goto out_msg;
> > +		goto out_lock;
> >  	}
> >
> >  	/* copy the response data to Rx_data buffer */
> > @@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >  	}
> >  	tx_msg->rx_len = msg->msg.data_len;
> >  	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> > +	/* tx_msg content must be valid before setting msg_done flag */
> > +	smp_wmb();
> 
> That's suspicious.
> 
> If you need the write barrier here, you'll most likely need a read barrier
> somewhere else.  Where's that?

That depends on whether the other side uses the content written before the
smp_wmb() under the condition set after the smp_wmb().

So the comment can be split into 2 questions:
1. Do we need a paired smp_rmb()?
2. Do we need the smp_wmb()?

For 1.
If we want a paired smp_rmb(), then it will appear in this function:

 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
                 acpi_integer *value, int rem_time)
 {
         struct acpi_ipmi_buffer *buffer;

         /*
          * value is also used as output parameter. It represents the response
          * IPMI message returned by IPMI command.
          */
         buffer = (struct acpi_ipmi_buffer *)value;
         if (!rem_time && !msg->msg_done) {
                 buffer->status = ACPI_IPMI_TIMEOUT;
                 return;
         }
         /*
          * If the flag of msg_done is not set or the recv length is zero, it
          * means that the IPMI command is not executed correctly.
          * The status code will be ACPI_IPMI_UNKNOWN.
          */
         if (!msg->msg_done || !msg->rx_len) {
                 buffer->status = ACPI_IPMI_UNKNOWN;
                 return;
         }
+        smp_rmb();
         /*
          * If the IPMI response message is obtained correctly, the status code
          * will be ACPI_IPMI_OK
          */
         buffer->status = ACPI_IPMI_OK;
         buffer->length = msg->rx_len;
         memcpy(buffer->data, msg->rx_data, msg->rx_len);
 }

If we don't, then the only consequence is that the msg content may be read
incorrectly from msg->rx_data.
Note that rx_len is 0 after initialization and will never exceed
sizeof(buffer->data), so the read is safe.

Doing without smp_rmb() is also OK in this case, since:
1. buffer->data will never be used when buffer->status is not ACPI_IPMI_OK, and
2. the smp_rmb()/smp_wmb() added in this patch will be deleted in [PATCH 07].

So IMO we needn't add the smp_rmb().  What do you think?

For 2.
If we don't add smp_wmb() in ipmi_msg_handler(), then code running on another
thread in acpi_format_ipmi_response() may read wrong msg->rx_data: a timeout
triggers this function, and when acpi_format_ipmi_response() is entered, the
msg->msg_done flag could be seen as 1 while msg->rx_data is not yet ready.
This is what we want to avoid in this quick fix.

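For comparison, here is a userspace sketch of a lock-based alternative.  It is
an illustration only: a pthread mutex stands in for tx_msg_lock, struct
locked_msg and the locked_msg_*() helpers are hypothetical names, and it is
neither the acpi_ipmi code nor necessarily what [PATCH 07] does.  If the data,
the length and the msg_done flag are all written and read under the same lock,
the reader cannot see msg_done set with stale data, so no explicit barriers
are needed:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct acpi_ipmi_msg; not the kernel type. */
struct locked_msg {
	pthread_mutex_t lock;	/* plays the role of tx_msg_lock */
	char data[16];
	size_t rx_len;
	int msg_done;
};

/* Writer side: data, length and flag are published under one lock. */
static void locked_msg_complete(struct locked_msg *m, const void *src,
				size_t len)
{
	pthread_mutex_lock(&m->lock);
	if (len <= sizeof(m->data)) {
		memcpy(m->data, src, len);
		m->rx_len = len;
		m->msg_done = 1;
	}
	pthread_mutex_unlock(&m->lock);
}

/* Reader side: returns the number of bytes copied, or -1 if not done. */
static long locked_msg_read(struct locked_msg *m, void *dst, size_t dst_size)
{
	long ret = -1;

	pthread_mutex_lock(&m->lock);
	if (m->msg_done && m->rx_len <= dst_size) {
		memcpy(dst, m->data, m->rx_len);
		ret = (long)m->rx_len;
	}
	pthread_mutex_unlock(&m->lock);
	return ret;
}

static void *locked_writer(void *arg)
{
	/* 9 bytes: "response" plus the terminating NUL. */
	locked_msg_complete(arg, "response", 9);
	return NULL;
}

/* Returns 0 when the reader observed the complete response. */
static int locked_msg_demo(void)
{
	struct locked_msg m;
	pthread_t t;
	char out[16];
	int ret;

	memset(&m, 0, sizeof(m));
	pthread_mutex_init(&m.lock, NULL);
	if (pthread_create(&t, NULL, locked_writer, &m) != 0)
		return -1;
	while (locked_msg_read(&m, out, sizeof(out)) < 0)
		;	/* poll until the writer has published */
	pthread_join(t, NULL);
	ret = strcmp(out, "response");
	pthread_mutex_destroy(&m.lock);
	return ret;
}
```

Calling locked_msg_demo() from a test program built with -pthread should
return 0.  The price compared with barriers is that the reader takes the lock
on every check.
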
Thanks and best regards
-Lv

> 
> >  	tx_msg->msg_done = 1;
> >
> >  out_comp:
> >  	complete(&tx_msg->tx_complete);
> > +out_lock:
> > +	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >  out_msg:
> >  	ipmi_free_recv_msg(msg);
> >  }
> 
> Rafael
> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-25  3:09         ` Zheng, Lv
@ 2013-07-25 12:06           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-25 12:06 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, linux-acpi, openipmi-developer

On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
> -stable according to the previous conversation.
> 
> > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > Sent: Thursday, July 25, 2013 7:38 AM
> > 
> > On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> > > This patch fixes races caused by unprotected ACPI IPMI transfers.
> > >
> > > We can see the following crashes may occur:
> > > 1. There is no tx_msg_lock held for iterating tx_msg_list in
> > >    ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
> > >    acpi_ipmi_space_handler() under protection of tx_msg_lock.
> > > 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> > >    while it is accessed in parallel in ipmi_flush_tx_msg() and
> > >    ipmi_msg_handler().
> > >
> > > This patch enhances tx_msg_lock to protect all tx_msg accesses to
> > > solve this issue.  Then tx_msg_lock is always held around complete()
> > > and tx_msg accesses.
> > > smp_wmb() is called before setting the msg_done flag so that messages
> > > completed due to flushing will not be handled as 'done' messages while
> > > their contents are not valid.
> > >
> > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > Cc: Zhao Yakui <yakui.zhao@intel.com>
> > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > ---
> > >  drivers/acpi/acpi_ipmi.c |   10 ++++++++--
> > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > index b37c189..527ee43 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> > >  	struct acpi_ipmi_msg *tx_msg, *temp;
> > >  	int count = HZ / 10;
> > >  	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > > +	unsigned long flags;
> > >
> > > +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > >  	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> > >  		/* wake up the sleep thread on the Tx msg */
> > >  		complete(&tx_msg->tx_complete);
> > >  	}
> > > +	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > >
> > >  	/* wait for about 100ms to flush the tx message list */
> > >  	while (count--) {
> > > @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> > >  			break;
> > >  		}
> > >  	}
> > > -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > >
> > >  	if (!msg_found) {
> > >  		dev_warn(&pnp_dev->dev,
> > >  			 "Unexpected response (msg id %ld) is returned.\n",
> > >  			 msg->msgid);
> > > -		goto out_msg;
> > > +		goto out_lock;
> > >  	}
> > >
> > >  	/* copy the response data to Rx_data buffer */
> > > @@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> > >  	}
> > >  	tx_msg->rx_len = msg->msg.data_len;
> > >  	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> > > +	/* tx_msg content must be valid before setting msg_done flag */
> > > +	smp_wmb();
> > 
> > That's suspicious.
> > 
> > If you need the write barrier here, you'll most likely need a read barrier
> > somewhere else.  Where's that?
> 
> That depends on whether the other side uses the content written before the
> smp_wmb() under the condition set after the smp_wmb().
> 
> So the comment can be split into 2 questions:
> 1. Do we need a paired smp_rmb()?
> 2. Do we need the smp_wmb()?
> 
> For 1.
> If we want a paired smp_rmb(), then it will appear in this function:
> 
>  static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
>                  acpi_integer *value, int rem_time)
>  {
>          struct acpi_ipmi_buffer *buffer;
> 
>          /*
>           * value is also used as output parameter. It represents the response
>           * IPMI message returned by IPMI command.
>           */
>          buffer = (struct acpi_ipmi_buffer *)value;
>          if (!rem_time && !msg->msg_done) {
>                  buffer->status = ACPI_IPMI_TIMEOUT;
>                  return;
>          }
>          /*
>           * If the flag of msg_done is not set or the recv length is zero, it
>           * means that the IPMI command is not executed correctly.
>           * The status code will be ACPI_IPMI_UNKNOWN.
>           */
>          if (!msg->msg_done || !msg->rx_len) {
>                  buffer->status = ACPI_IPMI_UNKNOWN;
>                  return;
>          }
> +        smp_rmb();
>          /*
>           * If the IPMI response message is obtained correctly, the status code
>           * will be ACPI_IPMI_OK
>           */
>          buffer->status = ACPI_IPMI_OK;
>          buffer->length = msg->rx_len;
>          memcpy(buffer->data, msg->rx_data, msg->rx_len);
>  }
> 
> If we don't, then the only consequence is that the msg content may be read
> incorrectly from msg->rx_data.
> Note that rx_len is 0 after initialization and will never exceed
> sizeof(buffer->data), so the read is safe.
> 
> Doing without smp_rmb() is also OK in this case, since:
> 1. buffer->data will never be used when buffer->status is not ACPI_IPMI_OK, and
> 2. the smp_rmb()/smp_wmb() added in this patch will be deleted in [PATCH 07].
> 
> So IMO we needn't add the smp_rmb().  What do you think?
> 
> For 2.
> If we don't add smp_wmb() in ipmi_msg_handler(), then code running on another
> thread in acpi_format_ipmi_response() may read wrong msg->rx_data: a timeout
> triggers this function, and when acpi_format_ipmi_response() is entered, the
> msg->msg_done flag could be seen as 1 while msg->rx_data is not yet ready.
> This is what we want to avoid in this quick fix.

Using smp_wmb() without the complementary smp_rmb() doesn't make sense,
because each of them prevents only one flow of control from being
speculatively reordered, either by the CPU or by the compiler.  If only one
of them is used without the other, then the flow of control without the
barrier may be reordered in a way that will effectively cancel the effect of
the barrier in the second flow of control.

So, either we need *both* smp_wmb() and smp_rmb(), or we don't need them at all.

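As a userspace illustration of the pairing, here is a minimal sketch.  It is
not the acpi_ipmi code: C11 release/acquire fences are used as approximate
stand-ins for smp_wmb()/smp_rmb() (they are somewhat stronger than the kernel
primitives), and struct demo_msg and the demo_*() helpers are hypothetical
names.  The writer orders "data, then flag"; the reader orders "flag, then
data":

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct acpi_ipmi_msg; not the kernel type. */
struct demo_msg {
	char data[16];
	size_t rx_len;
	atomic_int msg_done;
};

/* Writer side: plays the role of ipmi_msg_handler(). */
static void *demo_writer(void *arg)
{
	struct demo_msg *m = arg;
	static const char resp[] = "response";

	memcpy(m->data, resp, sizeof(resp));
	m->rx_len = sizeof(resp);
	/* Assumed analogue of smp_wmb(): data is ordered before the flag. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&m->msg_done, 1, memory_order_relaxed);
	return NULL;
}

/* Reader side: plays the role of acpi_format_ipmi_response(). */
static int demo_reader(struct demo_msg *m, char *out, size_t out_size)
{
	if (!atomic_load_explicit(&m->msg_done, memory_order_relaxed))
		return -1;	/* not done yet */
	/* Assumed analogue of smp_rmb(): flag is read before the data. */
	atomic_thread_fence(memory_order_acquire);
	assert(m->rx_len <= out_size);
	memcpy(out, m->data, m->rx_len);
	return 0;
}

/* Returns 0 when the reader observed the complete response. */
static int barrier_pair_demo(void)
{
	struct demo_msg m;
	pthread_t t;
	char out[16];

	memset(m.data, 0, sizeof(m.data));
	m.rx_len = 0;
	atomic_init(&m.msg_done, 0);
	if (pthread_create(&t, NULL, demo_writer, &m) != 0)
		return -1;
	while (demo_reader(&m, out, sizeof(out)) != 0)
		;	/* spin until the flag is observed */
	pthread_join(t, NULL);
	return strcmp(out, "response");
}
```

Calling barrier_pair_demo() from a test program built with -pthread should
return 0.  With either fence removed the ordering guarantee is gone: the
reader may load the data before the flag, or the writer's flag store may
become visible before the data.
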
Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-25 12:06           ` Rafael J. Wysocki
@ 2013-07-25 18:12             ` Corey Minyard
  -1 siblings, 0 replies; 99+ messages in thread
From: Corey Minyard @ 2013-07-25 18:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Zheng, Lv, Wysocki, Rafael J, Brown, Len, Zhao, Yakui,
	linux-kernel, linux-acpi, openipmi-developer

On 07/25/2013 07:06 AM, Rafael J. Wysocki wrote:
> On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
>> -stable according to the previous conversation.
>>
>>> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
>>> Sent: Thursday, July 25, 2013 7:38 AM
>>>
>>> On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
>>>> This patch fixes races caused by unprotected ACPI IPMI transfers.
>>>>
>>>> We can see the following crashes may occur:
>>>> 1. There is no tx_msg_lock held for iterating tx_msg_list in
>>>>     ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
>>>>     acpi_ipmi_space_handler() under protection of tx_msg_lock.
>>>> 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
>>>>     while it is accessed in parallel in ipmi_flush_tx_msg() and
>>>>     ipmi_msg_handler().
>>>>
>>>> This patch enhances tx_msg_lock to protect all tx_msg accesses to
>>>> solve this issue.  Then tx_msg_lock is always held around complete()
>>>> and tx_msg accesses.
>>>> smp_wmb() is called before setting the msg_done flag so that messages
>>>> completed due to flushing will not be handled as 'done' messages while
>>>> their contents are not valid.
>>>>
>>>> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
>>>> Cc: Zhao Yakui <yakui.zhao@intel.com>
>>>> Reviewed-by: Huang Ying <ying.huang@intel.com>
>>>> ---
>>>>   drivers/acpi/acpi_ipmi.c |   10 ++++++++--
>>>>   1 file changed, 8 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
>>>> index b37c189..527ee43 100644
>>>> --- a/drivers/acpi/acpi_ipmi.c
>>>> +++ b/drivers/acpi/acpi_ipmi.c
>>>> @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
>>>>   	struct acpi_ipmi_msg *tx_msg, *temp;
>>>>   	int count = HZ / 10;
>>>>   	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
>>>> +	unsigned long flags;
>>>>
>>>> +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
>>>>   	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
>>>>   		/* wake up the sleep thread on the Tx msg */
>>>>   		complete(&tx_msg->tx_complete);
>>>>   	}
>>>> +	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
>>>>
>>>>   	/* wait for about 100ms to flush the tx message list */
>>>>   	while (count--) {
>>>> @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>>>>   			break;
>>>>   		}
>>>>   	}
>>>> -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>>>>
>>>>   	if (!msg_found) {
>>>>   		dev_warn(&pnp_dev->dev,
>>>>   			 "Unexpected response (msg id %ld) is returned.\n",
>>>>   			 msg->msgid);
>>>> -		goto out_msg;
>>>> +		goto out_lock;
>>>>   	}
>>>>
>>>>   	/* copy the response data to Rx_data buffer */
>>>> @@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>>>>   	}
>>>>   	tx_msg->rx_len = msg->msg.data_len;
>>>>   	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
>>>> +	/* tx_msg content must be valid before setting msg_done flag */
>>>> +	smp_wmb();
>>> That's suspicious.
>>>
>>> If you need the write barrier here, you'll most likely need a read barrier
>>> somewhere else.  Where's that?
>> That depends on whether the other side uses the content written before the
>> smp_wmb() under the condition set after the smp_wmb().
>>
>> So the comment can be split into 2 questions:
>> 1. Do we need a paired smp_rmb()?
>> 2. Do we need the smp_wmb()?
>>
>> For 1.
>> If we want a paired smp_rmb(), then it will appear in this function:
>>
>> static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
>>                 acpi_integer *value, int rem_time)
>> {
>>         struct acpi_ipmi_buffer *buffer;
>>
>>         /*
>>          * value is also used as output parameter. It represents the response
>>          * IPMI message returned by IPMI command.
>>          */
>>         buffer = (struct acpi_ipmi_buffer *)value;
>>         if (!rem_time && !msg->msg_done) {
>>                 buffer->status = ACPI_IPMI_TIMEOUT;
>>                 return;
>>         }
>>         /*
>>          * If the flag of msg_done is not set or the recv length is zero, it
>>          * means that the IPMI command is not executed correctly.
>>          * The status code will be ACPI_IPMI_UNKNOWN.
>>          */
>>         if (!msg->msg_done || !msg->rx_len) {
>>                 buffer->status = ACPI_IPMI_UNKNOWN;
>>                 return;
>>         }
>> +       smp_rmb();
>>         /*
>>          * If the IPMI response message is obtained correctly, the status code
>>          * will be ACPI_IPMI_OK
>>          */
>>         buffer->status = ACPI_IPMI_OK;
>>         buffer->length = msg->rx_len;
>>         memcpy(buffer->data, msg->rx_data, msg->rx_len);
>> }
>>
>> If we don't, the only consequence is that the msg content may be read incorrectly from msg->rx_data.
>> Note that rx_len is 0 during initialization and will never exceed sizeof(buffer->data), so the read is safe.
>>
>> Omitting the smp_rmb() is also OK in this case, since:
>> 1. buffer->data will never be used when buffer->status is not ACPI_IPMI_OK and
>> 2. the smp_rmb()/smp_wmb() added in this patch will be deleted in [PATCH 07].
>>
>> So IMO, we needn't add the smp_rmb().  What do you think?
>>
>> For 2.
>> If we don't add smp_wmb() in ipmi_msg_handler(), then the code running on another thread in acpi_format_ipmi_response() may read wrong msg->rx_data: a timeout triggers this function, and when acpi_format_ipmi_response() is entered, the msg->msg_done flag could be seen as 1 while msg->rx_data is not ready.  This is what we want to avoid in this quick fix.
> Using smp_wmb() without the complementary smp_rmb() doesn't make sense,
> because each of them prevents only one flow of control from being
> speculatively reordered, either by the CPU or by the compiler.  If only one
> of them is used without the other, then the flow of control without the
> barrier may be reordered in a way that will effectively cancel the effect of
> the barrier in the second flow of control.
>
> So, either we need *both* smp_wmb() and smp_rmb(), or we don't need them at all.

If I understand this correctly, the problem would be if:

rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
                                         IPMI_TIMEOUT);

returns on a timeout, then checks msg_done and races with something 
setting msg_done.  If that is the case, you would need the smp_rmb() 
before checking msg_done.

However, the timeout above is unnecessary.  You are using 
ipmi_request_settime(), so you can set the timeout when the IPMI command 
fails and returns a failure message.  The driver guarantees a return 
message for each request.  Just remove the timeout from the completion, 
set the timeout and retries in the ipmi request, and the completion 
should handle the barrier issues.

Plus, from a quick glance at the code, it doesn't look like it will 
properly handle a situation where the timeout occurs and is handled then 
the response comes in later.

-corey

>
> Thanks,
> Rafael
>
>


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-25 18:12             ` Corey Minyard
@ 2013-07-25 19:32               ` Rafael J. Wysocki
  -1 siblings, 0 replies; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-25 19:32 UTC (permalink / raw)
  To: minyard, Zheng, Lv
  Cc: Wysocki, Rafael J, Brown, Len, Zhao, Yakui, linux-kernel,
	linux-acpi, openipmi-developer

On Thursday, July 25, 2013 01:12:38 PM Corey Minyard wrote:
> On 07/25/2013 07:06 AM, Rafael J. Wysocki wrote:
> > On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
> >> -stable according to the previous conversation.
> >>
> >>> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> >>> Sent: Thursday, July 25, 2013 7:38 AM
> >>>
> >>> On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> >>>> This patch fixes races caused by unprotected ACPI IPMI transfers.
> >>>>
> >>>> We can see the following crashes may occur:
> >>>> 1. There is no tx_msg_lock held for iterating tx_msg_list in
> >>>>     ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
> >>>>     acpi_ipmi_space_handler() under protection of tx_msg_lock.
> >>>> 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> >>>>     while it is accessed in parallel in ipmi_flush_tx_msg() and
> >>>>     ipmi_msg_handler().
> >>>>
> >>>> This patch enhances tx_msg_lock to protect all tx_msg accesses to
> >>>> solve this issue.  Then tx_msg_lock is always held around complete()
> >>>> and tx_msg accesses.
> >>>> Calling smp_wmb() before setting msg_done flag so that messages
> >>>> completed due to flushing will not be handled as 'done' messages while
> >>>> their contents are not valid.
> >>>>
> >>>> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> >>>> Cc: Zhao Yakui <yakui.zhao@intel.com>
> >>>> Reviewed-by: Huang Ying <ying.huang@intel.com>
> >>>> ---
> >>>>   drivers/acpi/acpi_ipmi.c |   10 ++++++++--
> >>>>   1 file changed, 8 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> >>>> index b37c189..527ee43 100644
> >>>> --- a/drivers/acpi/acpi_ipmi.c
> >>>> +++ b/drivers/acpi/acpi_ipmi.c
> >>>> @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> >>>>   	struct acpi_ipmi_msg *tx_msg, *temp;
> >>>>   	int count = HZ / 10;
> >>>>   	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> >>>> +	unsigned long flags;
> >>>>
> >>>> +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> >>>>   	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> >>>>   		/* wake up the sleep thread on the Tx msg */
> >>>>   		complete(&tx_msg->tx_complete);
> >>>>   	}
> >>>> +	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> >>>>
> >>>>   	/* wait for about 100ms to flush the tx message list */
> >>>>   	while (count--) {
> >>>> @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >>>>   			break;
> >>>>   		}
> >>>>   	}
> >>>> -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >>>>
> >>>>   	if (!msg_found) {
> >>>>   		dev_warn(&pnp_dev->dev,
> >>>>   			 "Unexpected response (msg id %ld) is returned.\n",
> >>>>   			 msg->msgid);
> >>>> -		goto out_msg;
> >>>> +		goto out_lock;
> >>>>   	}
> >>>>
> >>>>   	/* copy the response data to Rx_data buffer */
> >>>> @@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >>>>   	}
> >>>>   	tx_msg->rx_len = msg->msg.data_len;
> >>>>   	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> >>>> +	/* tx_msg content must be valid before setting msg_done flag */
> >>>> +	smp_wmb();
> >>> That's suspicious.
> >>>
> >>> If you need the write barrier here, you'll most likely need a read barrier
> >>> somewhere else.  Where's that?
> >> It depends on whether the content written before the smp_wmb() is used by the code on the other side under the condition set after the smp_wmb().
> >>
> >> So the comment can be treated as 2 parts:
> >> 1. do we need a paired smp_rmb().
> >> 2. do we need a smp_wmb().
> >>
> >> For 1.
> >> If we want a paired smp_rmb(), then it will appear in this function:
> >>
> >> static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> >>                 acpi_integer *value, int rem_time)
> >> {
> >>         struct acpi_ipmi_buffer *buffer;
> >>
> >>         /*
> >>          * value is also used as output parameter. It represents the response
> >>          * IPMI message returned by IPMI command.
> >>          */
> >>         buffer = (struct acpi_ipmi_buffer *)value;
> >>         if (!rem_time && !msg->msg_done) {
> >>                 buffer->status = ACPI_IPMI_TIMEOUT;
> >>                 return;
> >>         }
> >>         /*
> >>          * If the flag of msg_done is not set or the recv length is zero, it
> >>          * means that the IPMI command is not executed correctly.
> >>          * The status code will be ACPI_IPMI_UNKNOWN.
> >>          */
> >>         if (!msg->msg_done || !msg->rx_len) {
> >>                 buffer->status = ACPI_IPMI_UNKNOWN;
> >>                 return;
> >>         }
> >> +       smp_rmb();
> >>         /*
> >>          * If the IPMI response message is obtained correctly, the status code
> >>          * will be ACPI_IPMI_OK
> >>          */
> >>         buffer->status = ACPI_IPMI_OK;
> >>         buffer->length = msg->rx_len;
> >>         memcpy(buffer->data, msg->rx_data, msg->rx_len);
> >> }
> >>
> >> If we don't, the only consequence is that the msg content may be read incorrectly from msg->rx_data.
> >> Note that rx_len is 0 during initialization and will never exceed sizeof(buffer->data), so the read is safe.
> >>
> >> Omitting the smp_rmb() is also OK in this case, since:
> >> 1. buffer->data will never be used when buffer->status is not ACPI_IPMI_OK and
> >> 2. the smp_rmb()/smp_wmb() added in this patch will be deleted in [PATCH 07].
> >>
> >> So IMO, we needn't add the smp_rmb().  What do you think?
> >>
> >> For 2.
> >> If we don't add smp_wmb() in ipmi_msg_handler(), then the code running on another thread in acpi_format_ipmi_response() may read wrong msg->rx_data: a timeout triggers this function, and when acpi_format_ipmi_response() is entered, the msg->msg_done flag could be seen as 1 while msg->rx_data is not ready.  This is what we want to avoid in this quick fix.
> > Using smp_wmb() without the complementary smp_rmb() doesn't make sense,
> > because each of them prevents only one flow of control from being
> > speculatively reordered, either by the CPU or by the compiler.  If only one
> > of them is used without the other, then the flow of control without the
> > barrier may be reordered in a way that will effectively cancel the effect of
> > the barrier in the second flow of control.
> >
> > So, either we need *both* smp_wmb() and smp_rmb(), or we don't need them at all.
> 
> If I understand this correctly, the problem would be if:
> 
> rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
>                                          IPMI_TIMEOUT);
> 
> returns on a timeout, then checks msg_done and races with something 
> setting msg_done.  If that is the case, you would need the smp_rmb() 
> before checking msg_done.

I believe so.
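
A minimal userspace sketch of the pairing, for illustration only.  It
assumes C11 fences as rough stand-ins for the kernel's smp_wmb() and
smp_rmb(); struct demo_msg and the demo_* helpers are hypothetical and
are not the acpi_ipmi code:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct acpi_ipmi_msg; not the kernel layout. */
struct demo_msg {
	unsigned char rx_data[64];
	size_t rx_len;
	atomic_int msg_done;
};

static void demo_init(struct demo_msg *m)
{
	memset(m->rx_data, 0, sizeof(m->rx_data));
	m->rx_len = 0;
	atomic_init(&m->msg_done, 0);
}

/* Writer side: fill the payload first, then publish the flag. */
static void demo_publish(struct demo_msg *m, const void *data, size_t len)
{
	if (len > sizeof(m->rx_data))
		len = sizeof(m->rx_data);
	memcpy(m->rx_data, data, len);
	m->rx_len = len;
	/* Roughly the role of smp_wmb(): payload stores before the flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&m->msg_done, 1, memory_order_relaxed);
}

/* Reader side: returns the number of bytes copied, or 0 if not done yet. */
static size_t demo_consume(struct demo_msg *m, void *out, size_t out_size)
{
	size_t len;

	if (!atomic_load_explicit(&m->msg_done, memory_order_relaxed))
		return 0;
	/* Roughly the role of smp_rmb(): flag load before the payload loads. */
	atomic_thread_fence(memory_order_acquire);
	len = m->rx_len;
	if (len > out_size)
		len = out_size;
	memcpy(out, m->rx_data, len);
	return len;
}
```

Without the fence in demo_consume(), the reader could observe msg_done
set and still load stale rx_len/rx_data, which is why the write barrier
alone is not enough.  Calling both helpers from one thread only
exercises the logic; the ordering matters when they run on different
CPUs.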

> However, the timeout above is unnecessary.  You are using 
> ipmi_request_settime(), so you can set the timeout when the IPMI command 
> fails and returns a failure message.  The driver guarantees a return 
> message for each request.  Just remove the timeout from the completion, 
> set the timeout and retries in the ipmi request, and the completion 
> should handle the barrier issues.

Good point.

> Plus, from a quick glance at the code, it doesn't look like it will 
> properly handle a situation where the timeout occurs and is handled then 
> the response comes in later.

Lv, what about this?

Rafael


^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-23  8:09     ` Lv Zheng
  (?)
@ 2013-07-25 20:27     ` Rafael J. Wysocki
  2013-07-26  0:47         ` Zheng, Lv
  -1 siblings, 1 reply; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-25 20:27 UTC (permalink / raw)
  To: Lv Zheng
  Cc: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> This patch adds reference counting for ACPI operation region handlers to fix
> races caused by the ACPICA address space callback invocations.
> 
> ACPICA address space callback invocation is not suitable for Linux
> CONFIG_MODULE=y execution environment.  This patch tries to protect the
> address space callbacks by invoking them under a module safe environment.
> The IPMI address space handler is also upgraded in this patch.
> The acpi_unregister_region() is designed to meet the following
> requirements:
> 1. It acts as a barrier for operation region callbacks - no callback will
>    happen after acpi_unregister_region().
> 2. acpi_unregister_region() is safe to be called in module->exit()
>    functions.
> Using reference counting rather than module referencing allows
> such benefits to be achieved even when acpi_unregister_region() is called
> in the environments other than module->exit().
> The header file of include/acpi/acpi_bus.h should contain the declarations
> that have references to some ACPICA defined types.
> 
> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> Reviewed-by: Huang Ying <ying.huang@intel.com>
> ---
>  drivers/acpi/acpi_ipmi.c |   16 ++--
>  drivers/acpi/osl.c       |  224 ++++++++++++++++++++++++++++++++++++++++++++++
>  include/acpi/acpi_bus.h  |    5 ++
>  3 files changed, 235 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> index 5f8f495..2a09156 100644
> --- a/drivers/acpi/acpi_ipmi.c
> +++ b/drivers/acpi/acpi_ipmi.c
> @@ -539,20 +539,18 @@ out_ref:
>  static int __init acpi_ipmi_init(void)
>  {
>  	int result = 0;
> -	acpi_status status;
>  
>  	if (acpi_disabled)
>  		return result;
>  
>  	mutex_init(&driver_data.ipmi_lock);
>  
> -	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> -						    ACPI_ADR_SPACE_IPMI,
> -						    &acpi_ipmi_space_handler,
> -						    NULL, NULL);
> -	if (ACPI_FAILURE(status)) {
> +	result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> +				      &acpi_ipmi_space_handler,
> +				      NULL, NULL);
> +	if (result) {
>  		pr_warn("Can't register IPMI opregion space handle\n");
> -		return -EINVAL;
> +		return result;
>  	}
>  
>  	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
>  	}
>  	mutex_unlock(&driver_data.ipmi_lock);
>  
> -	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> -					  ACPI_ADR_SPACE_IPMI,
> -					  &acpi_ipmi_space_handler);
> +	acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
>  }
>  
>  module_init(acpi_ipmi_init);
> diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> index 6ab2c35..8398e51 100644
> --- a/drivers/acpi/osl.c
> +++ b/drivers/acpi/osl.c
> @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
>  static struct workqueue_struct *kacpi_notify_wq;
>  static struct workqueue_struct *kacpi_hotplug_wq;
>  
> +struct acpi_region {
> +	unsigned long flags;
> +#define ACPI_REGION_DEFAULT		0x01
> +#define ACPI_REGION_INSTALLED		0x02
> +#define ACPI_REGION_REGISTERED		0x04
> +#define ACPI_REGION_UNREGISTERING	0x08
> +#define ACPI_REGION_INSTALLING		0x10

What about (1UL << 1), (1UL << 2) etc.?

Also please remove the #defines out of the struct definition.
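
Something along these lines, as a sketch only.  The bit positions here
start at 0 so that the values stay equal to the hex constants in the
patch, which is an assumption on my side; struct demo_region is a
reduced, hypothetical version of struct acpi_region that keeps only the
flags field:

```c
#include <assert.h>

/* Flag bits, defined outside of the struct definition. */
#define ACPI_REGION_DEFAULT		(1UL << 0)
#define ACPI_REGION_INSTALLED		(1UL << 1)
#define ACPI_REGION_REGISTERED		(1UL << 2)
#define ACPI_REGION_UNREGISTERING	(1UL << 3)
#define ACPI_REGION_INSTALLING		(1UL << 4)
#define ACPI_REGION_MANAGED		(1UL << 7)

/* Hypothetical reduced region descriptor: flags only. */
struct demo_region {
	unsigned long flags;
};

/* Same test as acpi_region_callable() in the patch, on the reduced struct. */
static int demo_region_callable(const struct demo_region *rgn)
{
	return (rgn->flags & ACPI_REGION_REGISTERED) &&
	       !(rgn->flags & ACPI_REGION_UNREGISTERING);
}
```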

> +	/*
> +	 * NOTE: Upgrading All Region Handlers
> +	 * This flag is only used during the period where not all of the
> +	 * region handlers are upgraded to the new interfaces.
> +	 */
> +#define ACPI_REGION_MANAGED		0x80
> +	acpi_adr_space_handler handler;
> +	acpi_adr_space_setup setup;
> +	void *context;
> +	/* Invoking references */
> +	atomic_t refcnt;

Actually, why don't you use krefs?
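
To show the pattern I mean, here is a userspace sketch that mimics the
shape of kref_init()/kref_get()/kref_put() with C11 atomics.  The
demo_* names and struct demo_handler are hypothetical; in the kernel
the real struct kref from <linux/kref.h> would be embedded instead:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace imitation of struct kref; not the kernel implementation. */
struct demo_kref {
	atomic_int refcount;
};

static void demo_kref_init(struct demo_kref *k)
{
	atomic_init(&k->refcount, 1);
}

static void demo_kref_get(struct demo_kref *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Returns 1 if this put dropped the last reference and called release(). */
static int demo_kref_put(struct demo_kref *k,
			 void (*release)(struct demo_kref *k))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

/* Hypothetical object that embeds the counter, as a region handler might. */
struct demo_handler {
	struct demo_kref ref;
	int released;
};

static void demo_handler_release(struct demo_kref *k)
{
	/* Open-coded container_of(): recover the enclosing object. */
	struct demo_handler *h =
		(struct demo_handler *)((char *)k - offsetof(struct demo_handler, ref));

	h->released = 1;
}
```

The point is that the release callback runs exactly once, from
whichever path drops the last reference, so the unregister path does
not need to open-code the "last user" check.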

> +};
> +
> +static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS] = {
> +	[ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> +		.flags = ACPI_REGION_DEFAULT,
> +	},
> +	[ACPI_ADR_SPACE_SYSTEM_IO] = {
> +		.flags = ACPI_REGION_DEFAULT,
> +	},
> +	[ACPI_ADR_SPACE_PCI_CONFIG] = {
> +		.flags = ACPI_REGION_DEFAULT,
> +	},
> +	[ACPI_ADR_SPACE_IPMI] = {
> +		.flags = ACPI_REGION_MANAGED,
> +	},
> +};
> +static DEFINE_MUTEX(acpi_mutex_region);
> +
>  /*
>   * This list of permanent mappings is for memory that may be accessed from
>   * interrupt context, where we can't do the ioremap().
> @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle, u32 type, void *context,
>  		kfree(hp_work);
>  }
>  EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> +
> +static bool acpi_region_managed(struct acpi_region *rgn)
> +{
> +	/*
> +	 * NOTE: Default and Managed
> +	 * We only need to avoid region management on the regions managed
> +	 * by ACPICA (ACPI_REGION_DEFAULT).  Currently, we need additional
> +	 * check as many operation region handlers are not upgraded, so
> +	 * only those known to be safe are managed (ACPI_REGION_MANAGED).
> +	 */
> +	return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> +	       (rgn->flags & ACPI_REGION_MANAGED);
> +}
> +
> +static bool acpi_region_callable(struct acpi_region *rgn)
> +{
> +	return (rgn->flags & ACPI_REGION_REGISTERED) &&
> +	       !(rgn->flags & ACPI_REGION_UNREGISTERING);
> +}
> +
> +static acpi_status
> +acpi_region_default_handler(u32 function,
> +			    acpi_physical_address address,
> +			    u32 bit_width, u64 *value,
> +			    void *handler_context, void *region_context)
> +{
> +	acpi_adr_space_handler handler;
> +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> +	void *context;
> +	acpi_status status = AE_NOT_EXIST;
> +
> +	mutex_lock(&acpi_mutex_region);
> +	if (!acpi_region_callable(rgn) || !rgn->handler) {
> +		mutex_unlock(&acpi_mutex_region);
> +		return status;
> +	}
> +
> +	atomic_inc(&rgn->refcnt);
> +	handler = rgn->handler;
> +	context = rgn->context;
> +	mutex_unlock(&acpi_mutex_region);
> +
> +	status = handler(function, address, bit_width, value, context,
> +			 region_context);

Why don't we call the handler under the mutex?

What exactly prevents context from becoming NULL before the call above?

> +	atomic_dec(&rgn->refcnt);
> +
> +	return status;
> +}
> +
> +static acpi_status
> +acpi_region_default_setup(acpi_handle handle, u32 function,
> +			  void *handler_context, void **region_context)
> +{
> +	acpi_adr_space_setup setup;
> +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> +	void *context;
> +	acpi_status status = AE_OK;
> +
> +	mutex_lock(&acpi_mutex_region);
> +	if (!acpi_region_callable(rgn) || !rgn->setup) {
> +		mutex_unlock(&acpi_mutex_region);
> +		return status;
> +	}
> +
> +	atomic_inc(&rgn->refcnt);
> +	setup = rgn->setup;
> +	context = rgn->context;
> +	mutex_unlock(&acpi_mutex_region);
> +
> +	status = setup(handle, function, context, region_context);

Can setup drop rgn->refcnt ?

> +	atomic_dec(&rgn->refcnt);
> +
> +	return status;
> +}
> +
> +static int __acpi_install_region(struct acpi_region *rgn,
> +				 acpi_adr_space_type space_id)
> +{
> +	int res = 0;
> +	acpi_status status;
> +	int installing = 0;
> +
> +	mutex_lock(&acpi_mutex_region);
> +	if (rgn->flags & ACPI_REGION_INSTALLED)
> +		goto out_lock;
> +	if (rgn->flags & ACPI_REGION_INSTALLING) {
> +		res = -EBUSY;
> +		goto out_lock;
> +	}
> +
> +	installing = 1;
> +	rgn->flags |= ACPI_REGION_INSTALLING;
> +	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT, space_id,
> +						    acpi_region_default_handler,
> +						    acpi_region_default_setup,
> +						    rgn);
> +	rgn->flags &= ~ACPI_REGION_INSTALLING;
> +	if (ACPI_FAILURE(status))
> +		res = -EINVAL;
> +	else
> +		rgn->flags |= ACPI_REGION_INSTALLED;
> +
> +out_lock:
> +	mutex_unlock(&acpi_mutex_region);
> +	if (installing) {
> +		if (res)
> +			pr_err("Failed to install region %d\n", space_id);
> +		else
> +			pr_info("Region %d installed\n", space_id);
> +	}
> +	return res;
> +}
> +
> +int acpi_register_region(acpi_adr_space_type space_id,
> +			 acpi_adr_space_handler handler,
> +			 acpi_adr_space_setup setup, void *context)
> +{
> +	int res;
> +	struct acpi_region *rgn;
> +
> +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> +		return -EINVAL;
> +
> +	rgn = &acpi_regions[space_id];
> +	if (!acpi_region_managed(rgn))
> +		return -EINVAL;
> +
> +	res = __acpi_install_region(rgn, space_id);
> +	if (res)
> +		return res;
> +
> +	mutex_lock(&acpi_mutex_region);
> +	if (rgn->flags & ACPI_REGION_REGISTERED) {
> +		mutex_unlock(&acpi_mutex_region);
> +		return -EBUSY;
> +	}
> +
> +	rgn->handler = handler;
> +	rgn->setup = setup;
> +	rgn->context = context;
> +	rgn->flags |= ACPI_REGION_REGISTERED;
> +	atomic_set(&rgn->refcnt, 1);
> +	mutex_unlock(&acpi_mutex_region);
> +
> +	pr_info("Region %d registered\n", space_id);
> +
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(acpi_register_region);
> +
> +void acpi_unregister_region(acpi_adr_space_type space_id)
> +{
> +	struct acpi_region *rgn;
> +
> +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> +		return;
> +
> +	rgn = &acpi_regions[space_id];
> +	if (!acpi_region_managed(rgn))
> +		return;
> +
> +	mutex_lock(&acpi_mutex_region);
> +	if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> +		mutex_unlock(&acpi_mutex_region);
> +		return;
> +	}
> +	if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> +		mutex_unlock(&acpi_mutex_region);
> +		return;

What about

	if ((rgn->flags & ACPI_REGION_UNREGISTERING)
	    || !(rgn->flags & ACPI_REGION_REGISTERED)) {
		mutex_unlock(&acpi_mutex_region);
		return;
	}

> +	}
> +
> +	rgn->flags |= ACPI_REGION_UNREGISTERING;
> +	rgn->handler = NULL;
> +	rgn->setup = NULL;
> +	rgn->context = NULL;
> +	mutex_unlock(&acpi_mutex_region);
> +
> +	while (atomic_read(&rgn->refcnt) > 1)
> +		schedule_timeout_uninterruptible(usecs_to_jiffies(5));

Wouldn't it be better to use a wait queue here?
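
To illustrate the idea: in the kernel this would presumably be a wait_queue_head_t, with wait_event() on the unregister side and wake_up() after the decrement. The sketch below is only a userspace analogue that uses a pthread condition variable in place of the wait queue; every `sketch_` name is made up, and the counter is guarded by a mutex rather than being an atomic_t.

```c
#include <pthread.h>

struct sketch_region {
	pthread_mutex_t lock;
	pthread_cond_t drained;	/* plays the role of the wait queue */
	int refcnt;		/* 1 == registered, no handler running */
};

#define SKETCH_REGION_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 1 }

/* Called before invoking the handler. */
static void sketch_region_get(struct sketch_region *rgn)
{
	pthread_mutex_lock(&rgn->lock);
	rgn->refcnt++;
	pthread_mutex_unlock(&rgn->lock);
}

/* Called after the handler returns; wakes a waiting unregister. */
static void sketch_region_put(struct sketch_region *rgn)
{
	pthread_mutex_lock(&rgn->lock);
	if (--rgn->refcnt == 1)
		pthread_cond_broadcast(&rgn->drained);
	pthread_mutex_unlock(&rgn->lock);
}

/* Unregister side: sleep until no handler invocation is in flight. */
static void sketch_region_drain(struct sketch_region *rgn)
{
	pthread_mutex_lock(&rgn->lock);
	while (rgn->refcnt > 1)
		pthread_cond_wait(&rgn->drained, &rgn->lock);
	rgn->refcnt = 0;	/* drop the registration reference */
	pthread_mutex_unlock(&rgn->lock);
}
```

The waiter sleeps until it is woken by the last put, instead of polling the counter every few microseconds.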

> +	atomic_dec(&rgn->refcnt);
> +
> +	mutex_lock(&acpi_mutex_region);
> +	rgn->flags &= ~(ACPI_REGION_REGISTERED | ACPI_REGION_UNREGISTERING);
> +	mutex_unlock(&acpi_mutex_region);
> +
> +	pr_info("Region %d unregistered\n", space_id);
> +}
> +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> index a2c2fbb..15fad0d 100644
> --- a/include/acpi/acpi_bus.h
> +++ b/include/acpi/acpi_bus.h
> @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void *bus) { return 0; }
>  
>  #endif				/* CONFIG_ACPI */
>  
> +int acpi_register_region(acpi_adr_space_type space_id,
> +			 acpi_adr_space_handler handler,
> +			 acpi_adr_space_setup setup, void *context);
> +void acpi_unregister_region(acpi_adr_space_type space_id);
> +
>  #endif /*__ACPI_BUS_H__*/

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-23  8:09     ` Lv Zheng
  (?)
  (?)
@ 2013-07-25 21:29     ` Rafael J. Wysocki
  2013-07-26  1:54         ` Zheng, Lv
  -1 siblings, 1 reply; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-25 21:29 UTC (permalink / raw)
  To: Lv Zheng; +Cc: Rafael J. Wysocki, Len Brown, linux-kernel, linux-acpi

On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> This patch adds reference counting for ACPI operation region handlers to fix
> races caused by the ACPICA address space callback invocations.
> 
> ACPICA address space callback invocation is not suitable for the Linux
> CONFIG_MODULES=y execution environment.

Actually, can you please explain to me what *exactly* the problem is?

Rafael



* Re: [PATCH 04/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user
  2013-07-23  8:09     ` Lv Zheng
  (?)
@ 2013-07-25 21:59     ` Rafael J. Wysocki
  2013-07-26  1:17         ` Zheng, Lv
  -1 siblings, 1 reply; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-25 21:59 UTC (permalink / raw)
  To: Lv Zheng
  Cc: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

On Tuesday, July 23, 2013 04:09:26 PM Lv Zheng wrote:
> This patch uses reference counting to fix the race caused by the
> unprotected ACPI IPMI user.
> 
> As the acpi_ipmi_device->user_interface check in acpi_ipmi_space_handler()
> can happen before setting user_interface to NULL and codes after the check
> in acpi_ipmi_space_handler() can happen after user_interface becoming NULL,
> then the on-going acpi_ipmi_space_handler() still can pass an invalid
> acpi_ipmi_device->user_interface to ipmi_request_settime().  Such race
> condition is not allowed by the IPMI layer's API design as crash will
> happen in ipmi_request_settime().
> In IPMI layer, smi_gone()/new_smi() callbacks are protected by
> smi_watchers_mutex, thus their invocations are serialized.  But as a new
> smi can re-use the freed intf_num, it requires that the callback
> implementation must not use intf_num as an identification mean or it must
> ensure all references to the previous smi are all dropped before exiting
> smi_gone() callback.  In case of acpi_ipmi module, this means
> ipmi_flush_tx_msg() must ensure all on-going IPMI transfers are completed
> before exiting ipmi_flush_tx_msg().
> 
> This patch follows ipmi_devintf.c design:
> 1. Invoking ipmi_destroy_user() after the reference count of
>    acpi_ipmi_device dropping to 0, this matches IPMI layer's API calling
>    rule on ipmi_destroy_user() and ipmi_request_settime().
> 2. References of acpi_ipmi_device dropping to 1 means tx_msg related to
>    this acpi_ipmi_device are all freed, this can be used to implement the
>    new flushing mechanism.  Note complete() must be retried so that the
>    on-going tx_msg won't block flushing at the point to add tx_msg into
>    tx_msg_list where reference of acpi_ipmi_device is held.  This matches
>    the IPMI layer's callback rule on smi_gone()/new_smi() serialization.
> 3. ipmi_flush_tx_msg() is performed after deleting acpi_ipmi_device from
>    the list so that no new tx_msg can be created after entering flushing
>    process.
> 4. The flushing of tx_msg is also moved out of ipmi_lock in this patch.
> 
> The forthcoming IPMI operation region handler installation changes also
> requires acpi_ipmi_device be handled in the reference counting style.
> 
> Authorship is also updated due to this design change.
> 
> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> Cc: Zhao Yakui <yakui.zhao@intel.com>
> Reviewed-by: Huang Ying <ying.huang@intel.com>
> ---
>  drivers/acpi/acpi_ipmi.c |  249 +++++++++++++++++++++++++++-------------------
>  1 file changed, 149 insertions(+), 100 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> index 527ee43..cbf25e0 100644
> --- a/drivers/acpi/acpi_ipmi.c
> +++ b/drivers/acpi/acpi_ipmi.c
> @@ -1,8 +1,9 @@
>  /*
>   *  acpi_ipmi.c - ACPI IPMI opregion
>   *
> - *  Copyright (C) 2010 Intel Corporation
> - *  Copyright (C) 2010 Zhao Yakui <yakui.zhao@intel.com>
> + *  Copyright (C) 2010, 2013 Intel Corporation
> + *    Author: Zhao Yakui <yakui.zhao@intel.com>
> + *            Lv Zheng <lv.zheng@intel.com>
>   *
>   * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>   *
> @@ -67,6 +68,7 @@ struct acpi_ipmi_device {
>  	long curr_msgid;
>  	unsigned long flags;
>  	struct ipmi_smi_info smi_data;
> +	atomic_t refcnt;

Can you use a kref instead?
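
For reference, a rough userspace model of what kref-style counting could look like for this structure. The kernel interface is kref_init()/kref_get()/kref_put() from <linux/kref.h>; everything prefixed `sketch_` below is hypothetical and only mimics that shape with C11 atomics so that it can be compiled outside the kernel.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for struct kref (hypothetical, not the kernel type). */
struct sketch_kref {
	atomic_int refcount;
};

static void sketch_kref_init(struct sketch_kref *k)
{
	atomic_init(&k->refcount, 1);
}

static void sketch_kref_get(struct sketch_kref *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Returns 1 if this call dropped the last reference and ran release(). */
static int sketch_kref_put(struct sketch_kref *k,
			   void (*release)(struct sketch_kref *k))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

/* Reduced device object; only the embedded counter matters here. */
struct sketch_ipmi_device {
	int ipmi_ifnum;
	struct sketch_kref kref;
};

static int sketch_released;	/* counts release calls, for demonstration */

static void sketch_dev_release(struct sketch_kref *k)
{
	/* Recover the containing object, as container_of() would. */
	struct sketch_ipmi_device *dev = (struct sketch_ipmi_device *)
		((char *)k - offsetof(struct sketch_ipmi_device, kref));

	sketch_released++;
	free(dev);
}

static struct sketch_ipmi_device *sketch_dev_alloc(int iface)
{
	struct sketch_ipmi_device *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->ipmi_ifnum = iface;
	sketch_kref_init(&dev->kref);
	return dev;
}
```

The get/put/release helpers in the patch map onto this shape directly. The "wait until the count drops to 1" loop in ipmi_flush_tx_msg() is a different matter and would probably need separate handling.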

>  };
>  
>  struct ipmi_driver_data {
> @@ -107,8 +109,8 @@ struct acpi_ipmi_buffer {
>  static void ipmi_register_bmc(int iface, struct device *dev);
>  static void ipmi_bmc_gone(int iface);
>  static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
> -static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device);
> -static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device);
> +static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
> +static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);
>  
>  static struct ipmi_driver_data driver_data = {
>  	.ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
> @@ -122,6 +124,80 @@ static struct ipmi_driver_data driver_data = {
>  	},
>  };
>  
> +static struct acpi_ipmi_device *
> +ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
> +{
> +	struct acpi_ipmi_device *ipmi_device;
> +	int err;
> +	ipmi_user_t user;
> +
> +	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
> +	if (!ipmi_device)
> +		return NULL;
> +
> +	atomic_set(&ipmi_device->refcnt, 1);
> +	INIT_LIST_HEAD(&ipmi_device->head);
> +	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
> +	spin_lock_init(&ipmi_device->tx_msg_lock);
> +
> +	ipmi_device->handle = handle;
> +	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> +	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
> +	ipmi_device->ipmi_ifnum = iface;
> +
> +	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> +			       ipmi_device, &user);
> +	if (err) {
> +		put_device(smi_data->dev);
> +		kfree(ipmi_device);
> +		return NULL;
> +	}
> +	ipmi_device->user_interface = user;
> +	ipmi_install_space_handler(ipmi_device);
> +
> +	return ipmi_device;
> +}
> +
> +static struct acpi_ipmi_device *
> +acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
> +{
> +	if (ipmi_device)
> +		atomic_inc(&ipmi_device->refcnt);
> +	return ipmi_device;
> +}
> +
> +static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
> +{
> +	ipmi_remove_space_handler(ipmi_device);
> +	ipmi_destroy_user(ipmi_device->user_interface);
> +	put_device(ipmi_device->smi_data.dev);
> +	kfree(ipmi_device);
> +}
> +
> +static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
> +{
> +	if (ipmi_device && atomic_dec_and_test(&ipmi_device->refcnt))
> +		ipmi_dev_release(ipmi_device);
> +}
> +
> +static struct acpi_ipmi_device *acpi_ipmi_get_targeted_smi(int iface)
> +{
> +	int dev_found = 0;
> +	struct acpi_ipmi_device *ipmi_device;
> +

Why don't you do

	struct acpi_ipmi_device *ipmi_device, *ret = NULL;

and then ->

> +	mutex_lock(&driver_data.ipmi_lock);
> +	list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
> +		if (ipmi_device->ipmi_ifnum == iface) {

->			ret = ipmi_device; ->

> +			dev_found = 1;
> +			acpi_ipmi_dev_get(ipmi_device);
> +			break;
> +		}
> +	}
> +	mutex_unlock(&driver_data.ipmi_lock);
> +
> +	return dev_found ? ipmi_device : NULL;

->	return ret;
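
Put together, the suggestion above amounts to something like the following userspace sketch. `struct sketch_dev` and `sketch_get_targeted()` are hypothetical reductions of acpi_ipmi_device and acpi_ipmi_get_targeted_smi(); the list is singly linked for brevity and the ipmi_lock locking is left out.

```c
#include <stddef.h>

/* Reduced, hypothetical device node; singly linked for brevity. */
struct sketch_dev {
	int ipmi_ifnum;
	int refcnt;
	struct sketch_dev *next;
};

/*
 * Look up a device by interface number.  The result lives in 'ret', so
 * no separate found flag is needed and the loop cursor is never used
 * after the loop.  Locking is omitted; the patch holds ipmi_lock here.
 */
static struct sketch_dev *sketch_get_targeted(struct sketch_dev *head,
					      int iface)
{
	struct sketch_dev *dev, *ret = NULL;

	for (dev = head; dev; dev = dev->next) {
		if (dev->ipmi_ifnum == iface) {
			ret = dev;
			ret->refcnt++;	/* stands in for acpi_ipmi_dev_get() */
			break;
		}
	}
	return ret;
}
```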

> +}
> +
>  static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
>  {
>  	struct acpi_ipmi_msg *ipmi_msg;
> @@ -228,25 +304,24 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
>  static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
>  {
>  	struct acpi_ipmi_msg *tx_msg, *temp;
> -	int count = HZ / 10;
> -	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
>  	unsigned long flags;
>  
> -	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> -	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> -		/* wake up the sleep thread on the Tx msg */
> -		complete(&tx_msg->tx_complete);
> -	}
> -	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> -
> -	/* wait for about 100ms to flush the tx message list */
> -	while (count--) {
> -		if (list_empty(&ipmi->tx_msg_list))
> -			break;
> -		schedule_timeout(1);
> +	/*
> +	 * NOTE: Synchronous Flushing
> +	 * Wait until refcnt drops to 1, meaning there are no users other
> +	 * than this context.  This function should always be called before
> +	 * acpi_ipmi_device destruction.
> +	 */
> +	while (atomic_read(&ipmi->refcnt) > 1) {

Isn't this racy?  What if we see that the refcount is 1 and break the loop,
but someone else bumps up the refcount at the same time?

> +		spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> +		list_for_each_entry_safe(tx_msg, temp,
> +					 &ipmi->tx_msg_list, head) {
> +			/* wake up the sleep thread on the Tx msg */
> +			complete(&tx_msg->tx_complete);
> +		}
> +		spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> +		schedule_timeout_uninterruptible(msecs_to_jiffies(1));
>  	}
> -	if (!list_empty(&ipmi->tx_msg_list))
> -		dev_warn(&pnp_dev->dev, "tx msg list is not NULL\n");
>  }
>  
>  static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> @@ -304,22 +379,26 @@ static void ipmi_register_bmc(int iface, struct device *dev)
>  {
>  	struct acpi_ipmi_device *ipmi_device, *temp;
>  	struct pnp_dev *pnp_dev;
> -	ipmi_user_t		user;
>  	int err;
>  	struct ipmi_smi_info smi_data;
>  	acpi_handle handle;
>  
>  	err = ipmi_get_smi_info(iface, &smi_data);
> -
>  	if (err)
>  		return;
>  
> -	if (smi_data.addr_src != SI_ACPI) {
> -		put_device(smi_data.dev);
> -		return;
> -	}
> -
> +	if (smi_data.addr_src != SI_ACPI)
> +		goto err_ref;
>  	handle = smi_data.addr_info.acpi_info.acpi_handle;
> +	if (!handle)
> +		goto err_ref;
> +	pnp_dev = to_pnp_dev(smi_data.dev);
> +
> +	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> +	if (!ipmi_device) {
> +		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> +		goto err_ref;
> +	}
>  
>  	mutex_lock(&driver_data.ipmi_lock);
>  	list_for_each_entry(temp, &driver_data.ipmi_devices, head) {
> @@ -328,54 +407,42 @@ static void ipmi_register_bmc(int iface, struct device *dev)
>  		 * to the device list, don't add it again.
>  		 */
>  		if (temp->handle == handle)
> -			goto out;
> +			goto err_lock;
>  	}
>  
> -	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
> -
> -	if (!ipmi_device)
> -		goto out;
> -
> -	pnp_dev = to_pnp_dev(smi_data.dev);
> -	ipmi_device->handle = handle;
> -	ipmi_device->pnp_dev = pnp_dev;
> -
> -	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> -					ipmi_device, &user);
> -	if (err) {
> -		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> -		kfree(ipmi_device);
> -		goto out;
> -	}
> -	acpi_add_ipmi_device(ipmi_device);
> -	ipmi_device->user_interface = user;
> -	ipmi_device->ipmi_ifnum = iface;
> +	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
>  	mutex_unlock(&driver_data.ipmi_lock);
> -	memcpy(&ipmi_device->smi_data, &smi_data, sizeof(struct ipmi_smi_info));
> +	put_device(smi_data.dev);
>  	return;
>  
> -out:
> +err_lock:
>  	mutex_unlock(&driver_data.ipmi_lock);
> +	ipmi_dev_release(ipmi_device);
> +err_ref:
>  	put_device(smi_data.dev);
>  	return;
>  }
>  
>  static void ipmi_bmc_gone(int iface)
>  {
> -	struct acpi_ipmi_device *ipmi_device, *temp;
> +	int dev_found = 0;
> +	struct acpi_ipmi_device *ipmi_device;
>  
>  	mutex_lock(&driver_data.ipmi_lock);
> -	list_for_each_entry_safe(ipmi_device, temp,
> -				&driver_data.ipmi_devices, head) {
> -		if (ipmi_device->ipmi_ifnum != iface)
> -			continue;
> -
> -		acpi_remove_ipmi_device(ipmi_device);
> -		put_device(ipmi_device->smi_data.dev);
> -		kfree(ipmi_device);
> -		break;
> +	list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
> +		if (ipmi_device->ipmi_ifnum == iface) {
> +			dev_found = 1;

You can do the list_del() here, because you're under the mutex, so others
won't see the list in an inconsistent state and you're about to break anyway.

> +			break;
> +		}
>  	}
> +	if (dev_found)
> +		list_del(&ipmi_device->head);
>  	mutex_unlock(&driver_data.ipmi_lock);
> +
> +	if (dev_found) {
> +		ipmi_flush_tx_msg(ipmi_device);
> +		acpi_ipmi_dev_put(ipmi_device);
> +	}
>  }
>  
>  /* --------------------------------------------------------------------------
> @@ -400,7 +467,8 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
>  			void *handler_context, void *region_context)
>  {
>  	struct acpi_ipmi_msg *tx_msg;
> -	struct acpi_ipmi_device *ipmi_device = handler_context;
> +	int iface = (long)handler_context;
> +	struct acpi_ipmi_device *ipmi_device;
>  	int err, rem_time;
>  	acpi_status status;
>  	unsigned long flags;
> @@ -414,12 +482,15 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
>  	if ((function & ACPI_IO_MASK) == ACPI_READ)
>  		return AE_TYPE;
>  
> -	if (!ipmi_device->user_interface)
> +	ipmi_device = acpi_ipmi_get_targeted_smi(iface);
> +	if (!ipmi_device)
>  		return AE_NOT_EXIST;
>  
>  	tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
> -	if (!tx_msg)
> -		return AE_NO_MEMORY;
> +	if (!tx_msg) {
> +		status = AE_NO_MEMORY;
> +		goto out_ref;
> +	}
>  
>  	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
>  		status = AE_TYPE;
> @@ -449,6 +520,8 @@ out_list:
>  	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>  out_msg:
>  	kfree(tx_msg);
> +out_ref:
> +	acpi_ipmi_dev_put(ipmi_device);
>  	return status;
>  }
>  
> @@ -473,7 +546,7 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
>  	status = acpi_install_address_space_handler(ipmi->handle,
>  						    ACPI_ADR_SPACE_IPMI,
>  						    &acpi_ipmi_space_handler,
> -						    NULL, ipmi);
> +						    NULL, (void *)((long)ipmi->ipmi_ifnum));
>  	if (ACPI_FAILURE(status)) {
>  		struct pnp_dev *pnp_dev = ipmi->pnp_dev;
>  		dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
> @@ -484,36 +557,6 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
>  	return 0;
>  }
>  
> -static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)
> -{
> -
> -	INIT_LIST_HEAD(&ipmi_device->head);
> -
> -	spin_lock_init(&ipmi_device->tx_msg_lock);
> -	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
> -	ipmi_install_space_handler(ipmi_device);
> -
> -	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
> -}
> -
> -static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device)
> -{
> -	/*
> -	 * If the IPMI user interface is created, it should be
> -	 * destroyed.
> -	 */
> -	if (ipmi_device->user_interface) {
> -		ipmi_destroy_user(ipmi_device->user_interface);
> -		ipmi_device->user_interface = NULL;
> -	}
> -	/* flush the Tx_msg list */
> -	if (!list_empty(&ipmi_device->tx_msg_list))
> -		ipmi_flush_tx_msg(ipmi_device);
> -
> -	list_del(&ipmi_device->head);
> -	ipmi_remove_space_handler(ipmi_device);
> -}
> -
>  static int __init acpi_ipmi_init(void)
>  {
>  	int result = 0;
> @@ -530,7 +573,7 @@ static int __init acpi_ipmi_init(void)
>  
>  static void __exit acpi_ipmi_exit(void)
>  {
> -	struct acpi_ipmi_device *ipmi_device, *temp;
> +	struct acpi_ipmi_device *ipmi_device;
>  
>  	if (acpi_disabled)
>  		return;
> @@ -544,11 +587,17 @@ static void __exit acpi_ipmi_exit(void)
>  	 * handler and free it.
>  	 */
>  	mutex_lock(&driver_data.ipmi_lock);
> -	list_for_each_entry_safe(ipmi_device, temp,
> -				&driver_data.ipmi_devices, head) {
> -		acpi_remove_ipmi_device(ipmi_device);
> -		put_device(ipmi_device->smi_data.dev);
> -		kfree(ipmi_device);
> +	while (!list_empty(&driver_data.ipmi_devices)) {
> +		ipmi_device = list_first_entry(&driver_data.ipmi_devices,
> +					       struct acpi_ipmi_device,
> +					       head);
> +		list_del(&ipmi_device->head);
> +		mutex_unlock(&driver_data.ipmi_lock);
> +
> +		ipmi_flush_tx_msg(ipmi_device);
> +		acpi_ipmi_dev_put(ipmi_device);
> +
> +		mutex_lock(&driver_data.ipmi_lock);
>  	}
>  	mutex_unlock(&driver_data.ipmi_lock);
>  }
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH 07/13] ACPI/IPMI: Add reference counting for ACPI IPMI transfers
  2013-07-23  8:09     ` Lv Zheng
  (?)
@ 2013-07-25 22:23     ` Rafael J. Wysocki
  2013-07-26  1:21         ` Zheng, Lv
  -1 siblings, 1 reply; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-25 22:23 UTC (permalink / raw)
  To: Lv Zheng
  Cc: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui,
	linux-kernel, linux-acpi, openipmi-developer

On Tuesday, July 23, 2013 04:09:54 PM Lv Zheng wrote:
> This patch adds reference counting for ACPI IPMI transfers to tune the
> locking granularity of tx_msg_lock.
> 
> The acpi_ipmi_msg handling is re-designed using reference counting.
> 1. tx_msg is always unlinked before complete(), so that:
>    1.1. it is safe to put complete() outside of tx_msg_lock;
>    1.2. complete() can only happen once, thus smp_wmb() is not required.
> 2. Increasing the reference of tx_msg before calling
>    ipmi_request_settime() and introducing tx_msg_lock protected
>    ipmi_cancel_tx_msg() so that a complete() can happen in parallel with
>    tx_msg unlinking in the failure cases.
> 3. tx_msg holds the reference of acpi_ipmi_device so that it can be flushed
>    and freed in the contexts other than acpi_ipmi_space_handler().
> 
> The lockdep_chains shows all acpi_ipmi locks are leaf locks after the
> tuning:
> 1. ipmi_lock is always leaf:
>    irq_context: 0
>    [ffffffff81a943f8] smi_watchers_mutex
>    [ffffffffa06eca60] driver_data.ipmi_lock
>    irq_context: 0
>    [ffffffff82767b40] &buffer->mutex
>    [ffffffffa00a6678] s_active#103
>    [ffffffffa06eca60] driver_data.ipmi_lock
> 2. without this patch applied, lock used by complete() is held after
>    holding tx_msg_lock:
>    irq_context: 0
>    [ffffffff82767b40] &buffer->mutex
>    [ffffffffa00a6678] s_active#103
>    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
>    irq_context: 1
>    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
>    irq_context: 1
>    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
>    [ffffffffa06eccf0] &x->wait#25
>    irq_context: 1
>    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
>    [ffffffffa06eccf0] &x->wait#25
>    [ffffffff81e36620] &p->pi_lock
>    irq_context: 1
>    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
>    [ffffffffa06eccf0] &x->wait#25
>    [ffffffff81e36620] &p->pi_lock
>    [ffffffff81e5d0a8] &rq->lock
> 3. with this patch applied, tx_msg_lock is always leaf:
>    irq_context: 0
>    [ffffffff82767b40] &buffer->mutex
>    [ffffffffa00a66d8] s_active#107
>    [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
>    irq_context: 1
>    [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
> 
> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> Cc: Zhao Yakui <yakui.zhao@intel.com>
> Reviewed-by: Huang Ying <ying.huang@intel.com>
> ---
>  drivers/acpi/acpi_ipmi.c |  107 +++++++++++++++++++++++++++++++++-------------
>  1 file changed, 77 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> index 2a09156..0ee1ea6 100644
> --- a/drivers/acpi/acpi_ipmi.c
> +++ b/drivers/acpi/acpi_ipmi.c
> @@ -105,6 +105,7 @@ struct acpi_ipmi_msg {
>  	u8	data[ACPI_IPMI_MAX_MSG_LENGTH];
>  	u8	rx_len;
>  	struct acpi_ipmi_device *device;
> +	atomic_t	refcnt;

Again: kref, please?

>  };
>  
>  /* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
> @@ -195,22 +196,47 @@ static struct acpi_ipmi_device *acpi_ipmi_get_selected_smi(void)
>  	return ipmi_device;
>  }
>  
> -static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
> +static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
>  {
> +	struct acpi_ipmi_device *ipmi;
>  	struct acpi_ipmi_msg *ipmi_msg;
> -	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
>  
> +	ipmi = acpi_ipmi_get_selected_smi();
> +	if (!ipmi)
> +		return NULL;
>  	ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
> -	if (!ipmi_msg)	{
> -		dev_warn(&pnp_dev->dev, "Can't allocate memory for ipmi_msg\n");
> +	if (!ipmi_msg) {
> +		acpi_ipmi_dev_put(ipmi);
>  		return NULL;
>  	}
> +	atomic_set(&ipmi_msg->refcnt, 1);
>  	init_completion(&ipmi_msg->tx_complete);
>  	INIT_LIST_HEAD(&ipmi_msg->head);
>  	ipmi_msg->device = ipmi;
> +
>  	return ipmi_msg;
>  }
>  
> +static struct acpi_ipmi_msg *
> +acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
> +{
> +	if (tx_msg)
> +		atomic_inc(&tx_msg->refcnt);
> +	return tx_msg;
> +}
> +
> +static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
> +{
> +	acpi_ipmi_dev_put(tx_msg->device);
> +	kfree(tx_msg);
> +}
> +
> +static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
> +{
> +	if (tx_msg && atomic_dec_and_test(&tx_msg->refcnt))
> +		ipmi_msg_release(tx_msg);
> +}
> +
>  #define		IPMI_OP_RGN_NETFN(offset)	((offset >> 8) & 0xff)
>  #define		IPMI_OP_RGN_CMD(offset)		(offset & 0xff)
>  static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
> @@ -300,7 +326,7 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
>  
>  static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
>  {
> -	struct acpi_ipmi_msg *tx_msg, *temp;
> +	struct acpi_ipmi_msg *tx_msg;
>  	unsigned long flags;
>  
>  	/*
> @@ -311,16 +337,46 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
>  	 */
>  	while (atomic_read(&ipmi->refcnt) > 1) {
>  		spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> -		list_for_each_entry_safe(tx_msg, temp,
> -					 &ipmi->tx_msg_list, head) {
> +		while (!list_empty(&ipmi->tx_msg_list)) {
> +			tx_msg = list_first_entry(&ipmi->tx_msg_list,
> +						  struct acpi_ipmi_msg,
> +						  head);
> +			list_del(&tx_msg->head);
> +			spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> +
>  			/* wake up the sleep thread on the Tx msg */
>  			complete(&tx_msg->tx_complete);
> +			acpi_ipmi_msg_put(tx_msg);
> +			spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
>  		}
>  		spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> +
>  		schedule_timeout_uninterruptible(msecs_to_jiffies(1));
>  	}
>  }
>  
> +static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
> +			       struct acpi_ipmi_msg *msg)
> +{
> +	struct acpi_ipmi_msg *tx_msg;
> +	int msg_found = 0;

Use bool?

> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> +	list_for_each_entry(tx_msg, &ipmi->tx_msg_list, head) {
> +		if (msg == tx_msg) {
> +			msg_found = 1;
> +			break;
> +		}
> +	}
> +	if (msg_found)
> +		list_del(&tx_msg->head);

The list_del() can be done when you set msg_found.
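
A minimal userspace sketch of both remarks (bool flag, unlink at the point of the match). The list helpers imitate the kernel's list_head but are hypothetical `sketch_` reimplementations, and the tx_msg_lock locking around the loop is left out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct sketch_list {
	struct sketch_list *next, *prev;
};

static void sketch_list_init(struct sketch_list *h)
{
	h->next = h->prev = h;
}

static void sketch_list_add_tail(struct sketch_list *n, struct sketch_list *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void sketch_list_del(struct sketch_list *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
}

/*
 * Unlink 'msg' if it is still queued.  The entry is removed at the point
 * where it is found; since the loop breaks right away, the cursor is not
 * advanced past a deleted node.  The patch would hold tx_msg_lock around
 * this loop; locking is left out of the sketch.
 */
static bool sketch_cancel(struct sketch_list *queue, struct sketch_list *msg)
{
	struct sketch_list *pos;
	bool msg_found = false;

	for (pos = queue->next; pos != queue; pos = pos->next) {
		if (pos == msg) {
			msg_found = true;
			sketch_list_del(pos);
			break;
		}
	}
	return msg_found;
}
```

The caller would then drop its reference only when the function returns true, as the patch does with acpi_ipmi_msg_put().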

> +	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> +
> +	if (msg_found)
> +		acpi_ipmi_msg_put(tx_msg);
> +}
> +
>  static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>  {
>  	struct acpi_ipmi_device *ipmi_device = user_msg_data;
> @@ -343,12 +399,15 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>  			break;
>  		}
>  	}
> +	if (msg_found)
> +		list_del(&tx_msg->head);
> +	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>  
>  	if (!msg_found) {
>  		dev_warn(&pnp_dev->dev,
>  			 "Unexpected response (msg id %ld) is returned.\n",
>  			 msg->msgid);
> -		goto out_lock;
> +		goto out_msg;
>  	}
>  
>  	/* copy the response data to Rx_data buffer */
> @@ -360,14 +419,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>  	}
>  	tx_msg->rx_len = msg->msg.data_len;
>  	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> -	/* tx_msg content must be valid before setting msg_done flag */
> -	smp_wmb();
>  	tx_msg->msg_done = 1;
>  
>  out_comp:
>  	complete(&tx_msg->tx_complete);
> -out_lock:
> -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> +	acpi_ipmi_msg_put(tx_msg);
>  out_msg:
>  	ipmi_free_recv_msg(msg);
>  }
> @@ -493,21 +549,17 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
>  	if ((function & ACPI_IO_MASK) == ACPI_READ)
>  		return AE_TYPE;
>  
> -	ipmi_device = acpi_ipmi_get_selected_smi();
> -	if (!ipmi_device)
> +	tx_msg = ipmi_msg_alloc();
> +	if (!tx_msg)
>  		return AE_NOT_EXIST;
> -
> -	tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
> -	if (!tx_msg) {
> -		status = AE_NO_MEMORY;
> -		goto out_ref;
> -	}
> +	ipmi_device = tx_msg->device;
>  
>  	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
> -		status = AE_TYPE;
> -		goto out_msg;
> +		ipmi_msg_release(tx_msg);
> +		return AE_TYPE;
>  	}
>  
> +	acpi_ipmi_msg_get(tx_msg);
>  	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
>  	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
>  	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> @@ -518,21 +570,16 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
>  				   NULL, 0, 0, 0);
>  	if (err) {
>  		status = AE_ERROR;
> -		goto out_list;
> +		goto out_msg;
>  	}
>  	rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
>  					       IPMI_TIMEOUT);
>  	acpi_format_ipmi_response(tx_msg, value, rem_time);
>  	status = AE_OK;
>  
> -out_list:
> -	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
> -	list_del(&tx_msg->head);
> -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>  out_msg:
> -	kfree(tx_msg);
> -out_ref:
> -	acpi_ipmi_dev_put(ipmi_device);
> +	ipmi_cancel_tx_msg(ipmi_device, tx_msg);
> +	acpi_ipmi_msg_put(tx_msg);
>  	return status;
>  }
>  
> 
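
The hunks above replace the direct kfree() of tx_msg with acpi_ipmi_msg_get()/acpi_ipmi_msg_put() calls. The bodies of those helpers are not shown in this part of the thread, so the following is only a minimal userspace sketch of the general reference-counting idea, assuming C11 atomics; rc_msg, rc_msg_alloc(), rc_msg_get(), rc_msg_put() and rc_live_msgs are hypothetical names, not the driver's.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a message object shared by two code paths. */
struct rc_msg {
	atomic_int refcount;
	int payload;
};

/* Counts objects that are allocated and not yet freed (demo bookkeeping). */
static atomic_int rc_live_msgs;

/* Allocate a message; the caller owns the initial reference. */
static struct rc_msg *rc_msg_alloc(void)
{
	struct rc_msg *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	atomic_init(&m->refcount, 1);
	atomic_fetch_add(&rc_live_msgs, 1);
	return m;
}

/* Take an extra reference, e.g. before linking the message into a list. */
static struct rc_msg *rc_msg_get(struct rc_msg *m)
{
	atomic_fetch_add_explicit(&m->refcount, 1, memory_order_relaxed);
	return m;
}

/* Drop a reference; whoever drops the last one frees the object. */
static void rc_msg_put(struct rc_msg *m)
{
	if (atomic_fetch_sub_explicit(&m->refcount, 1,
				      memory_order_acq_rel) == 1) {
		atomic_fetch_sub(&rc_live_msgs, 1);
		free(m);
	}
}
```

With this shape, the path that queues the message and the path that completes it each hold their own reference, so neither can free the object while the other may still touch it.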
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members
  2013-07-23  8:10     ` Lv Zheng
  (?)
@ 2013-07-25 22:25     ` Rafael J. Wysocki
  2013-07-26  1:25         ` Zheng, Lv
  -1 siblings, 1 reply; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-25 22:25 UTC (permalink / raw)
  To: Lv Zheng
  Cc: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui,
	linux-kernel, linux-acpi, openipmi-developer

On Tuesday, July 23, 2013 04:10:06 PM Lv Zheng wrote:
> This is a trivial patch:
> 1. Deletes a member of the acpi_ipmi_device - smi_data which is not
>    actually used.
> 2. Updates a member of the acpi_ipmi_device - pnp_dev which is only used
>    by dev_warn() invocations, so changes it to struct device.
> 
> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> Reviewed-by: Huang Ying <ying.huang@intel.com>
> ---
>  drivers/acpi/acpi_ipmi.c |   30 ++++++++++++++----------------
>  1 file changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> index 0ee1ea6..7f93ffd 100644
> --- a/drivers/acpi/acpi_ipmi.c
> +++ b/drivers/acpi/acpi_ipmi.c
> @@ -63,11 +63,10 @@ struct acpi_ipmi_device {
>  	struct list_head tx_msg_list;
>  	spinlock_t	tx_msg_lock;
>  	acpi_handle handle;
> -	struct pnp_dev *pnp_dev;
> +	struct device *dev;
>  	ipmi_user_t	user_interface;
>  	int ipmi_ifnum; /* IPMI interface number */
>  	long curr_msgid;
> -	struct ipmi_smi_info smi_data;
>  	atomic_t refcnt;
>  };
>  
> @@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = {
>  };
>  
>  static struct acpi_ipmi_device *
> -ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
> +ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)

Why is the second arg called pdev?

>  {
>  	struct acpi_ipmi_device *ipmi_device;
>  	int err;
> @@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
>  	spin_lock_init(&ipmi_device->tx_msg_lock);
>  
>  	ipmi_device->handle = handle;
> -	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> -	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
> +	ipmi_device->dev = get_device(pdev);
>  	ipmi_device->ipmi_ifnum = iface;
>  
>  	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
>  			       ipmi_device, &user);
>  	if (err) {
> -		put_device(smi_data->dev);
> +		put_device(pdev);
>  		kfree(ipmi_device);
>  		return NULL;
>  	}
> @@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
>  static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
>  {
>  	ipmi_destroy_user(ipmi_device->user_interface);
> -	put_device(ipmi_device->smi_data.dev);
> +	put_device(ipmi_device->dev);
>  	kfree(ipmi_device);
>  }
>  
> @@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
>  	buffer = (struct acpi_ipmi_buffer *)value;
>  	/* copy the tx message data */
>  	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
> -		dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
> +		dev_WARN_ONCE(tx_msg->device->dev, true,
>  			      "Unexpected request (msg len %d).\n",
>  			      buffer->length);
>  		return -EINVAL;
> @@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>  	struct acpi_ipmi_device *ipmi_device = user_msg_data;
>  	int msg_found = 0;
>  	struct acpi_ipmi_msg *tx_msg;
> -	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
> +	struct device *dev = ipmi_device->dev;
>  	unsigned long flags;
>  
>  	if (msg->user != ipmi_device->user_interface) {
> -		dev_warn(&pnp_dev->dev,
> +		dev_warn(dev,
>  			 "Unexpected response is returned. returned user %p, expected user %p\n",
>  			 msg->user, ipmi_device->user_interface);
>  		goto out_msg;
> @@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>  	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
>  
>  	if (!msg_found) {
> -		dev_warn(&pnp_dev->dev,
> +		dev_warn(dev,
>  			 "Unexpected response (msg id %ld) is returned.\n",
>  			 msg->msgid);
>  		goto out_msg;
> @@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
>  
>  	/* copy the response data to Rx_data buffer */
>  	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
> -		dev_WARN_ONCE(&pnp_dev->dev, true,
> +		dev_WARN_ONCE(dev, true,
>  			      "Unexpected response (msg len %d).\n",
>  			      msg->msg.data_len);
>  		goto out_comp;
> @@ -431,7 +429,7 @@ out_msg:
>  static void ipmi_register_bmc(int iface, struct device *dev)
>  {
>  	struct acpi_ipmi_device *ipmi_device, *temp;
> -	struct pnp_dev *pnp_dev;
> +	struct device *pdev;

And here?

>  	int err;
>  	struct ipmi_smi_info smi_data;
>  	acpi_handle handle;
> @@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct device *dev)
>  	handle = smi_data.addr_info.acpi_info.acpi_handle;
>  	if (!handle)
>  		goto err_ref;
> -	pnp_dev = to_pnp_dev(smi_data.dev);
> +	pdev = smi_data.dev;
>  
> -	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> +	ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
>  	if (!ipmi_device) {
> -		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> +		dev_warn(pdev, "Can't create IPMI user interface\n");
>  		goto err_ref;
>  	}
>  
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-25 12:06           ` Rafael J. Wysocki
@ 2013-07-26  0:09             ` Zheng, Lv
  -1 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  0:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, linux-acpi, openipmi-developer



> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> Sent: Thursday, July 25, 2013 8:07 PM
> 
> On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
> > -stable according to the previous conversation.
> >
> > > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > > Sent: Thursday, July 25, 2013 7:38 AM
> > >
> > > On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> > > > This patch fixes races caused by unprotected ACPI IPMI transfers.
> > > >
> > > > We can see the following crashes may occur:
> > > > 1. There is no tx_msg_lock held for iterating tx_msg_list in
> > > >    ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
> > > >    acpi_ipmi_space_handler() under protection of tx_msg_lock.
> > > > 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> > > >    while it is accessed in parallel in ipmi_flush_tx_msg() and
> > > >    ipmi_msg_handler().
> > > >
> > > > This patch enhances tx_msg_lock to protect all tx_msg accesses to
> > > > solve this issue.  Then tx_msg_lock is always held around
> > > > complete() and tx_msg accesses.
> > > > smp_wmb() is called before setting the msg_done flag so that messages
> > > > completed due to flushing will not be handled as 'done' messages
> > > > while their contents are not valid.
> > > >
> > > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > > Cc: Zhao Yakui <yakui.zhao@intel.com>
> > > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > > ---
> > > >  drivers/acpi/acpi_ipmi.c |   10 ++++++++--
> > > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > > index b37c189..527ee43 100644
> > > > --- a/drivers/acpi/acpi_ipmi.c
> > > > +++ b/drivers/acpi/acpi_ipmi.c
> > > > @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> > > >  	struct acpi_ipmi_msg *tx_msg, *temp;
> > > >  	int count = HZ / 10;
> > > >  	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > > > +	unsigned long flags;
> > > >
> > > > +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > > >  	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> > > >  		/* wake up the sleep thread on the Tx msg */
> > > >  		complete(&tx_msg->tx_complete);
> > > >  	}
> > > > +	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > > >
> > > >  	/* wait for about 100ms to flush the tx message list */
> > > >  	while (count--) {
> > > > @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> > > >  			break;
> > > >  		}
> > > >  	}
> > > > -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > > >
> > > >  	if (!msg_found) {
> > > >  		dev_warn(&pnp_dev->dev,
> > > >  			 "Unexpected response (msg id %ld) is returned.\n",
> > > >  			 msg->msgid);
> > > > -		goto out_msg;
> > > > +		goto out_lock;
> > > >  	}
> > > >
> > > >  	/* copy the response data to Rx_data buffer */
> > > > @@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> > > >  	}
> > > >  	tx_msg->rx_len = msg->msg.data_len;
> > > >  	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> > > > +	/* tx_msg content must be valid before setting msg_done flag */
> > > > +	smp_wmb();
> > >
> > > That's suspicious.
> > >
> > > If you need the write barrier here, you'll most likely need a read
> > > barrier somewhere else.  Where's that?
> >
> > It might depend on whether the content written before the smp_wmb() is
> > used or not by the code on the other side under the condition set after
> > the smp_wmb().
> >
> > So the comment can be treated as two parts:
> > 1. Do we need a paired smp_rmb()?
> > 2. Do we need an smp_wmb()?
> >
> > For 1.
> > If we want a paired smp_rmb(), then it will appear in this function:
> >
> > static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> >                 acpi_integer *value, int rem_time)
> > {
> >         struct acpi_ipmi_buffer *buffer;
> >
> >         /*
> >          * value is also used as output parameter. It represents the response
> >          * IPMI message returned by IPMI command.
> >          */
> >         buffer = (struct acpi_ipmi_buffer *)value;
> >         if (!rem_time && !msg->msg_done) {
> >                 buffer->status = ACPI_IPMI_TIMEOUT;
> >                 return;
> >         }
> >         /*
> >          * If the flag of msg_done is not set or the recv length is zero, it
> >          * means that the IPMI command is not executed correctly.
> >          * The status code will be ACPI_IPMI_UNKNOWN.
> >          */
> >         if (!msg->msg_done || !msg->rx_len) {
> >                 buffer->status = ACPI_IPMI_UNKNOWN;
> >                 return;
> >         }
> > +       smp_rmb();
> >         /*
> >          * If the IPMI response message is obtained correctly, the status code
> >          * will be ACPI_IPMI_OK
> >          */
> >         buffer->status = ACPI_IPMI_OK;
> >         buffer->length = msg->rx_len;
> >         memcpy(buffer->data, msg->rx_data, msg->rx_len);
> > }
> >
> > If we don't, then the only consequence is that the msg content may not be
> > read correctly from msg->rx_data.
> > Note that rx_len is 0 during initialization and will never exceed
> > sizeof(buffer->data), so the read is safe.
> >
> > Being without smp_rmb() is also OK in this case, since:
> > 1. buffer->data will never be used when buffer->status is not
> >    ACPI_IPMI_OK, and
> > 2. the smp_rmb()/smp_wmb() added in this patch will be deleted in
> >    [PATCH 07].
> >
> > So IMO, we needn't add the smp_rmb().  What do you think of this?
> >
> > For 2.
> > If we don't add smp_wmb() in ipmi_msg_handler(), then the code running on
> > another thread in acpi_format_ipmi_response() may read wrong msg->rx_data
> > (a timeout triggers this function, but when acpi_format_ipmi_response() is
> > entered, the msg->msg_done flag could be seen as 1 while msg->rx_data is
> > not ready).  This is what we want to avoid in this quick fix.
> 
> Using smp_wmb() without the complementary smp_rmb() doesn't make sense,
> because each of them prevents only one flow of control from being
> speculatively reordered, either by the CPU or by the compiler.  If only one of
> them is used without the other, then the flow of control without the barrier
> may be reordered in a way that will effectively cancel the effect of the barrier in
> the second flow of control.
> 
> So, either we need *both* smp_wmb() and smp_rmb(), or we don't need them
> at all.

I think you are right; it is about the order of L1 cache flushing.

smp_wmb()/smp_rmb() is not a useful approach for implementations that are not being tuned.

They are here because the code mixed two programming models, which is a bug.
Such bugs can be avoided by:
1. either using bigger granularity locks (in this case, tx_msg_lock should be held around acpi_format_ipmi_response()),
2. or using smaller granularity locks, where races are automatically avoided by the mutually exclusive running flows (as PATCH 07 shows).

I'll update this patch or even drop it.
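
To make the pairing argument concrete, here is a minimal userspace sketch in C11, not the acpi_ipmi code. The names demo_msg, demo_publish() and demo_consume() are hypothetical. The release store plays the role of the writer-side smp_wmb() placed before setting msg_done, and the acquire load plays the role of the reader-side smp_rmb() placed after checking it; the point is that the ordering only holds when both sides take part.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_LEN 64

/* Hypothetical message: payload plus a "done" flag that publishes it. */
struct demo_msg {
	unsigned char data[DEMO_MAX_LEN];
	size_t rx_len;
	atomic_int msg_done;
};

static void demo_msg_init(struct demo_msg *m)
{
	memset(m->data, 0, sizeof(m->data));
	m->rx_len = 0;
	atomic_init(&m->msg_done, 0);
}

/* Writer side: fill in the payload first, then publish the flag. */
static void demo_publish(struct demo_msg *m, const void *src, size_t len)
{
	if (len > DEMO_MAX_LEN)
		len = DEMO_MAX_LEN;
	memcpy(m->data, src, len);
	m->rx_len = len;
	/* Release: the payload writes above are ordered before this store. */
	atomic_store_explicit(&m->msg_done, 1, memory_order_release);
}

/* Reader side: check the flag first, then read the payload. */
static bool demo_consume(struct demo_msg *m, void *dst, size_t *len)
{
	/* Acquire: the payload reads below are ordered after this load. */
	if (!atomic_load_explicit(&m->msg_done, memory_order_acquire))
		return false;
	*len = m->rx_len;
	memcpy(dst, m->data, m->rx_len);
	return true;
}
```

In real use demo_publish() and demo_consume() would run on different threads. Dropping either the release or the acquire leaves the other side free to be reordered, which is the objection raised in the review.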

Thanks and best regards
-Lv

> 
> Thanks,
> Rafael
> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-25 18:12             ` Corey Minyard
@ 2013-07-26  0:16               ` Zheng, Lv
  -1 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  0:16 UTC (permalink / raw)
  To: minyard, Rafael J. Wysocki
  Cc: Brown, Len, Wysocki, Rafael J, linux-kernel, Zhao, Yakui,
	linux-acpi, openipmi-developer

> From: Corey Minyard [mailto:tcminyard@gmail.com]
> Sent: Friday, July 26, 2013 2:13 AM
> 
> On 07/25/2013 07:06 AM, Rafael J. Wysocki wrote:
> > On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
> >> -stable according to the previous conversation.
> >>
> >>> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> >>> Sent: Thursday, July 25, 2013 7:38 AM
> >>>
> >>> On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> >>>> This patch fixes races caused by unprotected ACPI IPMI transfers.
> >>>>
> >>>> We can see the following crashes may occur:
> >>>> 1. There is no tx_msg_lock held for iterating tx_msg_list in
> >>>>     ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
> >>>>     acpi_ipmi_space_handler() under protection of tx_msg_lock.
> >>>> 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> >>>>     while it is accessed in parallel in ipmi_flush_tx_msg() and
> >>>>     ipmi_msg_handler().
> >>>>
> >>>> This patch enhances tx_msg_lock to protect all tx_msg accesses to
> >>>> solve this issue.  Then tx_msg_lock is always held around
> >>>> complete() and tx_msg accesses.
> >>>> smp_wmb() is called before setting the msg_done flag so that messages
> >>>> completed due to flushing will not be handled as 'done' messages
> >>>> while their contents are not valid.
> >>>>
> >>>> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> >>>> Cc: Zhao Yakui <yakui.zhao@intel.com>
> >>>> Reviewed-by: Huang Ying <ying.huang@intel.com>
> >>>> ---
> >>>>   drivers/acpi/acpi_ipmi.c |   10 ++++++++--
> >>>>   1 file changed, 8 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> >>>> index b37c189..527ee43 100644
> >>>> --- a/drivers/acpi/acpi_ipmi.c
> >>>> +++ b/drivers/acpi/acpi_ipmi.c
> >>>> @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> >>>>   	struct acpi_ipmi_msg *tx_msg, *temp;
> >>>>   	int count = HZ / 10;
> >>>>   	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> >>>> +	unsigned long flags;
> >>>>
> >>>> +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> >>>>   	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> >>>>   		/* wake up the sleep thread on the Tx msg */
> >>>>   		complete(&tx_msg->tx_complete);
> >>>>   	}
> >>>> +	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> >>>>
> >>>>   	/* wait for about 100ms to flush the tx message list */
> >>>>   	while (count--) {
> >>>> @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >>>>   			break;
> >>>>   		}
> >>>>   	}
> >>>> -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >>>>
> >>>>   	if (!msg_found) {
> >>>>   		dev_warn(&pnp_dev->dev,
> >>>>   			 "Unexpected response (msg id %ld) is returned.\n",
> >>>>   			 msg->msgid);
> >>>> -		goto out_msg;
> >>>> +		goto out_lock;
> >>>>   	}
> >>>>
> >>>>   	/* copy the response data to Rx_data buffer */
> >>>> @@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >>>>   	}
> >>>>   	tx_msg->rx_len = msg->msg.data_len;
> >>>>   	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> >>>> +	/* tx_msg content must be valid before setting msg_done flag */
> >>>> +	smp_wmb();
> >>> That's suspicious.
> >>>
> >>> If you need the write barrier here, you'll most likely need a read
> >>> barrier somewhere else.  Where's that?
> >> It might depend on whether the content written before the smp_wmb() is
> >> used or not by the code on the other side under the condition set after
> >> the smp_wmb().
> >>
> >> So the comment can be treated as two parts:
> >> 1. Do we need a paired smp_rmb()?
> >> 2. Do we need an smp_wmb()?
> >>
> >> For 1.
> >> If we want a paired smp_rmb(), then it will appear in this function:
> >>
> >> static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> >>                 acpi_integer *value, int rem_time)
> >> {
> >>         struct acpi_ipmi_buffer *buffer;
> >>
> >>         /*
> >>          * value is also used as output parameter. It represents the response
> >>          * IPMI message returned by IPMI command.
> >>          */
> >>         buffer = (struct acpi_ipmi_buffer *)value;
> >>         if (!rem_time && !msg->msg_done) {
> >>                 buffer->status = ACPI_IPMI_TIMEOUT;
> >>                 return;
> >>         }
> >>         /*
> >>          * If the flag of msg_done is not set or the recv length is zero, it
> >>          * means that the IPMI command is not executed correctly.
> >>          * The status code will be ACPI_IPMI_UNKNOWN.
> >>          */
> >>         if (!msg->msg_done || !msg->rx_len) {
> >>                 buffer->status = ACPI_IPMI_UNKNOWN;
> >>                 return;
> >>         }
> >> +       smp_rmb();
> >>         /*
> >>          * If the IPMI response message is obtained correctly, the status code
> >>          * will be ACPI_IPMI_OK
> >>          */
> >>         buffer->status = ACPI_IPMI_OK;
> >>         buffer->length = msg->rx_len;
> >>         memcpy(buffer->data, msg->rx_data, msg->rx_len);
> >> }
> >>
> >> If we don't, then the only consequence is that the msg content may not be
> >> read correctly from msg->rx_data.
> >> Note that rx_len is 0 during initialization and will never exceed
> >> sizeof(buffer->data), so the read is safe.
> >>
> >> Being without smp_rmb() is also OK in this case, since:
> >> 1. buffer->data will never be used when buffer->status is not
> >>    ACPI_IPMI_OK, and
> >> 2. the smp_rmb()/smp_wmb() added in this patch will be deleted in
> >>    [PATCH 07].
> >>
> >> So IMO, we needn't add the smp_rmb().  What do you think of this?
> >>
> >> For 2.
> >> If we don't add smp_wmb() in ipmi_msg_handler(), then the code running on
> >> another thread in acpi_format_ipmi_response() may read wrong msg->rx_data
> >> (a timeout triggers this function, but when acpi_format_ipmi_response() is
> >> entered, the msg->msg_done flag could be seen as 1 while msg->rx_data is
> >> not ready).  This is what we want to avoid in this quick fix.
> > Using smp_wmb() without the complementary smp_rmb() doesn't make
> > sense, because each of them prevents only one flow of control from
> > being speculatively reordered, either by the CPU or by the compiler.
> > If only one of them is used without the other, then the flow of
> > control without the barrier may be reordered in a way that will
> > effectively cancel the effect of the barrier in the second flow of control.
> >
> > So, either we need *both* smp_wmb() and smp_rmb(), or we don't need
> > them at all.
> 
> If I understand this correctly, the problem would be if:
> 
> rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
>                                          IPMI_TIMEOUT);
> 
> returns on a timeout, then checks msg_done and races with something setting
> msg_done.  If that is the case, you would need the smp_rmb() before checking
> msg_done.
> 
> However, the timeout above is unnecessary.  You are using
> ipmi_request_settime(), so you can set the timeout when the IPMI command
> fails and returns a failure message.  The driver guarantees a return message
> for each request.  Just remove the timeout from the completion, set the
> timeout and retries in the ipmi request, and the completion should handle the
> barrier issues.

It is difficult for me to determine the retry count and timeout value; maybe retry=0, timeout=IPMI_TIMEOUT is OK.
The timeout-based completion code is already there, and I think the quick fix should not introduce this new logic.
I'll add a new patch to apply your comment.

> 
> Plus, from a quick glance at the code, it doesn't look like it will properly handle a
> situation where the timeout occurs and is handled then the response comes in
> later.

PATCH 07 fixes this issue.
Here we just need either the smp_rmb() or to hold tx_msg_lock around acpi_format_ipmi_response().
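
The second option, holding the lock around the response formatting as well as around the response handler, can be sketched in userspace as follows. This is an illustration under stated assumptions, not the driver code: a pthread mutex stands in for the tx_msg_lock spinlock, and lk_msg, lk_handle_response() and lk_format_response() are hypothetical names loosely modelled on the functions quoted in this thread.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define LK_MAX_LEN 64

enum lk_status { LK_OK, LK_TIMEOUT, LK_UNKNOWN };

/* Hypothetical message whose fields are all protected by one lock. */
struct lk_msg {
	pthread_mutex_t lock;
	unsigned char data[LK_MAX_LEN];
	size_t rx_len;
	int msg_done;
};

static void lk_msg_init(struct lk_msg *m)
{
	pthread_mutex_init(&m->lock, NULL);
	memset(m->data, 0, sizeof(m->data));
	m->rx_len = 0;
	m->msg_done = 0;
}

/* Response path: payload and flag are updated under the lock. */
static void lk_handle_response(struct lk_msg *m, const void *src, size_t len)
{
	pthread_mutex_lock(&m->lock);
	if (len <= LK_MAX_LEN) {
		memcpy(m->data, src, len);
		m->rx_len = len;
		m->msg_done = 1;
	}
	pthread_mutex_unlock(&m->lock);
}

/* Waiter path: flag and payload are read under the same lock. */
static enum lk_status lk_format_response(struct lk_msg *m, int timed_out,
					 void *dst, size_t *len)
{
	enum lk_status status;

	pthread_mutex_lock(&m->lock);
	if (timed_out && !m->msg_done) {
		status = LK_TIMEOUT;
	} else if (!m->msg_done || !m->rx_len) {
		status = LK_UNKNOWN;
	} else {
		*len = m->rx_len;
		memcpy(dst, m->data, m->rx_len);
		status = LK_OK;
	}
	pthread_mutex_unlock(&m->lock);
	return status;
}
```

Because both paths take the same lock, the waiter sees either no response at all or a complete one, and no explicit barrier is needed. The cost is a wider critical section.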

Thanks for commenting.

Best regards
-Lv

> 
> -corey
> 
> >
> > Thanks,
> > Rafael
> >
> >

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-25 19:32               ` Rafael J. Wysocki
@ 2013-07-26  0:18                 ` Zheng, Lv
  -1 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  0:18 UTC (permalink / raw)
  To: Rafael J. Wysocki, minyard
  Cc: Wysocki, Rafael J, Brown, Len, Zhao, Yakui, linux-kernel,
	linux-acpi, openipmi-developer

> From: linux-acpi-owner@vger.kernel.org
> [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> Sent: Friday, July 26, 2013 3:33 AM
> 
> On Thursday, July 25, 2013 01:12:38 PM Corey Minyard wrote:
> > On 07/25/2013 07:06 AM, Rafael J. Wysocki wrote:
> > > On Thursday, July 25, 2013 03:09:35 AM Zheng, Lv wrote:
> > >> -stable according to the previous conversation.
> > >>
> > >>> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > >>> Sent: Thursday, July 25, 2013 7:38 AM
> > >>>
> > >>> On Tuesday, July 23, 2013 04:09:15 PM Lv Zheng wrote:
> > >>>> This patch fixes races caused by unprotected ACPI IPMI transfers.
> > >>>>
> > >>>> We can see the following crashes may occur:
> > > >>>> 1. There is no tx_msg_lock held for iterating tx_msg_list in
> > > >>>>     ipmi_flush_tx_msg() while it is unlinked in parallel on failure in
> > > >>>>     acpi_ipmi_space_handler() under protection of tx_msg_lock.
> > > >>>> 2. There is no lock held for freeing tx_msg in acpi_ipmi_space_handler()
> > > >>>>     while it is accessed in parallel in ipmi_flush_tx_msg() and
> > > >>>>     ipmi_msg_handler().
> > > >>>>
> > > >>>> This patch enhances tx_msg_lock to protect all tx_msg accesses to
> > > >>>> solve this issue.  Then tx_msg_lock is always held around
> > > >>>> complete() and tx_msg accesses.
> > > >>>> smp_wmb() is called before setting the msg_done flag so that messages
> > > >>>> completed due to flushing will not be handled as 'done' messages
> > > >>>> while their contents are not valid.
> > >>>>
> > >>>> Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > >>>> Cc: Zhao Yakui <yakui.zhao@intel.com>
> > >>>> Reviewed-by: Huang Ying <ying.huang@intel.com>
> > >>>> ---
> > >>>>   drivers/acpi/acpi_ipmi.c |   10 ++++++++--
> > >>>>   1 file changed, 8 insertions(+), 2 deletions(-)
> > >>>>
> > >>>> diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > >>>> index b37c189..527ee43 100644
> > >>>> --- a/drivers/acpi/acpi_ipmi.c
> > >>>> +++ b/drivers/acpi/acpi_ipmi.c
> > >>>> @@ -230,11 +230,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> > >>>>   	struct acpi_ipmi_msg *tx_msg, *temp;
> > >>>>   	int count = HZ / 10;
> > >>>>   	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > >>>> +	unsigned long flags;
> > >>>>
> > >>>> +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > >>>>   	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> > >>>>   		/* wake up the sleep thread on the Tx msg */
> > >>>>   		complete(&tx_msg->tx_complete);
> > >>>>   	}
> > >>>> +	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > >>>>
> > >>>>   	/* wait for about 100ms to flush the tx message list */
> > >>>>   	while (count--) {
> > >>>> @@ -268,13 +271,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> > >>>>   			break;
> > >>>>   		}
> > >>>>   	}
> > >>>> -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > >>>>
> > >>>>   	if (!msg_found) {
> > >>>>   		dev_warn(&pnp_dev->dev,
> > >>>>   			 "Unexpected response (msg id %ld) is returned.\n",
> > >>>>   			 msg->msgid);
> > >>>> -		goto out_msg;
> > >>>> +		goto out_lock;
> > >>>>   	}
> > >>>>
> > >>>>   	/* copy the response data to Rx_data buffer */
> > >>>> @@ -286,10 +288,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> > >>>>   	}
> > >>>>   	tx_msg->rx_len = msg->msg.data_len;
> > >>>>   	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> > >>>> +	/* tx_msg content must be valid before setting msg_done flag */
> > >>>> +	smp_wmb();
> > >>> That's suspicious.
> > >>>
> > >>> If you need the write barrier here, you'll most likely need a read
> > >>> barrier somewhere else.  Where's that?
> > > >> It depends on whether the content written before the smp_wmb() is
> > > >> used by the code on the other side under the condition set after the
> > > >> smp_wmb().
> > > >>
> > > >> So the comment can be treated as 2 parts:
> > > >> 1. do we need a paired smp_rmb()?
> > > >> 2. do we need an smp_wmb()?
> > > >>
> > > >> For 1.
> > > >> If we want a paired smp_rmb(), then it will appear in this function:
> > >>
> > > >> static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> > > >>                 acpi_integer *value, int rem_time)
> > > >> {
> > > >>         struct acpi_ipmi_buffer *buffer;
> > > >>
> > > >>         /*
> > > >>          * value is also used as output parameter. It represents the response
> > > >>          * IPMI message returned by IPMI command.
> > > >>          */
> > > >>         buffer = (struct acpi_ipmi_buffer *)value;
> > > >>         if (!rem_time && !msg->msg_done) {
> > > >>                 buffer->status = ACPI_IPMI_TIMEOUT;
> > > >>                 return;
> > > >>         }
> > > >>         /*
> > > >>          * If the flag of msg_done is not set or the recv length is zero, it
> > > >>          * means that the IPMI command is not executed correctly.
> > > >>          * The status code will be ACPI_IPMI_UNKNOWN.
> > > >>          */
> > > >>         if (!msg->msg_done || !msg->rx_len) {
> > > >>                 buffer->status = ACPI_IPMI_UNKNOWN;
> > > >>                 return;
> > > >>         }
> > > >> +       smp_rmb();
> > > >>         /*
> > > >>          * If the IPMI response message is obtained correctly, the status code
> > > >>          * will be ACPI_IPMI_OK
> > > >>          */
> > > >>         buffer->status = ACPI_IPMI_OK;
> > > >>         buffer->length = msg->rx_len;
> > > >>         memcpy(buffer->data, msg->rx_data, msg->rx_len);
> > > >> }
> > >>
> > > >> If we don't, then the only consequence is that the msg content may be
> > > >> read incorrectly from msg->rx_data.
> > > >> Note that rx_len is 0 during initialization and will never exceed
> > > >> sizeof(buffer->data), so the read is safe.
> > > >>
> > > >> Being without smp_rmb() is also OK in this case, since:
> > > >> 1. buffer->data will never be used when buffer->status is not
> > > >>    ACPI_IPMI_OK, and
> > > >> 2. the smp_rmb()/smp_wmb() added in this patch will be deleted in
> > > >>    [PATCH 07].
> > > >>
> > > >> So IMO, we needn't add the smp_rmb(). What do you think of this?
> > > >>
> > > >> For 2.
> > > >> If we don't add smp_wmb() in ipmi_msg_handler(), then the code running
> > > >> on another thread in acpi_format_ipmi_response() may read wrong
> > > >> msg->rx_data (a timeout triggers this function, but when
> > > >> acpi_format_ipmi_response() is entered, the msg->msg_done flag could be
> > > >> seen as 1 while msg->rx_data is not ready). This is what we want to
> > > >> avoid in this quick fix.
> > > Using smp_wmb() without the complementary smp_rmb() doesn't make
> > > sense, because each of them prevents only one flow of control from
> > > being speculatively reordered, either by the CPU or by the compiler.
> > > If only one of them is used without the other, then the flow of
> > > control without the barrier may be reordered in a way that will
> > > effectively cancel the effect of the barrier in the second flow of control.
> > >
> > > So, either we need *both* smp_wmb() and smp_rmb(), or we don't need
> > > them at all.
> >
> > If I understand this correctly, the problem would be if:
> >
> > rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
> >                                          IPMI_TIMEOUT);
> >
> > returns on a timeout, then checks msg_done and races with something
> > setting msg_done.  If that is the case, you would need the smp_rmb()
> > before checking msg_done.
> 
> I believe so.
> 
> > However, the timeout above is unnecessary.  You are using
> > ipmi_request_settime(), so you can set the timeout when the IPMI
> > command fails and returns a failure message.  The driver guarantees a
> > return message for each request.  Just remove the timeout from the
> > completion, set the timeout and retries in the ipmi request, and the
> > completion should handle the barrier issues.
> 
> Good point.
> 
> > Plus, from a quick glance at the code, it doesn't look like it will
> > properly handle a situation where the timeout occurs and is handled
> > then the response comes in later.
> 
> Lv, what about this?

Please refer to my reply to Corey's comment. :-)

Thanks and best regards
-Lv

> 
> Rafael
> 

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-25 20:27     ` Rafael J. Wysocki
@ 2013-07-26  0:47         ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  0:47 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer



> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> Sent: Friday, July 26, 2013 4:27 AM
> 
> On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > This patch adds reference counting for ACPI operation region handlers
> > to fix races caused by the ACPICA address space callback invocations.
> >
> > ACPICA address space callback invocation is not suitable for the Linux
> > CONFIG_MODULES=y execution environment.  This patch tries to protect
> > the address space callbacks by invoking them under a module safe
> > environment.
> > The IPMI address space handler is also upgraded in this patch.
> > The acpi_unregister_region() is designed to meet the following
> > requirements:
> > 1. It acts as a barrier for operation region callbacks - no callback will
> >    happen after acpi_unregister_region().
> > 2. acpi_unregister_region() is safe to be called in module->exit()
> >    functions.
> > Using reference counting rather than module referencing allows such
> > benefits to be achieved even when acpi_unregister_region() is called
> > in the environments other than module->exit().
> > The header file of include/acpi/acpi_bus.h should contain the
> > declarations that have references to some ACPICA defined types.
> >
> > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > ---
> >  drivers/acpi/acpi_ipmi.c |   16 ++--
> >  drivers/acpi/osl.c       |  224 ++++++++++++++++++++++++++++++++++++++++++++++
> >  include/acpi/acpi_bus.h  |    5 ++
> >  3 files changed, 235 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > index 5f8f495..2a09156 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -539,20 +539,18 @@ out_ref:
> >  static int __init acpi_ipmi_init(void)
> >  {
> >  	int result = 0;
> > -	acpi_status status;
> >
> >  	if (acpi_disabled)
> >  		return result;
> >
> >  	mutex_init(&driver_data.ipmi_lock);
> >
> > -	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > -						    ACPI_ADR_SPACE_IPMI,
> > -						    &acpi_ipmi_space_handler,
> > -						    NULL, NULL);
> > -	if (ACPI_FAILURE(status)) {
> > +	result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > +				      &acpi_ipmi_space_handler,
> > +				      NULL, NULL);
> > +	if (result) {
> >  		pr_warn("Can't register IPMI opregion space handle\n");
> > -		return -EINVAL;
> > +		return result;
> >  	}
> >
> >  	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> >  	}
> >  	mutex_unlock(&driver_data.ipmi_lock);
> >
> > -	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > -					  ACPI_ADR_SPACE_IPMI,
> > -					  &acpi_ipmi_space_handler);
> > +	acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> >  }
> >
> >  module_init(acpi_ipmi_init);
> > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> > index 6ab2c35..8398e51 100644
> > --- a/drivers/acpi/osl.c
> > +++ b/drivers/acpi/osl.c
> > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
> >  static struct workqueue_struct *kacpi_notify_wq;
> >  static struct workqueue_struct *kacpi_hotplug_wq;
> >
> > +struct acpi_region {
> > +	unsigned long flags;
> > +#define ACPI_REGION_DEFAULT		0x01
> > +#define ACPI_REGION_INSTALLED		0x02
> > +#define ACPI_REGION_REGISTERED		0x04
> > +#define ACPI_REGION_UNREGISTERING	0x08
> > +#define ACPI_REGION_INSTALLING		0x10
> 
> What about (1UL << 1), (1UL << 2) etc.?
> 
> Also please remove the #defines out of the struct definition.

OK.
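
A minimal sketch of what the reworked flag definitions might look like, assuming the existing values are kept (so the shifts start at bit 0 and ACPI_REGION_MANAGED stays at 0x80). The demo_region type and demo_region_callable() helper are hypothetical, added only so that the fragment compiles on its own:

```c
#include <assert.h>

/* Flag bits moved out of the struct definition and written as shifts. */
#define ACPI_REGION_DEFAULT		(1UL << 0)
#define ACPI_REGION_INSTALLED		(1UL << 1)
#define ACPI_REGION_REGISTERED		(1UL << 2)
#define ACPI_REGION_UNREGISTERING	(1UL << 3)
#define ACPI_REGION_INSTALLING		(1UL << 4)
/* Only used while not all region handlers are upgraded. */
#define ACPI_REGION_MANAGED		(1UL << 7)

/* Hypothetical reduced struct: only the flags word matters here. */
struct demo_region {
	unsigned long flags;
};

/* Same test as acpi_region_callable() in the patch. */
static int demo_region_callable(const struct demo_region *rgn)
{
	return (rgn->flags & ACPI_REGION_REGISTERED) &&
	       !(rgn->flags & ACPI_REGION_UNREGISTERING);
}
```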

> 
> > +	/*
> > +	 * NOTE: Upgrading All Region Handlers
> > +	 * This flag is only used during the period where not all of the
> > +	 * region handers are upgraded to the new interfaces.
> > +	 * region handlers are upgraded to the new interfaces.
> > +#define ACPI_REGION_MANAGED		0x80
> > +	acpi_adr_space_handler handler;
> > +	acpi_adr_space_setup setup;
> > +	void *context;
> > +	/* Invoking references */
> > +	atomic_t refcnt;
> 
> Actually, why don't you use krefs?

If you take a look at the other pieces of my code, you'll find there are two reasons:

1. I'm using while (atomic_read() > 1) to implement the flushing of the objects, and there is no kref API to do so.
  I don't think it is suitable for me to introduce such an API into kref.h and start another argument around kref design in this bug fix patch. :-)
  I'll start a discussion about kref design in another thread.
2. I'm using ipmi_dev|msg_release() as the counterpart of ipmi_dev|msg_alloc(), which is a kind of atomic_t coding style.
  If atomic_t is changed to struct kref, I will need to implement an extra API for each object, e.g. __ipmi_dev_release(), which takes a struct kref as its parameter and calls ipmi_dev_release() inside it.
  By not using kref, I needn't write code to implement such an API.
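
To make the two points concrete, here is a userspace sketch using C11 atomics. Everything in it is hypothetical and only imitates the shape of the kernel code: demo_dev stands in for the atomic_t style, and demo_kref/demo_kdev imitate the kref style, where the release callback receives the embedded counter and needs a wrapper to get back to the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* atomic_t style: the object owns a plain counter. */
struct demo_dev {
	atomic_int refcnt;	/* 1 means only the owner's reference is left */
	int released;
};

static void demo_dev_get(struct demo_dev *d)
{
	atomic_fetch_add(&d->refcnt, 1);
}

static void demo_dev_release(struct demo_dev *d)
{
	d->released = 1;
}

static void demo_dev_put(struct demo_dev *d)
{
	if (atomic_fetch_sub(&d->refcnt, 1) == 1)
		demo_dev_release(d);
}

/* Point 1: a flush loop would poll this until in-flight users are gone. */
static int demo_dev_busy(struct demo_dev *d)
{
	return atomic_load(&d->refcnt) > 1;
}

/* kref style: the counter is a separate type embedded in the object. */
struct demo_kref {
	atomic_int count;
};

struct demo_kdev {
	struct demo_kref kref;
	int released;
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Point 2: the extra wrapper that maps the counter back to its object. */
static void demo_kdev_release_kref(struct demo_kref *k)
{
	struct demo_kdev *d = demo_container_of(k, struct demo_kdev, kref);

	d->released = 1;
}

static void demo_kref_put(struct demo_kref *k,
			  void (*release)(struct demo_kref *))
{
	if (atomic_fetch_sub(&k->count, 1) == 1)
		release(k);
}
```

The first style exposes the counter value directly, which is what the flush loop relies on; the second hides it behind the put/release pair.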

> 
> > +};
> > +
> > +static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS] = {
> > +	[ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > +		.flags = ACPI_REGION_DEFAULT,
> > +	},
> > +	[ACPI_ADR_SPACE_SYSTEM_IO] = {
> > +		.flags = ACPI_REGION_DEFAULT,
> > +	},
> > +	[ACPI_ADR_SPACE_PCI_CONFIG] = {
> > +		.flags = ACPI_REGION_DEFAULT,
> > +	},
> > +	[ACPI_ADR_SPACE_IPMI] = {
> > +		.flags = ACPI_REGION_MANAGED,
> > +	},
> > +};
> > +static DEFINE_MUTEX(acpi_mutex_region);
> > +
> >  /*
> >   * This list of permanent mappings is for memory that may be accessed
> from
> >   * interrupt context, where we can't do the ioremap().
> > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle,
> u32 type, void *context,
> >  		kfree(hp_work);
> >  }
> >  EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > +
> > +static bool acpi_region_managed(struct acpi_region *rgn)
> > +{
> > +	/*
> > +	 * NOTE: Default and Managed
> > +	 * We only need to avoid region management on the regions managed
> > +	 * by ACPICA (ACPI_REGION_DEFAULT).  Currently, we need additional
> > +	 * check as many operation region handlers are not upgraded, so
> > +	 * only those known to be safe are managed (ACPI_REGION_MANAGED).
> > +	 */
> > +	return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > +	       (rgn->flags & ACPI_REGION_MANAGED);
> > +}
> > +
> > +static bool acpi_region_callable(struct acpi_region *rgn)
> > +{
> > +	return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > +	       !(rgn->flags & ACPI_REGION_UNREGISTERING);
> > +}
> > +
> > +static acpi_status
> > +acpi_region_default_handler(u32 function,
> > +			    acpi_physical_address address,
> > +			    u32 bit_width, u64 *value,
> > +			    void *handler_context, void *region_context)
> > +{
> > +	acpi_adr_space_handler handler;
> > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > +	void *context;
> > +	acpi_status status = AE_NOT_EXIST;
> > +
> > +	mutex_lock(&acpi_mutex_region);
> > +	if (!acpi_region_callable(rgn) || !rgn->handler) {
> > +		mutex_unlock(&acpi_mutex_region);
> > +		return status;
> > +	}
> > +
> > +	atomic_inc(&rgn->refcnt);
> > +	handler = rgn->handler;
> > +	context = rgn->context;
> > +	mutex_unlock(&acpi_mutex_region);
> > +
> > +	status = handler(function, address, bit_width, value, context,
> > +			 region_context);
> 
> Why don't we call the handler under the mutex?
> 
> What exactly prevents context from becoming NULL before the call above?

It's a programming-style concern.
IMO, holding a lock around a callback invocation is a buggy programming style that can lead to deadlocks.
Let me explain with an example.

Object A exports a register/unregister API for other objects.
Object B calls A's register/unregister API to register/unregister B's callback.
Object B is likely to hold lock_of_B around register/unregister when object B is created/destroyed, and lock_of_B is likely also taken inside the callback.
So if object A holds lock_of_A around the callback invocation, a deadlock can occur because:
1. the locking order on the register/unregister side will be: lock(lock_of_B), lock(lock_of_A)
2. the locking order on the callback side will be: lock(lock_of_A), lock(lock_of_B)
The two orders are reversed!

IMO, Linux may need to introduce __callback and __api as decorators for functions and use sparse to enforce this rule; sparse knows whether a callback is invoked with locks held.

In the case of ACPICA space handlers, as you may know, no lock is held inside ACPICA when an ACPI operation region handler is invoked (the interpreter lock must be released before executing operation region handlers).
So the likelihood of a deadlock is quite high here!
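
The pattern argued for above (copy the callback and its context under the lock, take a reference, drop the lock, then call) can be sketched in userspace roughly as follows. This is only an illustration under stated assumptions: toy_region, toy_register(), toy_dispatch() and toy_callback() are hypothetical names, a pthread mutex stands in for acpi_mutex_region, and a C11 atomic stands in for atomic_t. The callback checks with pthread_mutex_trylock() that the dispatcher's lock is free while it runs, which is what lets the callback take its own locks without creating a reversed lock order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*toy_handler_t)(void *context);

/* Hypothetical analogue of struct acpi_region. */
struct toy_region {
	pthread_mutex_t lock;
	toy_handler_t handler;
	void *context;
	atomic_int refcnt;
};

static struct toy_region region = { .lock = PTHREAD_MUTEX_INITIALIZER };

static void toy_register(toy_handler_t handler, void *context)
{
	pthread_mutex_lock(&region.lock);
	region.handler = handler;
	region.context = context;
	atomic_store(&region.refcnt, 1);	/* registration reference */
	pthread_mutex_unlock(&region.lock);
}

static int toy_dispatch(void)
{
	toy_handler_t handler;
	void *context;
	int ret;

	pthread_mutex_lock(&region.lock);
	if (!region.handler) {
		pthread_mutex_unlock(&region.lock);
		return -1;
	}
	/* Pin the registration and snapshot the pointers under the lock. */
	atomic_fetch_add(&region.refcnt, 1);
	handler = region.handler;
	context = region.context;
	pthread_mutex_unlock(&region.lock);

	/* The callback runs on the local copies with no lock held. */
	ret = handler(context);
	atomic_fetch_sub(&region.refcnt, 1);
	return ret;
}

/* Example callback: verifies that the dispatcher's lock is not held. */
static int toy_callback(void *context)
{
	int *calls = context;

	if (pthread_mutex_trylock(&region.lock) != 0)
		return -2;	/* the lock was held around the callback */
	pthread_mutex_unlock(&region.lock);
	(*calls)++;
	return 0;
}

/* Call once: dispatch fails before registration and succeeds after it. */
int toy_dispatch_demo(void)
{
	int calls = 0;

	if (toy_dispatch() != -1)
		return -1;
	toy_register(toy_callback, &calls);
	if (toy_dispatch() != 0)
		return -1;
	return calls;
}
```

Because the callback uses the local copies of handler and context, clearing the fields in the region during unregistration does not affect a call that is already in flight; the elevated reference count is what tells the unregister side that it still has to wait.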

> 
> > +	atomic_dec(&rgn->refcnt);
> > +
> > +	return status;
> > +}
> > +
> > +static acpi_status
> > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > +			  void *handler_context, void **region_context)
> > +{
> > +	acpi_adr_space_setup setup;
> > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > +	void *context;
> > +	acpi_status status = AE_OK;
> > +
> > +	mutex_lock(&acpi_mutex_region);
> > +	if (!acpi_region_callable(rgn) || !rgn->setup) {
> > +		mutex_unlock(&acpi_mutex_region);
> > +		return status;
> > +	}
> > +
> > +	atomic_inc(&rgn->refcnt);
> > +	setup = rgn->setup;
> > +	context = rgn->context;
> > +	mutex_unlock(&acpi_mutex_region);
> > +
> > +	status = setup(handle, function, context, region_context);
> 
> Can setup drop rgn->refcnt ?

The reason is the same as for the handler: setup is also a callback.

> 
> > +	atomic_dec(&rgn->refcnt);
> > +
> > +	return status;
> > +}
> > +
> > +static int __acpi_install_region(struct acpi_region *rgn,
> > +				 acpi_adr_space_type space_id)
> > +{
> > +	int res = 0;
> > +	acpi_status status;
> > +	int installing = 0;
> > +
> > +	mutex_lock(&acpi_mutex_region);
> > +	if (rgn->flags & ACPI_REGION_INSTALLED)
> > +		goto out_lock;
> > +	if (rgn->flags & ACPI_REGION_INSTALLING) {
> > +		res = -EBUSY;
> > +		goto out_lock;
> > +	}
> > +
> > +	installing = 1;
> > +	rgn->flags |= ACPI_REGION_INSTALLING;
> > +	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT, space_id,
> > +						    acpi_region_default_handler,
> > +						    acpi_region_default_setup,
> > +						    rgn);
> > +	rgn->flags &= ~ACPI_REGION_INSTALLING;
> > +	if (ACPI_FAILURE(status))
> > +		res = -EINVAL;
> > +	else
> > +		rgn->flags |= ACPI_REGION_INSTALLED;
> > +
> > +out_lock:
> > +	mutex_unlock(&acpi_mutex_region);
> > +	if (installing) {
> > +		if (res)
> > +			pr_err("Failed to install region %d\n", space_id);
> > +		else
> > +			pr_info("Region %d installed\n", space_id);
> > +	}
> > +	return res;
> > +}
> > +
> > +int acpi_register_region(acpi_adr_space_type space_id,
> > +			 acpi_adr_space_handler handler,
> > +			 acpi_adr_space_setup setup, void *context)
> > +{
> > +	int res;
> > +	struct acpi_region *rgn;
> > +
> > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > +		return -EINVAL;
> > +
> > +	rgn = &acpi_regions[space_id];
> > +	if (!acpi_region_managed(rgn))
> > +		return -EINVAL;
> > +
> > +	res = __acpi_install_region(rgn, space_id);
> > +	if (res)
> > +		return res;
> > +
> > +	mutex_lock(&acpi_mutex_region);
> > +	if (rgn->flags & ACPI_REGION_REGISTERED) {
> > +		mutex_unlock(&acpi_mutex_region);
> > +		return -EBUSY;
> > +	}
> > +
> > +	rgn->handler = handler;
> > +	rgn->setup = setup;
> > +	rgn->context = context;
> > +	rgn->flags |= ACPI_REGION_REGISTERED;
> > +	atomic_set(&rgn->refcnt, 1);
> > +	mutex_unlock(&acpi_mutex_region);
> > +
> > +	pr_info("Region %d registered\n", space_id);
> > +
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > +
> > +void acpi_unregister_region(acpi_adr_space_type space_id)
> > +{
> > +	struct acpi_region *rgn;
> > +
> > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > +		return;
> > +
> > +	rgn = &acpi_regions[space_id];
> > +	if (!acpi_region_managed(rgn))
> > +		return;
> > +
> > +	mutex_lock(&acpi_mutex_region);
> > +	if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > +		mutex_unlock(&acpi_mutex_region);
> > +		return;
> > +	}
> > +	if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > +		mutex_unlock(&acpi_mutex_region);
> > +		return;
> 
> What about
> 
> 	if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> 	    || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> 		mutex_unlock(&acpi_mutex_region);
> 		return;
> 	}
> 

OK.

> > +	}
> > +
> > +	rgn->flags |= ACPI_REGION_UNREGISTERING;
> > +	rgn->handler = NULL;
> > +	rgn->setup = NULL;
> > +	rgn->context = NULL;
> > +	mutex_unlock(&acpi_mutex_region);
> > +
> > +	while (atomic_read(&rgn->refcnt) > 1)
> > +		schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> 
> Wouldn't it be better to use a wait queue here?

Yes, I'll try.
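
For illustration only, the difference between polling the counter and sleeping until it drops can be sketched in userspace with a condition variable, which plays the role that a wait queue (wait_event()/wake_up()) would play in the kernel. All names here (toy_refs, toy_ref_get(), toy_ref_put(), toy_ref_flush()) are hypothetical; this is not the code that ended up in any later revision of the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical reference counter with a "back to idle" notification. */
struct toy_refs {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when count drops back to 1 */
	int count;		/* 1 means only the registration reference */
};

static struct toy_refs refs = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.idle = PTHREAD_COND_INITIALIZER,
	.count = 1,
};

static void toy_ref_get(void)
{
	pthread_mutex_lock(&refs.lock);
	refs.count++;
	pthread_mutex_unlock(&refs.lock);
}

static void toy_ref_put(void)
{
	pthread_mutex_lock(&refs.lock);
	if (--refs.count == 1)
		pthread_cond_broadcast(&refs.idle);	/* wake the flusher */
	pthread_mutex_unlock(&refs.lock);
}

/* Sleeps until all in-flight references are gone; no periodic polling. */
static void toy_ref_flush(void)
{
	pthread_mutex_lock(&refs.lock);
	while (refs.count > 1)
		pthread_cond_wait(&refs.idle, &refs.lock);
	pthread_mutex_unlock(&refs.lock);
}

/* Simulates a callback that finishes and drops its reference. */
static void *toy_worker(void *arg)
{
	(void)arg;
	toy_ref_put();
	return NULL;
}

int toy_flush_demo(void)
{
	pthread_t worker;

	toy_ref_get();	/* reference held by the in-flight "callback" */
	if (pthread_create(&worker, NULL, toy_worker, NULL) != 0) {
		toy_ref_put();
		return -1;
	}
	toy_ref_flush();	/* returns once the worker has dropped it */
	pthread_join(worker, NULL);
	return refs.count;
}
```

The flusher is woken exactly when the last in-flight reference is dropped, instead of waking every few microseconds to re-read the counter.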

> 
> > +	atomic_dec(&rgn->refcnt);
> > +
> > +	mutex_lock(&acpi_mutex_region);
> > +	rgn->flags &= ~(ACPI_REGION_REGISTERED | ACPI_REGION_UNREGISTERING);
> > +	mutex_unlock(&acpi_mutex_region);
> > +
> > +	pr_info("Region %d unregistered\n", space_id);
> > +}
> > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> > index a2c2fbb..15fad0d 100644
> > --- a/include/acpi/acpi_bus.h
> > +++ b/include/acpi/acpi_bus.h
> > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void *bus) { return 0; }
> >
> >  #endif				/* CONFIG_ACPI */
> >
> > +int acpi_register_region(acpi_adr_space_type space_id,
> > +			 acpi_adr_space_handler handler,
> > +			 acpi_adr_space_setup setup, void *context);
> > +void acpi_unregister_region(acpi_adr_space_type space_id);
> > +
> >  #endif /*__ACPI_BUS_H__*/
> 
> Thanks,
> Rafael

Thanks
-Lv

> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-26  0:16               ` Zheng, Lv
@ 2013-07-26  0:48                 ` Corey Minyard
  -1 siblings, 0 replies; 99+ messages in thread
From: Corey Minyard @ 2013-07-26  0:48 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Brown, Len, Wysocki, Rafael J, linux-kernel, Rafael J. Wysocki,
	linux-acpi, openipmi-developer, Zhao, Yakui

On 07/25/2013 07:16 PM, Zheng, Lv wrote:
>>
>> If I understand this correctly, the problem would be if:
>>
>> rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
>>                                           IPMI_TIMEOUT);
>>
>> returns on a timeout, then checks msg_done and races with something setting
>> msg_done.  If that is the case, you would need the smp_rmb() before checking
>> msg_done.
>>
>> However, the timeout above is unnecessary.  You are using
>> ipmi_request_settime(), so you can set the timeout when the IPMI command
>> fails and returns a failure message.  The driver guarantees a return message
>> for each request.  Just remove the timeout from the completion, set the
>> timeout and retries in the ipmi request, and the completion should handle the
>> barrier issues.
> It's just difficult for me to determine retry count and timeout value, maybe retry=0, timeout=IPMI_TIMEOUT is OK.
> The code of the timeout completion is already there, I think the quick fix code should not introduce this logic.
> I'll add a new patch to apply your comment.

Since it is a local BMC, I doubt a retry is required.  That is probably 
fine.  Or you could set retry=1 and timeout=IPMI_TIMEOUT/2 if you wanted 
to be more sure, but I doubt it would make a difference.  The only time 
you really need to worry about retries is if you are resetting the BMC 
or it is being overloaded.

>
>> Plus, from a quick glance at the code, it doesn't look like it will properly handle a
>> situation where the timeout occurs and is handled then the response comes in
>> later.
> PATCH 07 fixed this issue.
> Here we just need the smp_rmb() or holding tx_msg_lock() around the acpi_format_ipmi_response().

If you apply the fix like I suggest, then the race goes away.  If 
there's no timeout and it just waits for the completion, things get a 
lot simpler.

>
> Thanks for commenting.

No problem, thanks for working on this.

-corey

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 04/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user
  2013-07-25 21:59     ` Rafael J. Wysocki
@ 2013-07-26  1:17         ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  1:17 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> Sent: Friday, July 26, 2013 5:59 AM
> 
> On Tuesday, July 23, 2013 04:09:26 PM Lv Zheng wrote:
> > This patch uses reference counting to fix the race caused by the
> > unprotected ACPI IPMI user.
> >
> > As the acpi_ipmi_device->user_interface check in
> > acpi_ipmi_space_handler() can happen before setting user_interface to
> > NULL and codes after the check in acpi_ipmi_space_handler() can happen
> > after user_interface becoming NULL, then the on-going
> > acpi_ipmi_space_handler() still can pass an invalid
> > acpi_ipmi_device->user_interface to ipmi_request_settime().  Such race
> > condition is not allowed by the IPMI layer's API design as crash will happen in
> ipmi_request_settime().
> > In IPMI layer, smi_gone()/new_smi() callbacks are protected by
> > smi_watchers_mutex, thus their invocations are serialized.  But as a
> > new smi can re-use the freed intf_num, it requires that the callback
> > implementation must not use intf_num as an identification mean or it
> > must ensure all references to the previous smi are all dropped before
> > exiting
> > smi_gone() callback.  In case of acpi_ipmi module, this means
> > ipmi_flush_tx_msg() must ensure all on-going IPMI transfers are
> > completed before exiting ipmi_flush_tx_msg().
> >
> > This patch follows ipmi_devintf.c design:
> > 1. Invoking ipmi_destroy_user() after the reference count of
> >    acpi_ipmi_device dropping to 0, this matches IPMI layer's API calling
> >    rule on ipmi_destroy_user() and ipmi_request_settime().
> > 2. References of acpi_ipmi_device dropping to 1 means tx_msg related to
> >    this acpi_ipmi_device are all freed, this can be used to implement the
> >    new flushing mechanism.  Note complete() must be retried so that the
> >    on-going tx_msg won't block flushing at the point to add tx_msg into
> >    tx_msg_list where reference of acpi_ipmi_device is held.  This matches
> >    the IPMI layer's callback rule on smi_gone()/new_smi() serialization.
> > 3. ipmi_flush_tx_msg() is performed after deleting acpi_ipmi_device from
> >    the list so that no new tx_msg can be created after entering flushing
> >    process.
> > 4. The flushing of tx_msg is also moved out of ipmi_lock in this patch.
> >
> > The forthcoming IPMI operation region handler installation changes
> > also requires acpi_ipmi_device be handled in the reference counting style.
> >
> > Authorship is also updated due to this design change.
> >
> > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > Cc: Zhao Yakui <yakui.zhao@intel.com>
> > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > ---
> >  drivers/acpi/acpi_ipmi.c |  249 +++++++++++++++++++++++++++-------------------
> >  1 file changed, 149 insertions(+), 100 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > index 527ee43..cbf25e0 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -1,8 +1,9 @@
> >  /*
> >   *  acpi_ipmi.c - ACPI IPMI opregion
> >   *
> > - *  Copyright (C) 2010 Intel Corporation
> > - *  Copyright (C) 2010 Zhao Yakui <yakui.zhao@intel.com>
> > + *  Copyright (C) 2010, 2013 Intel Corporation
> > + *    Author: Zhao Yakui <yakui.zhao@intel.com>
> > + *            Lv Zheng <lv.zheng@intel.com>
> >   *
> >   *
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~
> >   *
> > @@ -67,6 +68,7 @@ struct acpi_ipmi_device {
> >  	long curr_msgid;
> >  	unsigned long flags;
> >  	struct ipmi_smi_info smi_data;
> > +	atomic_t refcnt;
> 
> Can you use a kref instead?

Please see my concerns in another email.

> 
> >  };
> >
> >  struct ipmi_driver_data {
> > @@ -107,8 +109,8 @@ struct acpi_ipmi_buffer {
> >  static void ipmi_register_bmc(int iface, struct device *dev);
> >  static void ipmi_bmc_gone(int iface);
> >  static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
> > -static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device);
> > -static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device);
> > +static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
> > +static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);
> >
> >  static struct ipmi_driver_data driver_data = {
> >  	.ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
> > @@ -122,6 +124,80 @@ static struct ipmi_driver_data driver_data = {
> >  	},
> >  };
> >
> > +static struct acpi_ipmi_device *
> > +ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
> > +{
> > +	struct acpi_ipmi_device *ipmi_device;
> > +	int err;
> > +	ipmi_user_t user;
> > +
> > +	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
> > +	if (!ipmi_device)
> > +		return NULL;
> > +
> > +	atomic_set(&ipmi_device->refcnt, 1);
> > +	INIT_LIST_HEAD(&ipmi_device->head);
> > +	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
> > +	spin_lock_init(&ipmi_device->tx_msg_lock);
> > +
> > +	ipmi_device->handle = handle;
> > +	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> > +	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
> > +	ipmi_device->ipmi_ifnum = iface;
> > +
> > +	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> > +			       ipmi_device, &user);
> > +	if (err) {
> > +		put_device(smi_data->dev);
> > +		kfree(ipmi_device);
> > +		return NULL;
> > +	}
> > +	ipmi_device->user_interface = user;
> > +	ipmi_install_space_handler(ipmi_device);
> > +
> > +	return ipmi_device;
> > +}
> > +
> > +static struct acpi_ipmi_device *
> > +acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
> > +{
> > +	if (ipmi_device)
> > +		atomic_inc(&ipmi_device->refcnt);
> > +	return ipmi_device;
> > +}
> > +
> > +static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
> > +{
> > +	ipmi_remove_space_handler(ipmi_device);
> > +	ipmi_destroy_user(ipmi_device->user_interface);
> > +	put_device(ipmi_device->smi_data.dev);
> > +	kfree(ipmi_device);
> > +}
> > +
> > +static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
> > +{
> > +	if (ipmi_device && atomic_dec_and_test(&ipmi_device->refcnt))
> > +		ipmi_dev_release(ipmi_device);
> > +}
> > +
> > +static struct acpi_ipmi_device *acpi_ipmi_get_targeted_smi(int iface)
> > +{
> > +	int dev_found = 0;
> > +	struct acpi_ipmi_device *ipmi_device;
> > +
> 
> Why don't you do
> 
> 	struct acpi_ipmi_device *ipmi_device, *ret = NULL;
> 
> and then ->
> 
> > +	mutex_lock(&driver_data.ipmi_lock);
> > +	list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
> > +		if (ipmi_device->ipmi_ifnum == iface) {
> 
> ->			ret = ipmi_device; ->
> 
> > +			dev_found = 1;
> > +			acpi_ipmi_dev_get(ipmi_device);
> > +			break;
> > +		}
> > +	}
> > +	mutex_unlock(&driver_data.ipmi_lock);
> > +
> > +	return dev_found ? ipmi_device : NULL;
> 
> ->	return ret;

OK.

> 
> > +}
> > +
> >  static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
> >  {
> >  	struct acpi_ipmi_msg *ipmi_msg;
> > @@ -228,25 +304,24 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> >  static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> >  {
> >  	struct acpi_ipmi_msg *tx_msg, *temp;
> > -	int count = HZ / 10;
> > -	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> >  	unsigned long flags;
> >
> > -	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > -	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
> > -		/* wake up the sleep thread on the Tx msg */
> > -		complete(&tx_msg->tx_complete);
> > -	}
> > -	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > -
> > -	/* wait for about 100ms to flush the tx message list */
> > -	while (count--) {
> > -		if (list_empty(&ipmi->tx_msg_list))
> > -			break;
> > -		schedule_timeout(1);
> > +	/*
> > +	 * NOTE: Synchronous Flushing
> > +	 * Wait until refcnt drops to 1, i.e. no users other than this
> > +	 * context.  This function should always be called before
> > +	 * acpi_ipmi_device destruction.
> > +	 */
> > +	while (atomic_read(&ipmi->refcnt) > 1) {
> 
> Isn't this racy?  What if we see that the refcount is 1 and break the loop, but
> someone else bumps up the refcount at the same time?

No, it's not racy.
The flushing code here is invoked only after the acpi_ipmi_device has been removed from the object managers.
Please look at ipmi_bmc_gone() and acpi_ipmi_exit().
ipmi_flush_tx_msg() is only called after a "list_del()".
No new transfers can be created in acpi_ipmi_space_handler(), because acpi_ipmi_get_targeted_smi() returns NULL after the "list_del()".

So there is no chance that the refcount reaches 1 and then goes back up: it can only increase from 1 to > 1 while the device is still held by an object manager.
The trick here is to drop all of the object managers' references and hold only the "call chain" reference (thus the value 1) in ipmi_bmc_gone() and acpi_ipmi_exit().
In this patch, the object reference is converted into a "call chain" reference in ipmi_bmc_gone() and acpi_ipmi_exit().
The waiting code can then wait for the reference count to drop to 1, which indicates that all references held by on-going transfers have been dropped as well.
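
The argument above can be sketched in user space.  This is only a single-threaded simulation under stated assumptions: the names (demo_dev, demo_unlink_and_flush() and so on) are hypothetical and not part of the patch, ipmi_lock and real threads are left out, and a "pending" counter stands in for transfers sleeping on tx_complete.  It shows that once the device is unlinked, the lookup can no longer take a reference, so the count can only fall towards 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for acpi_ipmi_device; not the kernel structure. */
struct demo_dev {
	atomic_int refcnt;  /* list reference + one per in-flight transfer */
	bool on_list;       /* stands in for membership in ipmi_devices */
	int pending;        /* transfers still waiting for completion */
};

/* Lookup: a new reference can be taken only while the device is listed. */
static struct demo_dev *demo_get_targeted(struct demo_dev *d)
{
	if (!d->on_list)
		return NULL;
	atomic_fetch_add(&d->refcnt, 1);
	return d;
}

/* Start a transfer: it pins the device until it completes. */
static bool demo_start_transfer(struct demo_dev *d)
{
	if (!demo_get_targeted(d))
		return false;
	d->pending++;
	return true;
}

/* Stand-in for complete(): one waiter wakes up and drops its reference. */
static void demo_complete_one(struct demo_dev *d)
{
	if (d->pending > 0) {
		d->pending--;
		atomic_fetch_sub(&d->refcnt, 1);
	}
}

/*
 * Removal path: unlink first (the "list_del()"), then poll until only the
 * caller's reference is left.  Returns the number of polling rounds.
 */
static int demo_unlink_and_flush(struct demo_dev *d)
{
	int rounds = 0;

	d->on_list = false;
	while (atomic_load(&d->refcnt) > 1) {
		demo_complete_one(d);
		rounds++;
	}
	return rounds;
}

/* Walk through the sequence; returns 0 when every step behaves as argued. */
int demo_run(void)
{
	struct demo_dev d = { .on_list = true, .pending = 0 };

	atomic_init(&d.refcnt, 1);
	if (!demo_start_transfer(&d) || !demo_start_transfer(&d))
		return 1;
	if (atomic_load(&d.refcnt) != 3)
		return 2;
	if (demo_unlink_and_flush(&d) != 2)
		return 3;
	if (demo_start_transfer(&d))  /* no new reference after unlink */
		return 4;
	return atomic_load(&d.refcnt) == 1 ? 0 : 5;
}
```

Calling demo_run() from a test harness exercises the whole sequence.  In the real driver the unlink happens under ipmi_lock and the waiters are separate threads, which this sketch does not model.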

> 
> > +		spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > +		list_for_each_entry_safe(tx_msg, temp,
> > +					 &ipmi->tx_msg_list, head) {
> > +			/* wake up the sleep thread on the Tx msg */
> > +			complete(&tx_msg->tx_complete);
> > +		}
> > +		spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > +		schedule_timeout_uninterruptible(msecs_to_jiffies(1));
> >  	}
> > -	if (!list_empty(&ipmi->tx_msg_list))
> > -		dev_warn(&pnp_dev->dev, "tx msg list is not NULL\n");
> >  }
> >
> >  static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> > @@ -304,22 +379,26 @@ static void ipmi_register_bmc(int iface, struct device *dev)
> >  {
> >  	struct acpi_ipmi_device *ipmi_device, *temp;
> >  	struct pnp_dev *pnp_dev;
> > -	ipmi_user_t		user;
> >  	int err;
> >  	struct ipmi_smi_info smi_data;
> >  	acpi_handle handle;
> >
> >  	err = ipmi_get_smi_info(iface, &smi_data);
> > -
> >  	if (err)
> >  		return;
> >
> > -	if (smi_data.addr_src != SI_ACPI) {
> > -		put_device(smi_data.dev);
> > -		return;
> > -	}
> > -
> > +	if (smi_data.addr_src != SI_ACPI)
> > +		goto err_ref;
> >  	handle = smi_data.addr_info.acpi_info.acpi_handle;
> > +	if (!handle)
> > +		goto err_ref;
> > +	pnp_dev = to_pnp_dev(smi_data.dev);
> > +
> > +	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> > +	if (!ipmi_device) {
> > +		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> > +		goto err_ref;
> > +	}
> >
> >  	mutex_lock(&driver_data.ipmi_lock);
> >  	list_for_each_entry(temp, &driver_data.ipmi_devices, head) {
> > @@ -328,54 +407,42 @@ static void ipmi_register_bmc(int iface, struct device *dev)
> >  		 * to the device list, don't add it again.
> >  		 */
> >  		if (temp->handle == handle)
> > -			goto out;
> > +			goto err_lock;
> >  	}
> >
> > -	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
> > -
> > -	if (!ipmi_device)
> > -		goto out;
> > -
> > -	pnp_dev = to_pnp_dev(smi_data.dev);
> > -	ipmi_device->handle = handle;
> > -	ipmi_device->pnp_dev = pnp_dev;
> > -
> > -	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> > -					ipmi_device, &user);
> > -	if (err) {
> > -		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> > -		kfree(ipmi_device);
> > -		goto out;
> > -	}
> > -	acpi_add_ipmi_device(ipmi_device);
> > -	ipmi_device->user_interface = user;
> > -	ipmi_device->ipmi_ifnum = iface;
> > +	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
> >  	mutex_unlock(&driver_data.ipmi_lock);
> > -	memcpy(&ipmi_device->smi_data, &smi_data, sizeof(struct ipmi_smi_info));
> > +	put_device(smi_data.dev);
> >  	return;
> >
> > -out:
> > +err_lock:
> >  	mutex_unlock(&driver_data.ipmi_lock);
> > +	ipmi_dev_release(ipmi_device);
> > +err_ref:
> >  	put_device(smi_data.dev);
> >  	return;
> >  }
> >
> >  static void ipmi_bmc_gone(int iface)
> >  {
> > -	struct acpi_ipmi_device *ipmi_device, *temp;
> > +	int dev_found = 0;
> > +	struct acpi_ipmi_device *ipmi_device;
> >
> >  	mutex_lock(&driver_data.ipmi_lock);
> > -	list_for_each_entry_safe(ipmi_device, temp,
> > -				&driver_data.ipmi_devices, head) {
> > -		if (ipmi_device->ipmi_ifnum != iface)
> > -			continue;
> > -
> > -		acpi_remove_ipmi_device(ipmi_device);
> > -		put_device(ipmi_device->smi_data.dev);
> > -		kfree(ipmi_device);
> > -		break;
> > +	list_for_each_entry(ipmi_device, &driver_data.ipmi_devices, head) {
> > +		if (ipmi_device->ipmi_ifnum == iface) {
> > +			dev_found = 1;
> 
> You can do the list_del() here, because you're under the mutex, so others won't
> see the list in an inconsistent state and you're about to break anyway.

I'm trying to improve code maintainability (and hence the internal quality of the software) for the reviewers.
If we put a list_del()/break inside a list_for_each_entry(), it is quite likely that the list_for_each_entry() line will not appear in the context of a future patch that deletes the "break".
Reviewers would then be unable to detect such a bug.
The coding style I'm showing here avoids that issue.
I thought maintainers would be happy with such code, since it can prevent many small, unpleasant mistakes.
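
As a rough illustration of that style, here is a user-space sketch.  It assumes a hand-rolled circular list rather than the kernel's <linux/list.h>, and all names (node, remove_by_ifnum() and so on) are hypothetical.  It uses a "found" pointer where the patch uses a dev_found flag.  The search loop never modifies the list, and the unlink happens only after the loop has ended.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, loosely modelled on the kernel's. */
struct node {
	struct node *prev, *next;
	int ifnum;
};

static void node_init(struct node *head)
{
	head->prev = head->next = head;
}

static void node_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void node_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static int node_count(const struct node *head)
{
	int n = 0;
	const struct node *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}

/*
 * Search with a plain (non-"safe") iteration that never modifies the list,
 * then unlink after the loop.  Returns the removed node or NULL.
 */
static struct node *remove_by_ifnum(struct node *head, int ifnum)
{
	struct node *pos, *found = NULL;

	for (pos = head->next; pos != head; pos = pos->next) {
		if (pos->ifnum == ifnum) {
			found = pos;
			break;
		}
	}
	if (found)
		node_del(found);
	return found;
}

/* Returns 0 if removal behaves as expected on a three-element list. */
int demo_remove(void)
{
	struct node head, a = { .ifnum = 0 }, b = { .ifnum = 1 }, c = { .ifnum = 2 };

	node_init(&head);
	node_add_tail(&a, &head);
	node_add_tail(&b, &head);
	node_add_tail(&c, &head);

	if (remove_by_ifnum(&head, 1) != &b || node_count(&head) != 2)
		return 1;
	if (remove_by_ifnum(&head, 7) != NULL || node_count(&head) != 2)
		return 2;
	return (head.next == &a && a.next == &c && c.next == &head) ? 0 : 3;
}
```

With this shape, deleting the "break" later only costs some extra iterations; the loop body still never touches the list links.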

Thanks for commenting.

Best regards
-Lv

> 
> > +			break;
> > +		}
> >  	}
> > +	if (dev_found)
> > +		list_del(&ipmi_device->head);
> >  	mutex_unlock(&driver_data.ipmi_lock);
> > +
> > +	if (dev_found) {
> > +		ipmi_flush_tx_msg(ipmi_device);
> > +		acpi_ipmi_dev_put(ipmi_device);
> > +	}
> >  }
> >
> >  /* --------------------------------------------------------------------------
> > @@ -400,7 +467,8 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
> >  			void *handler_context, void *region_context)
> >  {
> >  	struct acpi_ipmi_msg *tx_msg;
> > -	struct acpi_ipmi_device *ipmi_device = handler_context;
> > +	int iface = (long)handler_context;
> > +	struct acpi_ipmi_device *ipmi_device;
> >  	int err, rem_time;
> >  	acpi_status status;
> >  	unsigned long flags;
> > @@ -414,12 +482,15 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
> >  	if ((function & ACPI_IO_MASK) == ACPI_READ)
> >  		return AE_TYPE;
> >
> > -	if (!ipmi_device->user_interface)
> > +	ipmi_device = acpi_ipmi_get_targeted_smi(iface);
> > +	if (!ipmi_device)
> >  		return AE_NOT_EXIST;
> >
> >  	tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
> > -	if (!tx_msg)
> > -		return AE_NO_MEMORY;
> > +	if (!tx_msg) {
> > +		status = AE_NO_MEMORY;
> > +		goto out_ref;
> > +	}
> >
> >  	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
> >  		status = AE_TYPE;
> > @@ -449,6 +520,8 @@ out_list:
> >  	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >  out_msg:
> >  	kfree(tx_msg);
> > +out_ref:
> > +	acpi_ipmi_dev_put(ipmi_device);
> >  	return status;
> >  }
> >
> > @@ -473,7 +546,7 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
> >  	status = acpi_install_address_space_handler(ipmi->handle,
> >  						    ACPI_ADR_SPACE_IPMI,
> >  						    &acpi_ipmi_space_handler,
> > -						    NULL, ipmi);
> > +						    NULL, (void *)((long)ipmi->ipmi_ifnum));
> >  	if (ACPI_FAILURE(status)) {
> >  		struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> >  		dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
> > @@ -484,36 +557,6 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
> >  	return 0;
> >  }
> >
> > -static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)
> > -{
> > -
> > -	INIT_LIST_HEAD(&ipmi_device->head);
> > -
> > -	spin_lock_init(&ipmi_device->tx_msg_lock);
> > -	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
> > -	ipmi_install_space_handler(ipmi_device);
> > -
> > -	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
> > -}
> > -
> > -static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device)
> > -{
> > -	/*
> > -	 * If the IPMI user interface is created, it should be
> > -	 * destroyed.
> > -	 */
> > -	if (ipmi_device->user_interface) {
> > -		ipmi_destroy_user(ipmi_device->user_interface);
> > -		ipmi_device->user_interface = NULL;
> > -	}
> > -	/* flush the Tx_msg list */
> > -	if (!list_empty(&ipmi_device->tx_msg_list))
> > -		ipmi_flush_tx_msg(ipmi_device);
> > -
> > -	list_del(&ipmi_device->head);
> > -	ipmi_remove_space_handler(ipmi_device);
> > -}
> > -
> >  static int __init acpi_ipmi_init(void)
> >  {
> >  	int result = 0;
> > @@ -530,7 +573,7 @@ static int __init acpi_ipmi_init(void)
> >
> >  static void __exit acpi_ipmi_exit(void)
> >  {
> > -	struct acpi_ipmi_device *ipmi_device, *temp;
> > +	struct acpi_ipmi_device *ipmi_device;
> >
> >  	if (acpi_disabled)
> >  		return;
> > @@ -544,11 +587,17 @@ static void __exit acpi_ipmi_exit(void)
> >  	 * handler and free it.
> >  	 */
> >  	mutex_lock(&driver_data.ipmi_lock);
> > -	list_for_each_entry_safe(ipmi_device, temp,
> > -				&driver_data.ipmi_devices, head) {
> > -		acpi_remove_ipmi_device(ipmi_device);
> > -		put_device(ipmi_device->smi_data.dev);
> > -		kfree(ipmi_device);
> > +	while (!list_empty(&driver_data.ipmi_devices)) {
> > +		ipmi_device = list_first_entry(&driver_data.ipmi_devices,
> > +					       struct acpi_ipmi_device,
> > +					       head);
> > +		list_del(&ipmi_device->head);
> > +		mutex_unlock(&driver_data.ipmi_lock);
> > +
> > +		ipmi_flush_tx_msg(ipmi_device);
> > +		acpi_ipmi_dev_put(ipmi_device);
> > +
> > +		mutex_lock(&driver_data.ipmi_lock);
> >  	}
> >  	mutex_unlock(&driver_data.ipmi_lock);
> >  }
> >
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 07/13] ACPI/IPMI: Add reference counting for ACPI IPMI transfers
  2013-07-25 22:23     ` Rafael J. Wysocki
@ 2013-07-26  1:21         ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  1:21 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, linux-acpi, openipmi-developer

> From: linux-acpi-owner@vger.kernel.org
> [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> Sent: Friday, July 26, 2013 6:23 AM
> 
> On Tuesday, July 23, 2013 04:09:54 PM Lv Zheng wrote:
> > This patch adds reference counting for ACPI IPMI transfers to tune the
> > locking granularity of tx_msg_lock.
> >
> > The acpi_ipmi_msg handling is re-designed using reference counting.
> > 1. tx_msg is always unlinked before complete(), so that:
> >    1.1. it is safe to put complete() outside of tx_msg_lock;
> >    1.2. complete() can only happen once, thus smp_wmb() is not required.
> > 2. The reference count of tx_msg is increased before calling
> >    ipmi_request_settime(), and a tx_msg_lock-protected
> >    ipmi_cancel_tx_msg() is introduced, so that a complete() can happen
> >    in parallel with tx_msg unlinking in the failure cases.
> > 3. tx_msg holds the reference of acpi_ipmi_device so that it can be flushed
> >    and freed in the contexts other than acpi_ipmi_space_handler().
> >
> > The lockdep_chains output shows that all acpi_ipmi locks are leaf locks
> > after the tuning:
> > 1. ipmi_lock is always leaf:
> >    irq_context: 0
> >    [ffffffff81a943f8] smi_watchers_mutex
> >    [ffffffffa06eca60] driver_data.ipmi_lock
> >    irq_context: 0
> >    [ffffffff82767b40] &buffer->mutex
> >    [ffffffffa00a6678] s_active#103
> >    [ffffffffa06eca60] driver_data.ipmi_lock
> > 2. without this patch applied, the lock used by complete() is taken
> >    while tx_msg_lock is held:
> >    irq_context: 0
> >    [ffffffff82767b40] &buffer->mutex
> >    [ffffffffa00a6678] s_active#103
> >    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> >    irq_context: 1
> >    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> >    irq_context: 1
> >    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> >    [ffffffffa06eccf0] &x->wait#25
> >    irq_context: 1
> >    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> >    [ffffffffa06eccf0] &x->wait#25
> >    [ffffffff81e36620] &p->pi_lock
> >    irq_context: 1
> >    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> >    [ffffffffa06eccf0] &x->wait#25
> >    [ffffffff81e36620] &p->pi_lock
> >    [ffffffff81e5d0a8] &rq->lock
> > 3. with this patch applied, tx_msg_lock is always leaf:
> >    irq_context: 0
> >    [ffffffff82767b40] &buffer->mutex
> >    [ffffffffa00a66d8] s_active#107
> >    [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
> >    irq_context: 1
> >    [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
> >
> > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > Cc: Zhao Yakui <yakui.zhao@intel.com>
> > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > ---
> >  drivers/acpi/acpi_ipmi.c |  107 +++++++++++++++++++++++++++++++++-------------
> >  1 file changed, 77 insertions(+), 30 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > index 2a09156..0ee1ea6 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -105,6 +105,7 @@ struct acpi_ipmi_msg {
> >  	u8	data[ACPI_IPMI_MAX_MSG_LENGTH];
> >  	u8	rx_len;
> >  	struct acpi_ipmi_device *device;
> > +	atomic_t	refcnt;
> 
> Again: kref, please?

Please see the concerns in another email.
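
For readers following the atomic_t versus kref question, the get/put
pattern that the patch adds for acpi_ipmi_msg can be sketched in userspace
as below.  This is only an illustration: it assumes C11 atomics in place of
the kernel's atomic_t, and the names (msg, msg_alloc, msg_get, msg_put) and
the "released" flag are hypothetical.  The kernel's kref helpers
(kref_init(), kref_get(), kref_put()) wrap the same idea.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct acpi_ipmi_msg. */
struct msg {
	atomic_int refcnt;
	bool *released;	/* set on release so a caller can observe the free */
};

/* Allocate a message that starts with one reference (the creator's). */
static struct msg *msg_alloc(bool *released)
{
	struct msg *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	atomic_init(&m->refcnt, 1);
	m->released = released;
	return m;
}

/* Take an additional reference; NULL is passed through unchanged. */
static struct msg *msg_get(struct msg *m)
{
	if (m)
		atomic_fetch_add(&m->refcnt, 1);
	return m;
}

/* Drop a reference; returns true if this call freed the message. */
static bool msg_put(struct msg *m)
{
	/* atomic_fetch_sub() returns the old value: 1 means we were last. */
	if (m && atomic_fetch_sub(&m->refcnt, 1) == 1) {
		if (m->released)
			*m->released = true;
		free(m);
		return true;
	}
	return false;
}
```

Whichever holder drops the last reference performs the free, which is what
lets the message be released from a context other than the one that
allocated it.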

> 
> >  };
> >
> >  /* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
> > @@ -195,22 +196,47 @@ static struct acpi_ipmi_device *acpi_ipmi_get_selected_smi(void)
> >  	return ipmi_device;
> >  }
> >
> > -static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
> > +static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
> >  {
> > +	struct acpi_ipmi_device *ipmi;
> >  	struct acpi_ipmi_msg *ipmi_msg;
> > -	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> >
> > +	ipmi = acpi_ipmi_get_selected_smi();
> > +	if (!ipmi)
> > +		return NULL;
> >  	ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
> > -	if (!ipmi_msg)	{
> > -		dev_warn(&pnp_dev->dev, "Can't allocate memory for ipmi_msg\n");
> > +	if (!ipmi_msg) {
> > +		acpi_ipmi_dev_put(ipmi);
> >  		return NULL;
> >  	}
> > +	atomic_set(&ipmi_msg->refcnt, 1);
> >  	init_completion(&ipmi_msg->tx_complete);
> >  	INIT_LIST_HEAD(&ipmi_msg->head);
> >  	ipmi_msg->device = ipmi;
> > +
> >  	return ipmi_msg;
> >  }
> >
> > +static struct acpi_ipmi_msg *
> > +acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
> > +{
> > +	if (tx_msg)
> > +		atomic_inc(&tx_msg->refcnt);
> > +	return tx_msg;
> > +}
> > +
> > +static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
> > +{
> > +	acpi_ipmi_dev_put(tx_msg->device);
> > +	kfree(tx_msg);
> > +}
> > +
> > +static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
> > +{
> > +	if (tx_msg && atomic_dec_and_test(&tx_msg->refcnt))
> > +		ipmi_msg_release(tx_msg);
> > +}
> > +
> >  #define		IPMI_OP_RGN_NETFN(offset)	((offset >> 8) & 0xff)
> >  #define		IPMI_OP_RGN_CMD(offset)		(offset & 0xff)
> >  static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
> > @@ -300,7 +326,7 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
> >
> >  static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> >  {
> > -	struct acpi_ipmi_msg *tx_msg, *temp;
> > +	struct acpi_ipmi_msg *tx_msg;
> >  	unsigned long flags;
> >
> >  	/*
> > @@ -311,16 +337,46 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> >  	 */
> >  	while (atomic_read(&ipmi->refcnt) > 1) {
> >  		spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > -		list_for_each_entry_safe(tx_msg, temp,
> > -					 &ipmi->tx_msg_list, head) {
> > +		while (!list_empty(&ipmi->tx_msg_list)) {
> > +			tx_msg = list_first_entry(&ipmi->tx_msg_list,
> > +						  struct acpi_ipmi_msg,
> > +						  head);
> > +			list_del(&tx_msg->head);
> > +			spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > +
> >  			/* wake up the sleep thread on the Tx msg */
> >  			complete(&tx_msg->tx_complete);
> > +			acpi_ipmi_msg_put(tx_msg);
> > +			spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> >  		}
> >  		spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > +
> >  		schedule_timeout_uninterruptible(msecs_to_jiffies(1));
> >  	}
> >  }
> >
> > +static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
> > +			       struct acpi_ipmi_msg *msg)
> > +{
> > +	struct acpi_ipmi_msg *tx_msg;
> > +	int msg_found = 0;
> 
> Use bool?

OK.
There are other int flags in the original code (e.g. dev_found); do I need to clean up all of them as well?

> 
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > +	list_for_each_entry(tx_msg, &ipmi->tx_msg_list, head) {
> > +		if (msg == tx_msg) {
> > +			msg_found = 1;
> > +			break;
> > +		}
> > +	}
> > +	if (msg_found)
> > +		list_del(&tx_msg->head);
> 
> The list_del() can be done when you set msg_found.

Please see my concerns in another email.

Thanks and best regards
-Lv
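
The review comment on ipmi_cancel_tx_msg() suggests unlinking at the point
where the match is found.  A minimal userspace sketch of that variant
follows, using a bool flag and dropping the list's reference only after the
lock is released.  It assumes a singly linked list and a toy spinlock built
on C11 atomic_flag instead of list_head and tx_msg_lock; all names in it
(tx_node, tx_queue, tx_queue_cancel) are hypothetical, and it does not
claim to address the concerns raised in the other email.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for acpi_ipmi_msg and acpi_ipmi_device. */
struct tx_node {
	struct tx_node *next;
	int refs;		/* plain int: the sketch is single-threaded */
};

struct tx_queue {
	atomic_flag lock;	/* toy spinlock standing in for tx_msg_lock */
	struct tx_node *head;
};

static void tx_lock(struct tx_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void tx_unlock(struct tx_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

static void tx_queue_init(struct tx_queue *q)
{
	atomic_flag_clear(&q->lock);
	q->head = NULL;
}

/* Queue a node; the list takes its own reference. */
static void tx_queue_add(struct tx_queue *q, struct tx_node *n)
{
	n->refs++;
	tx_lock(q);
	n->next = q->head;
	q->head = n;
	tx_unlock(q);
}

/* Unlink 'target' if it is queued; returns true when it was removed. */
static bool tx_queue_cancel(struct tx_queue *q, struct tx_node *target)
{
	struct tx_node **pp;
	bool found = false;

	tx_lock(q);
	for (pp = &q->head; *pp; pp = &(*pp)->next) {
		if (*pp == target) {
			*pp = target->next;	/* unlink where the match is found */
			target->next = NULL;
			found = true;
			break;
		}
	}
	tx_unlock(q);

	/* The list's reference is dropped outside the lock. */
	if (found)
		target->refs--;
	return found;
}
```

A second cancel of the same node finds nothing and leaves the reference
count alone, which mirrors the case where the response handler has already
unlinked the message.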

> 
> > +	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > +
> > +	if (msg_found)
> > +		acpi_ipmi_msg_put(tx_msg);
> > +}
> > +
> >  static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >  {
> >  	struct acpi_ipmi_device *ipmi_device = user_msg_data;
> > @@ -343,12 +399,15 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >  			break;
> >  		}
> >  	}
> > +	if (msg_found)
> > +		list_del(&tx_msg->head);
> > +	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >
> >  	if (!msg_found) {
> >  		dev_warn(&pnp_dev->dev,
> >  			 "Unexpected response (msg id %ld) is returned.\n",
> >  			 msg->msgid);
> > -		goto out_lock;
> > +		goto out_msg;
> >  	}
> >
> >  	/* copy the response data to Rx_data buffer */
> > @@ -360,14 +419,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >  	}
> >  	tx_msg->rx_len = msg->msg.data_len;
> >  	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
> > -	/* tx_msg content must be valid before setting msg_done flag */
> > -	smp_wmb();
> >  	tx_msg->msg_done = 1;
> >
> >  out_comp:
> >  	complete(&tx_msg->tx_complete);
> > -out_lock:
> > -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > +	acpi_ipmi_msg_put(tx_msg);
> >  out_msg:
> >  	ipmi_free_recv_msg(msg);
> >  }
> > @@ -493,21 +549,17 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
> >  	if ((function & ACPI_IO_MASK) == ACPI_READ)
> >  		return AE_TYPE;
> >
> > -	ipmi_device = acpi_ipmi_get_selected_smi();
> > -	if (!ipmi_device)
> > +	tx_msg = ipmi_msg_alloc();
> > +	if (!tx_msg)
> >  		return AE_NOT_EXIST;
> > -
> > -	tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
> > -	if (!tx_msg) {
> > -		status = AE_NO_MEMORY;
> > -		goto out_ref;
> > -	}
> > +	ipmi_device = tx_msg->device;
> >
> >  	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
> > -		status = AE_TYPE;
> > -		goto out_msg;
> > +		ipmi_msg_release(tx_msg);
> > +		return AE_TYPE;
> >  	}
> >
> > +	acpi_ipmi_msg_get(tx_msg);
> >  	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
> >  	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
> >  	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > @@ -518,21 +570,16 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
> >  				   NULL, 0, 0, 0);
> >  	if (err) {
> >  		status = AE_ERROR;
> > -		goto out_list;
> > +		goto out_msg;
> >  	}
> >  	rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
> >  					       IPMI_TIMEOUT);
> >  	acpi_format_ipmi_response(tx_msg, value, rem_time);
> >  	status = AE_OK;
> >
> > -out_list:
> > -	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
> > -	list_del(&tx_msg->head);
> > -	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >  out_msg:
> > -	kfree(tx_msg);
> > -out_ref:
> > -	acpi_ipmi_dev_put(ipmi_device);
> > +	ipmi_cancel_tx_msg(ipmi_device, tx_msg);
> > +	acpi_ipmi_msg_put(tx_msg);
> >  	return status;
> >  }
> >
> >
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members
  2013-07-25 22:25     ` Rafael J. Wysocki
@ 2013-07-26  1:25         ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  1:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Brown, Len, Corey Minyard, Wysocki, Rafael J, linux-kernel, Zhao,
	Yakui, linux-acpi, openipmi-developer

> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> Sent: Friday, July 26, 2013 6:26 AM
> 
> On Tuesday, July 23, 2013 04:10:06 PM Lv Zheng wrote:
> > This is a trivial patch:
> > 1. Deletes a member of the acpi_ipmi_device - smi_data which is not
> >    actually used.
> > 2. Updates a member of the acpi_ipmi_device - pnp_dev which is only used
> >    by dev_warn() invocations, so changes it to struct device.
> >
> > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > ---
> >  drivers/acpi/acpi_ipmi.c |   30 ++++++++++++++----------------
> >  1 file changed, 14 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > index 0ee1ea6..7f93ffd 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -63,11 +63,10 @@ struct acpi_ipmi_device {
> >  	struct list_head tx_msg_list;
> >  	spinlock_t	tx_msg_lock;
> >  	acpi_handle handle;
> > -	struct pnp_dev *pnp_dev;
> > +	struct device *dev;
> >  	ipmi_user_t	user_interface;
> >  	int ipmi_ifnum; /* IPMI interface number */
> >  	long curr_msgid;
> > -	struct ipmi_smi_info smi_data;
> >  	atomic_t refcnt;
> >  };
> >
> > @@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = {
> >  };
> >
> >  static struct acpi_ipmi_device *
> > -ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
> > +ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)
> 
> Why is the second arg called pdev?

OK, I will change it to dev.

> 
> >  {
> >  	struct acpi_ipmi_device *ipmi_device;
> >  	int err;
> > @@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
> >  	spin_lock_init(&ipmi_device->tx_msg_lock);
> >
> >  	ipmi_device->handle = handle;
> > -	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> > -	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
> > +	ipmi_device->dev = get_device(pdev);
> >  	ipmi_device->ipmi_ifnum = iface;
> >
> >  	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> >  			       ipmi_device, &user);
> >  	if (err) {
> > -		put_device(smi_data->dev);
> > +		put_device(pdev);
> >  		kfree(ipmi_device);
> >  		return NULL;
> >  	}
> > @@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
> >  static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
> >  {
> >  	ipmi_destroy_user(ipmi_device->user_interface);
> > -	put_device(ipmi_device->smi_data.dev);
> > +	put_device(ipmi_device->dev);
> >  	kfree(ipmi_device);
> >  }
> >
> > @@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
> >  	buffer = (struct acpi_ipmi_buffer *)value;
> >  	/* copy the tx message data */
> >  	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
> > -		dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
> > +		dev_WARN_ONCE(tx_msg->device->dev, true,
> >  			      "Unexpected request (msg len %d).\n",
> >  			      buffer->length);
> >  		return -EINVAL;
> > @@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >  	struct acpi_ipmi_device *ipmi_device = user_msg_data;
> >  	int msg_found = 0;
> >  	struct acpi_ipmi_msg *tx_msg;
> > -	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
> > +	struct device *dev = ipmi_device->dev;
> >  	unsigned long flags;
> >
> >  	if (msg->user != ipmi_device->user_interface) {
> > -		dev_warn(&pnp_dev->dev,
> > +		dev_warn(dev,
> >  			 "Unexpected response is returned. returned user %p, expected user %p\n",
> >  			 msg->user, ipmi_device->user_interface);
> >  		goto out_msg;
> > @@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >  	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >
> >  	if (!msg_found) {
> > -		dev_warn(&pnp_dev->dev,
> > +		dev_warn(dev,
> >  			 "Unexpected response (msg id %ld) is returned.\n",
> >  			 msg->msgid);
> >  		goto out_msg;
> > @@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> >
> >  	/* copy the response data to Rx_data buffer */
> >  	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
> > -		dev_WARN_ONCE(&pnp_dev->dev, true,
> > +		dev_WARN_ONCE(dev, true,
> >  			      "Unexpected response (msg len %d).\n",
> >  			      msg->msg.data_len);
> >  		goto out_comp;
> > @@ -431,7 +429,7 @@ out_msg:
> >  static void ipmi_register_bmc(int iface, struct device *dev)
> >  {
> >  	struct acpi_ipmi_device *ipmi_device, *temp;
> > -	struct pnp_dev *pnp_dev;
> > +	struct device *pdev;
> 
> And here?

Here dev is already the name of the ipmi_register_bmc() parameter, so the local variable holding the device taken from "struct ipmi_smi_info" cannot also be named dev in this quick fix.

Thanks
-Lv
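
As background for the pnp_dev to struct device change discussed here: a
bus-specific wrapper embeds the generic device, and helpers such as
to_pnp_dev() recover the wrapper from the embedded member.  When a holder
only ever passes the generic device to logging helpers, it can store that
pointer directly.  The userspace sketch below illustrates the idea under
stated assumptions: the types (generic_dev, bus_dev, holder_old,
holder_new) and the to_bus_dev() macro are hypothetical stand-ins, not the
kernel definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device. */
struct generic_dev {
	const char *name;
};

/* Hypothetical stand-in for a bus wrapper such as struct pnp_dev. */
struct bus_dev {
	int bus_id;
	struct generic_dev dev;	/* embedded generic device */
};

/* Recover the wrapper from a pointer to its embedded member. */
#define to_bus_dev(d) \
	((struct bus_dev *)((char *)(d) - offsetof(struct bus_dev, dev)))

/* Before: the holder keeps the wrapper and reaches through it. */
struct holder_old {
	struct bus_dev *bdev;
};

/* After: the holder keeps only the generic device it actually uses. */
struct holder_new {
	struct generic_dev *dev;
};

static const char *holder_old_name(const struct holder_old *h)
{
	return h->bdev->dev.name;
}

static const char *holder_new_name(const struct holder_new *h)
{
	return h->dev->name;
}
```

Both holders resolve to the same generic device; the second form simply
avoids the conversion and the dependency on the wrapper type.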

> 
> >  	int err;
> >  	struct ipmi_smi_info smi_data;
> >  	acpi_handle handle;
> > @@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct device *dev)
> >  	handle = smi_data.addr_info.acpi_info.acpi_handle;
> >  	if (!handle)
> >  		goto err_ref;
> > -	pnp_dev = to_pnp_dev(smi_data.dev);
> > +	pdev = smi_data.dev;
> >
> > -	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> > +	ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
> >  	if (!ipmi_device) {
> > -		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> > +		dev_warn(pdev, "Can't create IPMI user interface\n");
> >  		goto err_ref;
> >  	}
> >
> >
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members
@ 2013-07-26  1:25         ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  1:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, linux-acpi, openipmi-developer

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 5703 bytes --]

> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> Sent: Friday, July 26, 2013 6:26 AM
> 
> On Tuesday, July 23, 2013 04:10:06 PM Lv Zheng wrote:
> > This is a trivial patch:
> > 1. Deletes a member of the acpi_ipmi_device - smi_data which is not
> >    actually used.
> > 2. Updates a member of the acpi_ipmi_device - pnp_dev which is only used
> >    by dev_warn() invocations, so changes it to struct device.
> >
> > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > ---
> >  drivers/acpi/acpi_ipmi.c |   30 ++++++++++++++----------------
> >  1 file changed, 14 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > 0ee1ea6..7f93ffd 100644
> > --- a/drivers/acpi/acpi_ipmi.c
> > +++ b/drivers/acpi/acpi_ipmi.c
> > @@ -63,11 +63,10 @@ struct acpi_ipmi_device {
> >  	struct list_head tx_msg_list;
> >  	spinlock_t	tx_msg_lock;
> >  	acpi_handle handle;
> > -	struct pnp_dev *pnp_dev;
> > +	struct device *dev;
> >  	ipmi_user_t	user_interface;
> >  	int ipmi_ifnum; /* IPMI interface number */
> >  	long curr_msgid;
> > -	struct ipmi_smi_info smi_data;
> >  	atomic_t refcnt;
> >  };
> >
> > @@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = {  };
> >
> >  static struct acpi_ipmi_device *
> > -ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle
> > handle)
> > +ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)
> 
> Why is the second arg called pdev?

OK, I will change it to dev.

> 
> >  {
> >  	struct acpi_ipmi_device *ipmi_device;
> >  	int err;
> > @@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info
> *smi_data, acpi_handle handle)
> >  	spin_lock_init(&ipmi_device->tx_msg_lock);
> >
> >  	ipmi_device->handle = handle;
> > -	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> > -	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct
> ipmi_smi_info));
> > +	ipmi_device->dev = get_device(pdev);
> >  	ipmi_device->ipmi_ifnum = iface;
> >
> >  	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> >  			       ipmi_device, &user);
> >  	if (err) {
> > -		put_device(smi_data->dev);
> > +		put_device(pdev);
> >  		kfree(ipmi_device);
> >  		return NULL;
> >  	}
> > @@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device
> > *ipmi_device)  static void ipmi_dev_release(struct acpi_ipmi_device
> > *ipmi_device)  {
> >  	ipmi_destroy_user(ipmi_device->user_interface);
> > -	put_device(ipmi_device->smi_data.dev);
> > +	put_device(ipmi_device->dev);
> >  	kfree(ipmi_device);
> >  }
> >
> > @@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct
> acpi_ipmi_msg *tx_msg,
> >  	buffer = (struct acpi_ipmi_buffer *)value;
> >  	/* copy the tx message data */
> >  	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
> > -		dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
> > +		dev_WARN_ONCE(tx_msg->device->dev, true,
> >  			      "Unexpected request (msg len %d).\n",
> >  			      buffer->length);
> >  		return -EINVAL;
> > @@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct
> ipmi_recv_msg *msg, void *user_msg_data)
> >  	struct acpi_ipmi_device *ipmi_device = user_msg_data;
> >  	int msg_found = 0;
> >  	struct acpi_ipmi_msg *tx_msg;
> > -	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
> > +	struct device *dev = ipmi_device->dev;
> >  	unsigned long flags;
> >
> >  	if (msg->user != ipmi_device->user_interface) {
> > -		dev_warn(&pnp_dev->dev,
> > +		dev_warn(dev,
> >  			 "Unexpected response is returned. returned user %p, expected
> user %p\n",
> >  			 msg->user, ipmi_device->user_interface);
> >  		goto out_msg;
> > @@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg
> *msg, void *user_msg_data)
> >  	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> >
> >  	if (!msg_found) {
> > -		dev_warn(&pnp_dev->dev,
> > +		dev_warn(dev,
> >  			 "Unexpected response (msg id %ld) is returned.\n",
> >  			 msg->msgid);
> >  		goto out_msg;
> > @@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg
> > *msg, void *user_msg_data)
> >
> >  	/* copy the response data to Rx_data buffer */
> >  	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
> > -		dev_WARN_ONCE(&pnp_dev->dev, true,
> > +		dev_WARN_ONCE(dev, true,
> >  			      "Unexpected response (msg len %d).\n",
> >  			      msg->msg.data_len);
> >  		goto out_comp;
> > @@ -431,7 +429,7 @@ out_msg:
> >  static void ipmi_register_bmc(int iface, struct device *dev)  {
> >  	struct acpi_ipmi_device *ipmi_device, *temp;
> > -	struct pnp_dev *pnp_dev;
> > +	struct device *pdev;
> 
> And here?

dev is already a parameter of ipmi_register_bmc(), so the device taken from "struct ipmi_smi_info" cannot also be named dev in this quick fix.

Thanks
-Lv

> 
> >  	int err;
> >  	struct ipmi_smi_info smi_data;
> >  	acpi_handle handle;
> > @@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct
> device *dev)
> >  	handle = smi_data.addr_info.acpi_info.acpi_handle;
> >  	if (!handle)
> >  		goto err_ref;
> > -	pnp_dev = to_pnp_dev(smi_data.dev);
> > +	pdev = smi_data.dev;
> >
> > -	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> > +	ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
> >  	if (!ipmi_device) {
> > -		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> > +		dev_warn(pdev, "Can't create IPMI user interface\n");
> >  		goto err_ref;
> >  	}
> >
> >
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-07-26  0:48                 ` Corey Minyard
@ 2013-07-26  1:30                   ` Zheng, Lv
  -1 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  1:30 UTC (permalink / raw)
  To: minyard
  Cc: Rafael J. Wysocki, Wysocki, Rafael J, Brown, Len, Zhao, Yakui,
	linux-kernel, linux-acpi, openipmi-developer

> From: Corey Minyard [mailto:tcminyard@gmail.com]
> Sent: Friday, July 26, 2013 8:48 AM
> 
> On 07/25/2013 07:16 PM, Zheng, Lv wrote:
> >>
> >> If I understand this correctly, the problem would be if:
> >>
> >> rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
> >>                                           IPMI_TIMEOUT);
> >>
> >> returns on a timeout, then checks msg_done and races with something
> >> setting msg_done.  If that is the case, you would need the smp_rmb()
> >> before checking msg_done.
> >>
> >> However, the timeout above is unnecessary.  You are using
> >> ipmi_request_settime(), so you can set the timeout when the IPMI
> >> command fails and returns a failure message.  The driver guarantees a
> >> return message for each request.  Just remove the timeout from the
> >> completion, set the timeout and retries in the ipmi request, and the
> >> completion should handle the barrier issues.
> > It's just difficult for me to determine retry count and timeout value, maybe
> retry=0, timeout=IPMI_TIMEOUT is OK.
> > The code of the timeout completion is already there, I think the quick fix code
> should not introduce this logic.
> > I'll add a new patch to apply your comment.
> 
> Since it is a local BMC, I doubt a retry is required.  That is probably fine.  Or
> you could set retry=1 and timeout=IPMI_TIMEOUT/2 if you wanted to be more
> sure, but I doubt it would make a difference.  The only time you really need to
> worry about retries is if you are resetting the BMC or it is being overloaded.

I think that for the ACPI IPMI operation region, retries can be implemented in the ASL code by the BIOS.
I'll check whether retry=0 is correct.

> 
> >
> >> Plus, from a quick glance at the code, it doesn't look like it will
> >> properly handle a situation where the timeout occurs and is handled
> >> then the response comes in later.
> > PATCH 07 fixed this issue.
> > Here we just need the smp_rmb() or holding tx_msg_lock() around the
> acpi_format_ipmi_response().
> 
> If you apply the fix like I suggest, then the race goes away.  If there's no
> timeout and it just waits for the completion, things get a lot simpler.

Exactly.  I'll try to apply this in this patch; PATCH 07 will then also need to be reworked.

Thanks and best regards
-Lv


> >
> > Thanks for commenting.
> 
> No problem, thanks for working on this.
> 
> -corey

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-25 21:29     ` Rafael J. Wysocki
@ 2013-07-26  1:54         ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  1:54 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Wysocki, Rafael J, Brown, Len, linux-kernel, linux-acpi

> From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> Sent: Friday, July 26, 2013 5:29 AM
> 
> On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > This patch adds reference counting for ACPI operation region handlers to fix
> > races caused by the ACPICA address space callback invocations.
> >
> > ACPICA address space callback invocation is not suitable for Linux
> > CONFIG_MODULE=y execution environment.
> 
> Actually, can you please explain to me what *exactly* the problem is?

OK.  I'll add race explanations in the next revision.

The problem is that no lock is held inside ACPICA while operation region handlers are invoked.
Thus races happen between acpi_remove/install_address_space_handler() and the handler/setup callbacks.

This is correct per the ACPI specification.
If interpreter locks were held while invoking operation region handlers, a timeout implemented inside a handler would cause all locking facilities (Acquire, Sleep, ...) to time out.
Please refer to ACPI specification "5.5.2 Control Method Execution":
Interpretation of a Control Method is not preemptive, but it can block. When a control method does block, OSPM can initiate or continue the execution of a different control method. A control method can only assume that access to global objects is exclusive for any period the control method does not block.

So it is quite likely that ACPI IO transfers take locks inside the operation region callback implementations.
Using a lock to protect the callback invocation would risk deadlocks.

Thanks
-Lv

> Rafael


^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-26  0:47         ` Zheng, Lv
@ 2013-07-26  8:09           ` Zheng, Lv
  -1 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  8:09 UTC (permalink / raw)
  To: Zheng, Lv, Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

> From: linux-acpi-owner@vger.kernel.org
> [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Zheng, Lv
> Sent: Friday, July 26, 2013 8:48 AM
> 
> 
> 
> > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > Sent: Friday, July 26, 2013 4:27 AM
> >
> > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region handlers
> > > to fix races caused by the ACPICA address space callback invocations.
> > >
> > > ACPICA address space callback invocation is not suitable for Linux
> > > CONFIG_MODULE=y execution environment.  This patch tries to protect
> > > the address space callbacks by invoking them under a module safe
> > environment.
> > > The IPMI address space handler is also upgraded in this patch.
> > > The acpi_unregister_region() is designed to meet the following
> > > requirements:
> > > 1. It acts as a barrier for operation region callbacks - no callback will
> > >    happen after acpi_unregister_region().
> > > > 2. acpi_unregister_region() is safe to be called in module->exit()
> > >    functions.
> > > Using reference counting rather than module referencing allows such
> > > benefits to be achieved even when acpi_unregister_region() is called
> > > in the environments other than module->exit().
> > > The header file of include/acpi/acpi_bus.h should contain the
> > > declarations that have references to some ACPICA defined types.
> > >
> > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > ---
> > >  drivers/acpi/acpi_ipmi.c |   16 ++--
> > >  drivers/acpi/osl.c       |  224
> > ++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/acpi/acpi_bus.h  |    5 ++
> > >  3 files changed, 235 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > index
> > > 5f8f495..2a09156 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -539,20 +539,18 @@ out_ref:
> > >  static int __init acpi_ipmi_init(void)  {
> > >  	int result = 0;
> > > -	acpi_status status;
> > >
> > >  	if (acpi_disabled)
> > >  		return result;
> > >
> > >  	mutex_init(&driver_data.ipmi_lock);
> > >
> > > -	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > -						    ACPI_ADR_SPACE_IPMI,
> > > -						    &acpi_ipmi_space_handler,
> > > -						    NULL, NULL);
> > > -	if (ACPI_FAILURE(status)) {
> > > +	result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > > +				      &acpi_ipmi_space_handler,
> > > +				      NULL, NULL);
> > > +	if (result) {
> > >  		pr_warn("Can't register IPMI opregion space handle\n");
> > > -		return -EINVAL;
> > > +		return result;
> > >  	}
> > >
> > >  	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > >  	}
> > >  	mutex_unlock(&driver_data.ipmi_lock);
> > >
> > > -	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > > -					  ACPI_ADR_SPACE_IPMI,
> > > -					  &acpi_ipmi_space_handler);
> > > +	acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > >  }
> > >
> > >  module_init(acpi_ipmi_init);
> > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index
> > > 6ab2c35..8398e51 100644
> > > --- a/drivers/acpi/osl.c
> > > +++ b/drivers/acpi/osl.c
> > > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
> > > static struct workqueue_struct *kacpi_notify_wq;  static struct
> > > workqueue_struct *kacpi_hotplug_wq;
> > >
> > > +struct acpi_region {
> > > +	unsigned long flags;
> > > +#define ACPI_REGION_DEFAULT		0x01
> > > +#define ACPI_REGION_INSTALLED		0x02
> > > +#define ACPI_REGION_REGISTERED		0x04
> > > +#define ACPI_REGION_UNREGISTERING	0x08
> > > +#define ACPI_REGION_INSTALLING		0x10
> >
> > What about (1UL << 1), (1UL << 2) etc.?
> >
> > Also please remove the #defines out of the struct definition.
> 
> OK.
> 
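As a rough illustration of the two review comments above, the flag bits can be written as shifts and placed ahead of the structure. This is only a userspace sketch: the shifts reproduce the values used in the patch (0x01 through 0x10, and 0x80), while struct region is a trimmed, hypothetical stand-in for struct acpi_region, and region_callable()/region_managed() merely restate the patch's two predicates so they can be compiled outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits written as shifts and defined outside the struct. */
#define ACPI_REGION_DEFAULT        (1UL << 0)
#define ACPI_REGION_INSTALLED      (1UL << 1)
#define ACPI_REGION_REGISTERED     (1UL << 2)
#define ACPI_REGION_UNREGISTERING  (1UL << 3)
#define ACPI_REGION_INSTALLING     (1UL << 4)
/* Transitional flag: only set for handlers already using the new interface. */
#define ACPI_REGION_MANAGED        (1UL << 7)

/* Hypothetical, trimmed stand-in for the patch's struct acpi_region. */
struct region {
	unsigned long flags;
};

/* Same test as acpi_region_callable() in the patch. */
static bool region_callable(const struct region *rgn)
{
	return (rgn->flags & ACPI_REGION_REGISTERED) &&
	       !(rgn->flags & ACPI_REGION_UNREGISTERING);
}

/* Same test as acpi_region_managed() in the patch. */
static bool region_managed(const struct region *rgn)
{
	return !(rgn->flags & ACPI_REGION_DEFAULT) &&
	       (rgn->flags & ACPI_REGION_MANAGED);
}
```

Keeping the definitions together ahead of the struct also makes it easier to see at a glance which bits are still free.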
> >
> > > +	/*
> > > +	 * NOTE: Upgrading All Region Handlers
> > > +	 * This flag is only used during the period where not all of the
> > > +	 * region handers are upgraded to the new interfaces.
> > > +	 */
> > > +#define ACPI_REGION_MANAGED		0x80
> > > +	acpi_adr_space_handler handler;
> > > +	acpi_adr_space_setup setup;
> > > +	void *context;
> > > +	/* Invoking references */
> > > +	atomic_t refcnt;
> >
> > Actually, why don't you use krefs?
> 
> If you take a look at other piece of my codes, you'll find there are two reasons:
> 
> 1. I'm using while (atomic_read() > 1) to implement the objects' flushing and
> there is no kref API to do so.
>   I just think it is not suitable for me to introduce such an API into kref.h and
> start another argument around kref designs in this bug fix patch. :-)
>   I'll start a discussion about kref design using another thread.
> 2. I'm using ipmi_dev|msg_release() as a pair of ipmi_dev|msg_alloc(), it's kind
> of atomic_t coding style.
>   If atomic_t is changed to struct kref, I will need to implement two API,
> __ipmi_dev_release() to take a struct kref as parameter and call
> ipmi_dev_release inside it.
>   By not using kref, I needn't write codes to implement such API.
> 
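The second reason above can be pictured with a small userspace sketch. It is not the kernel's kref implementation: struct ref, ref_get() and ref_put() are simplified, non-atomic imitations whose shape follows kref_put() (a release callback that receives a pointer to the embedded counter), and struct demo_dev with its helpers is entirely hypothetical. The point is the extra wrapper, demo_dev_release_ref(), which has to recover the containing object before the ordinary release function can run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified, non-atomic imitation of a kref-style counter. */
struct ref {
	int count;
};

static void ref_init(struct ref *r) { r->count = 1; }
static void ref_get(struct ref *r)  { r->count++; }

/* Drop one reference; call release() and return 1 if it was the last. */
static int ref_put(struct ref *r, void (*release)(struct ref *r))
{
	if (--r->count == 0) {
		release(r);
		return 1;
	}
	return 0;
}

/* Hypothetical object that embeds the counter. */
struct demo_dev {
	int ifnum;
	struct ref ref;
};

static int demo_released;	/* counts releases, for demonstration only */

/* Ordinary destructor, playing the role of ipmi_dev_release(). */
static void demo_dev_release(struct demo_dev *dev)
{
	demo_released++;
	free(dev);
}

/* Extra wrapper a kref-style API needs: map the counter back to its owner. */
static void demo_dev_release_ref(struct ref *r)
{
	struct demo_dev *dev =
		(struct demo_dev *)((char *)r - offsetof(struct demo_dev, ref));

	demo_dev_release(dev);
}

static struct demo_dev *demo_dev_alloc(int ifnum)
{
	struct demo_dev *dev = malloc(sizeof(*dev));

	if (!dev)
		return NULL;
	dev->ifnum = ifnum;
	ref_init(&dev->ref);
	return dev;
}
```

With a bare counter the alloc/release pair stays symmetric; with an embedded counter the wrapper is the price of letting the put operation own the "last reference" decision.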
> >
> > > +};
> > > +
> > > +static struct acpi_region
> acpi_regions[ACPI_NUM_PREDEFINED_REGIONS]
> > = {
> > > +	[ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_SYSTEM_IO] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_PCI_CONFIG] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_IPMI] = {
> > > +		.flags = ACPI_REGION_MANAGED,
> > > +	},
> > > +};
> > > +static DEFINE_MUTEX(acpi_mutex_region);
> > > +
> > >  /*
> > >   * This list of permanent mappings is for memory that may be
> > > accessed
> > from
> > >   * interrupt context, where we can't do the ioremap().
> > > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle,
> > u32 type, void *context,
> > >  		kfree(hp_work);
> > >  }
> > >  EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > > +
> > > +static bool acpi_region_managed(struct acpi_region *rgn) {
> > > +	/*
> > > +	 * NOTE: Default and Managed
> > > +	 * We only need to avoid region management on the regions
> managed
> > > +	 * by ACPICA (ACPI_REGION_DEFAULT).  Currently, we need
> additional
> > > +	 * check as many operation region handlers are not upgraded, so
> > > +	 * only those known to be safe are managed
> (ACPI_REGION_MANAGED).
> > > +	 */
> > > +	return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > > +	       (rgn->flags & ACPI_REGION_MANAGED); }
> > > +
> > > +static bool acpi_region_callable(struct acpi_region *rgn) {
> > > +	return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > > +	       !(rgn->flags & ACPI_REGION_UNREGISTERING); }
> > > +
> > > +static acpi_status
> > > +acpi_region_default_handler(u32 function,
> > > +			    acpi_physical_address address,
> > > +			    u32 bit_width, u64 *value,
> > > +			    void *handler_context, void *region_context) {
> > > +	acpi_adr_space_handler handler;
> > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > +	void *context;
> > > +	acpi_status status = AE_NOT_EXIST;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!acpi_region_callable(rgn) || !rgn->handler) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return status;
> > > +	}
> > > +
> > > +	atomic_inc(&rgn->refcnt);
> > > +	handler = rgn->handler;
> > > +	context = rgn->context;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	status = handler(function, address, bit_width, value, context,
> > > +			 region_context);
> >
> > Why don't we call the handler under the mutex?

I think my reply below was meant for this question, so let me move it up.

> It's a kind of programming style related concern.
> IMO, using locks around callback function is a buggy programming style that
> could lead to dead locks.
> Let me explain this using an example.
> 
> Object A exports a register/unregister API for other objects.
> Object B calls A's register/unregister API to register/unregister B's callback.

Sorry, I have to say "object" rather than "module" here, as a module may contain several objects that need the same handling.

> It's likely that object B will hold lock_of_B around unregister/register when
> object B is destroyed/created, the lock_of_B is likely also used inside the
> callback.
> So when object A holds the lock_of_A around the callback invocation, it leads to
> dead lock since:
> 1. the locking order for the register/unregister side will be:
>    lock(lock_of_B), lock(lock_of_A)
> 2. the locking order for the callback side will be:
>    lock(lock_of_A), lock(lock_of_B)
> They are in the reversed order!

I think this example is not quite correct.
There is another aspect of the unregister implementation, which is the intent of this patch:
no callback may be running or initiated after "unregister" returns; we can call this a "flush" requirement.
Inside the callback, lock_of_B should not be held if "flush" is required.

> 
> IMO, Linux may need to introduce __callback, __api as decorators for the
> functions, and use sparse to enforce this rule, sparse knows if a callback is
> invoked under some locks.
> 
> In the case of ACPICA space_handlers, as you may know, when an ACPI
> operation region handler is invoked, there will be no lock held inside ACPICA
> (interpreter lock must be freed before executing operation region handlers).
> So the likelihood of the dead lock is pretty much high here!

I need to mention another requirement of operation region handlers.
Multiple operation region handlers must be able to execute at the same time; otherwise the IO operations invoked by the BIOS ASL code would be serialized.
IMO, IO operations invoked by the BIOS ASL code need to run in parallel.
Thus a mutex held across the handler is not a suitable protection here.

So the mutex is unlocked before the handler is executed; IMO, reference counting is what meets this requirement.
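The scheme described above can be sketched in userspace as follows. This is a model under stated assumptions, not the patch: pthread mutexes and C11 atomics stand in for the kernel primitives, sched_yield() stands in for the sleeping poll loop, and struct region, region_register(), region_invoke(), region_unregister() and demo_handler() are hypothetical names. The handler pointer is snapshotted under the lock, the callback runs with the lock dropped, and unregister waits for the in-flight count to drain.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*region_handler)(void *context);

struct region {
	pthread_mutex_t lock;	/* protects handler, context and the flags */
	region_handler handler;
	void *context;
	int registered;
	int unregistering;
	atomic_int refcnt;	/* 1 while registered, +1 per running callback */
};

static int region_register(struct region *rgn, region_handler h, void *ctx)
{
	pthread_mutex_lock(&rgn->lock);
	if (rgn->registered) {		/* also covers "still unregistering" */
		pthread_mutex_unlock(&rgn->lock);
		return -1;
	}
	rgn->handler = h;
	rgn->context = ctx;
	rgn->registered = 1;
	atomic_store(&rgn->refcnt, 1);
	pthread_mutex_unlock(&rgn->lock);
	return 0;
}

/* Snapshot the handler under the lock, then call it with the lock dropped. */
static int region_invoke(struct region *rgn)
{
	region_handler h;
	void *ctx;
	int ret;

	pthread_mutex_lock(&rgn->lock);
	if (!rgn->registered || rgn->unregistering || !rgn->handler) {
		pthread_mutex_unlock(&rgn->lock);
		return -1;
	}
	atomic_fetch_add(&rgn->refcnt, 1);
	h = rgn->handler;
	ctx = rgn->context;
	pthread_mutex_unlock(&rgn->lock);

	ret = h(ctx);			/* may block or take its own locks */
	atomic_fetch_sub(&rgn->refcnt, 1);
	return ret;
}

/* Refuse new callbacks, then wait until the running ones have finished. */
static void region_unregister(struct region *rgn)
{
	pthread_mutex_lock(&rgn->lock);
	if (!rgn->registered || rgn->unregistering) {
		pthread_mutex_unlock(&rgn->lock);
		return;
	}
	rgn->unregistering = 1;
	rgn->handler = NULL;
	rgn->context = NULL;
	pthread_mutex_unlock(&rgn->lock);

	while (atomic_load(&rgn->refcnt) > 1)
		sched_yield();		/* the patch sleeps briefly here instead */
	atomic_fetch_sub(&rgn->refcnt, 1);

	pthread_mutex_lock(&rgn->lock);
	rgn->registered = 0;
	rgn->unregistering = 0;
	pthread_mutex_unlock(&rgn->lock);
}

/* Demo handler: takes the region lock itself, which would deadlock if the
 * caller still held that lock across the callback. */
static int demo_handler(void *context)
{
	struct region *rgn = context;
	int seen;

	pthread_mutex_lock(&rgn->lock);
	seen = rgn->registered;
	pthread_mutex_unlock(&rgn->lock);
	return seen;
}
```

Because no lock is held across h(ctx), several callers can be inside the handler at once, and the handler is free to take whatever locks it needs.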

> >
> > What exactly prevents context from becoming NULL before the call above?

I don't think my answers above addressed this question directly.

Sorry, I'm not sure what you are asking here.  Let me just try to be practical.

The code is here:
>
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!acpi_region_callable(rgn) || !rgn->handler) {

> > > +	handler = rgn->handler;
> > > +	context = rgn->context;
> > > +	mutex_unlock(&acpi_mutex_region);

The handler is guaranteed to be non-NULL while the mutex is held.

Thanks for commenting.

Best regards
-Lv

> >
> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	return status;
> > > +}
> > > +
> > > +static acpi_status
> > > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > > +			  void *handler_context, void **region_context) {
> > > +	acpi_adr_space_setup setup;
> > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > +	void *context;
> > > +	acpi_status status = AE_OK;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!acpi_region_callable(rgn) || !rgn->setup) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return status;
> > > +	}
> > > +
> > > +	atomic_inc(&rgn->refcnt);
> > > +	setup = rgn->setup;
> > > +	context = rgn->context;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	status = setup(handle, function, context, region_context);
> >
> > Can setup drop rgn->refcnt ?
> 
> The reason is same as the handler, as a setup is also a callback.
> 
> >
> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	return status;
> > > +}
> > > +
> > > +static int __acpi_install_region(struct acpi_region *rgn,
> > > +				 acpi_adr_space_type space_id)
> > > +{
> > > +	int res = 0;
> > > +	acpi_status status;
> > > +	int installing = 0;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (rgn->flags & ACPI_REGION_INSTALLED)
> > > +		goto out_lock;
> > > +	if (rgn->flags & ACPI_REGION_INSTALLING) {
> > > +		res = -EBUSY;
> > > +		goto out_lock;
> > > +	}
> > > +
> > > +	installing = 1;
> > > +	rgn->flags |= ACPI_REGION_INSTALLING;
> > > +	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > space_id,
> > > +						    acpi_region_default_handler,
> > > +						    acpi_region_default_setup,
> > > +						    rgn);
> > > +	rgn->flags &= ~ACPI_REGION_INSTALLING;
> > > +	if (ACPI_FAILURE(status))
> > > +		res = -EINVAL;
> > > +	else
> > > +		rgn->flags |= ACPI_REGION_INSTALLED;
> > > +
> > > +out_lock:
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +	if (installing) {
> > > +		if (res)
> > > +			pr_err("Failed to install region %d\n", space_id);
> > > +		else
> > > +			pr_info("Region %d installed\n", space_id);
> > > +	}
> > > +	return res;
> > > +}
> > > +
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > +			 acpi_adr_space_handler handler,
> > > +			 acpi_adr_space_setup setup, void *context) {
> > > +	int res;
> > > +	struct acpi_region *rgn;
> > > +
> > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > +		return -EINVAL;
> > > +
> > > +	rgn = &acpi_regions[space_id];
> > > +	if (!acpi_region_managed(rgn))
> > > +		return -EINVAL;
> > > +
> > > +	res = __acpi_install_region(rgn, space_id);
> > > +	if (res)
> > > +		return res;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (rgn->flags & ACPI_REGION_REGISTERED) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	rgn->handler = handler;
> > > +	rgn->setup = setup;
> > > +	rgn->context = context;
> > > +	rgn->flags |= ACPI_REGION_REGISTERED;
> > > +	atomic_set(&rgn->refcnt, 1);
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	pr_info("Region %d registered\n", space_id);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > > +
> > > +void acpi_unregister_region(acpi_adr_space_type space_id) {
> > > +	struct acpi_region *rgn;
> > > +
> > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > +		return;
> > > +
> > > +	rgn = &acpi_regions[space_id];
> > > +	if (!acpi_region_managed(rgn))
> > > +		return;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return;
> > > +	}
> > > +	if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return;
> >
> > What about
> >
> > 	if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> > 	    || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> > 		mutex_unlock(&acpi_mutex_region);
> > 		return;
> > 	}
> >
> 
> OK.
> 
> > > +	}
> > > +
> > > +	rgn->flags |= ACPI_REGION_UNREGISTERING;
> > > +	rgn->handler = NULL;
> > > +	rgn->setup = NULL;
> > > +	rgn->context = NULL;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	while (atomic_read(&rgn->refcnt) > 1)
> > > +		schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> >
> > Wouldn't it be better to use a wait queue here?
> 
> Yes, I'll try.
> 
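One way to picture the wait-queue suggestion is the userspace analogue below, where a condition variable plays the part of the wait queue. It is a sketch under assumptions rather than the reworked patch: struct drain_ref and its three helpers are hypothetical names, and in the kernel the equivalent would presumably be built on wait_event() and wake_up() instead of pthreads.

```c
#include <assert.h>
#include <pthread.h>

/* Reference count whose flusher sleeps instead of polling. */
struct drain_ref {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when only the base reference is left */
	int count;		/* 1 = registered and idle */
};

/* Take a reference for the duration of one callback. */
static void drain_ref_get(struct drain_ref *r)
{
	pthread_mutex_lock(&r->lock);
	r->count++;
	pthread_mutex_unlock(&r->lock);
}

/* Drop a callback reference and wake the flusher if it was the last one. */
static void drain_ref_put(struct drain_ref *r)
{
	pthread_mutex_lock(&r->lock);
	if (--r->count <= 1)
		pthread_cond_broadcast(&r->idle);
	pthread_mutex_unlock(&r->lock);
}

/* Sleep until only the base reference remains, then drop it.
 * Returns the count that is left (0 on a clean flush). */
static int drain_ref_flush(struct drain_ref *r)
{
	int left;

	pthread_mutex_lock(&r->lock);
	while (r->count > 1)
		pthread_cond_wait(&r->idle, &r->lock);
	left = --r->count;
	pthread_mutex_unlock(&r->lock);
	return left;
}
```

Compared with the polling loop, the flusher wakes exactly when the last callback drops its reference, at the cost of a wake-up check on every put.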
> >
> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	rgn->flags &= ~(ACPI_REGION_REGISTERED |
> > ACPI_REGION_UNREGISTERING);
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	pr_info("Region %d unregistered\n", space_id); }
> > > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index
> > > a2c2fbb..15fad0d 100644
> > > --- a/include/acpi/acpi_bus.h
> > > +++ b/include/acpi/acpi_bus.h
> > > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void
> > > *bus) { return 0; }
> > >
> > >  #endif				/* CONFIG_ACPI */
> > >
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > +			 acpi_adr_space_handler handler,
> > > +			 acpi_adr_space_setup setup, void *context); void
> > > +acpi_unregister_region(acpi_adr_space_type space_id);
> > > +
> > >  #endif /*__ACPI_BUS_H__*/
> >
> > Thanks,
> > Rafael
> 
> Thanks
> -Lv
> 
> >
> >
> > --
> > I speak only for myself.
> > Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
@ 2013-07-26  8:09           ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  8:09 UTC (permalink / raw)
  To: Zheng, Lv, Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

> From: linux-acpi-owner@vger.kernel.org
> [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Zheng, Lv
> Sent: Friday, July 26, 2013 8:48 AM
> 
> 
> 
> > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > Sent: Friday, July 26, 2013 4:27 AM
> >
> > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region handlers
> > > to fix races caused by the ACPICA address space callback invocations.
> > >
> > > ACPICA address space callback invocation is not suitable for Linux
> > > CONFIG_MODULE=y execution environment.  This patch tries to protect
> > > the address space callbacks by invoking them under a module safe
> > environment.
> > > The IPMI address space handler is also upgraded in this patch.
> > > The acpi_unregister_region() is designed to meet the following
> > > requirements:
> > > 1. It acts as a barrier for operation region callbacks - no callback will
> > >    happen after acpi_unregister_region().
> > > > 2. acpi_unregister_region() is safe to be called in module->exit()
> > >    functions.
> > > Using reference counting rather than module referencing allows such
> > > benefits to be achieved even when acpi_unregister_region() is called
> > > in the environments other than module->exit().
> > > The header file of include/acpi/acpi_bus.h should contain the
> > > declarations that have references to some ACPICA defined types.
> > >
> > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > ---
> > >  drivers/acpi/acpi_ipmi.c |   16 ++--
> > >  drivers/acpi/osl.c       |  224
> > ++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/acpi/acpi_bus.h  |    5 ++
> > >  3 files changed, 235 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > index
> > > 5f8f495..2a09156 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -539,20 +539,18 @@ out_ref:
> > >  static int __init acpi_ipmi_init(void)  {
> > >  	int result = 0;
> > > -	acpi_status status;
> > >
> > >  	if (acpi_disabled)
> > >  		return result;
> > >
> > >  	mutex_init(&driver_data.ipmi_lock);
> > >
> > > -	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > -						    ACPI_ADR_SPACE_IPMI,
> > > -						    &acpi_ipmi_space_handler,
> > > -						    NULL, NULL);
> > > -	if (ACPI_FAILURE(status)) {
> > > +	result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > > +				      &acpi_ipmi_space_handler,
> > > +				      NULL, NULL);
> > > +	if (result) {
> > >  		pr_warn("Can't register IPMI opregion space handle\n");
> > > -		return -EINVAL;
> > > +		return result;
> > >  	}
> > >
> > >  	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > >  	}
> > >  	mutex_unlock(&driver_data.ipmi_lock);
> > >
> > > -	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > > -					  ACPI_ADR_SPACE_IPMI,
> > > -					  &acpi_ipmi_space_handler);
> > > +	acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > >  }
> > >
> > >  module_init(acpi_ipmi_init);
> > > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> > > > index 6ab2c35..8398e51 100644
> > > --- a/drivers/acpi/osl.c
> > > +++ b/drivers/acpi/osl.c
> > > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
> > > static struct workqueue_struct *kacpi_notify_wq;  static struct
> > > workqueue_struct *kacpi_hotplug_wq;
> > >
> > > +struct acpi_region {
> > > +	unsigned long flags;
> > > +#define ACPI_REGION_DEFAULT		0x01
> > > +#define ACPI_REGION_INSTALLED		0x02
> > > +#define ACPI_REGION_REGISTERED		0x04
> > > +#define ACPI_REGION_UNREGISTERING	0x08
> > > +#define ACPI_REGION_INSTALLING		0x10
> >
> > What about (1UL << 1), (1UL << 2) etc.?
> >
> > Also please remove the #defines out of the struct definition.
> 
> OK.
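
For reference, the layout suggested above could look roughly like the sketch below. It keeps the numeric values from the patch and only moves the macros out of the struct and writes them as shifts. The struct name region_flags_demo and the helper region_callable() are hypothetical stand-ins for illustration; they are not part of the patch.

```c
/*
 * Flag bits for a region slot, declared outside the struct and written
 * as shifts so each bit position is visible at a glance.  The values
 * are the same as in the patch (0x01 ... 0x10, 0x80).
 */
#define ACPI_REGION_DEFAULT		(1UL << 0)
#define ACPI_REGION_INSTALLED		(1UL << 1)
#define ACPI_REGION_REGISTERED		(1UL << 2)
#define ACPI_REGION_UNREGISTERING	(1UL << 3)
#define ACPI_REGION_INSTALLING		(1UL << 4)
#define ACPI_REGION_MANAGED		(1UL << 7)

/* Hypothetical stand-in for struct acpi_region; only the flags matter here. */
struct region_flags_demo {
	unsigned long flags;
};

/* Same test as acpi_region_callable() in the patch. */
static int region_callable(const struct region_flags_demo *rgn)
{
	return (rgn->flags & ACPI_REGION_REGISTERED) &&
	       !(rgn->flags & ACPI_REGION_UNREGISTERING);
}
```
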
> 
> >
> > > +	/*
> > > +	 * NOTE: Upgrading All Region Handlers
> > > +	 * This flag is only used during the period where not all of the
> > > > +	 * region handlers are upgraded to the new interfaces.
> > > +	 */
> > > +#define ACPI_REGION_MANAGED		0x80
> > > +	acpi_adr_space_handler handler;
> > > +	acpi_adr_space_setup setup;
> > > +	void *context;
> > > +	/* Invoking references */
> > > +	atomic_t refcnt;
> >
> > Actually, why don't you use krefs?
> 
> If you take a look at other pieces of my code, you'll find there are two reasons:
> 
> 1. I'm using while (atomic_read() > 1) to implement the objects' flushing and
> there is no kref API to do so.
>   I just think it is not suitable for me to introduce such an API into kref.h and
> start another argument around the kref design in this bug fix patch. :-)
>   I'll start a discussion about the kref design in another thread.
> 2. I'm using ipmi_dev|msg_release() as the counterpart of ipmi_dev|msg_alloc(), which
> is the atomic_t coding style.
>   If atomic_t is changed to struct kref, I will need to implement two more APIs,
> e.g. __ipmi_dev_release(), which takes a struct kref as its parameter and calls
> ipmi_dev_release() inside it.
>   By not using kref, I needn't write code to implement such APIs.
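
To illustrate the extra wrapper mentioned in point 2, here is a minimal user-space imitation of the kref pattern. It is only a sketch: struct ref, ref_get(), ref_put(), struct demo_dev and demo_dev_release_ref() are hypothetical names, the counter is a plain int (so this is single-threaded only), and none of it is the kernel's kref API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Minimal imitation of an embedded reference counter (not struct kref). */
struct ref {
	int count;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct acpi_ipmi_device. */
struct demo_dev {
	int ifnum;
	struct ref ref;
};

static int released;	/* counts release calls, for illustration only */

/* Typed release function, analogous to ipmi_dev_release(). */
static void demo_dev_release(struct demo_dev *dev)
{
	released++;
	free(dev);
}

/*
 * The additional wrapper a kref-style API requires: it receives the
 * embedded counter and recovers the enclosing object from it.
 */
static void demo_dev_release_ref(struct ref *ref)
{
	demo_dev_release(container_of(ref, struct demo_dev, ref));
}

static void ref_get(struct ref *ref)
{
	ref->count++;
}

/* Drop one reference; returns 1 if this was the last one. */
static int ref_put(struct ref *ref, void (*release)(struct ref *))
{
	if (--ref->count == 0) {
		release(ref);
		return 1;
	}
	return 0;
}

static struct demo_dev *demo_dev_alloc(int ifnum)
{
	struct demo_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->ifnum = ifnum;
	dev->ref.count = 1;	/* the allocation holds the first reference */
	return dev;
}
```

With a bare atomic_t the typed release function can be called directly, which is the saving described above; the price is that the "last reference" check is open-coded by each user.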
> 
> >
> > > +};
> > > +
> > > > +static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS] = {
> > > +	[ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_SYSTEM_IO] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_PCI_CONFIG] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_IPMI] = {
> > > +		.flags = ACPI_REGION_MANAGED,
> > > +	},
> > > +};
> > > +static DEFINE_MUTEX(acpi_mutex_region);
> > > +
> > >  /*
> > > >   * This list of permanent mappings is for memory that may be accessed from
> > > >   * interrupt context, where we can't do the ioremap().
> > > > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle, u32 type, void *context,
> > >  		kfree(hp_work);
> > >  }
> > >  EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > > +
> > > > +static bool acpi_region_managed(struct acpi_region *rgn)
> > > > +{
> > > > +	/*
> > > > +	 * NOTE: Default and Managed
> > > > +	 * We only need to avoid region management on the regions managed
> > > > +	 * by ACPICA (ACPI_REGION_DEFAULT).  Currently, we need additional
> > > > +	 * check as many operation region handlers are not upgraded, so
> > > > +	 * only those known to be safe are managed (ACPI_REGION_MANAGED).
> > > > +	 */
> > > > +	return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > > > +	       (rgn->flags & ACPI_REGION_MANAGED);
> > > > +}
> > > > +
> > > > +static bool acpi_region_callable(struct acpi_region *rgn)
> > > > +{
> > > > +	return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > > > +	       !(rgn->flags & ACPI_REGION_UNREGISTERING);
> > > > +}
> > > +
> > > +static acpi_status
> > > +acpi_region_default_handler(u32 function,
> > > +			    acpi_physical_address address,
> > > +			    u32 bit_width, u64 *value,
> > > > +			    void *handler_context, void *region_context)
> > > > +{
> > > +	acpi_adr_space_handler handler;
> > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > +	void *context;
> > > +	acpi_status status = AE_NOT_EXIST;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!acpi_region_callable(rgn) || !rgn->handler) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return status;
> > > +	}
> > > +
> > > +	atomic_inc(&rgn->refcnt);
> > > +	handler = rgn->handler;
> > > +	context = rgn->context;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	status = handler(function, address, bit_width, value, context,
> > > +			 region_context);
> >
> > Why don't we call the handler under the mutex?

I think my reply below addresses this question, so let me move it up.

> It's a programming-style concern.
> IMO, holding locks around a callback invocation is a buggy programming style that
> could lead to deadlocks.
> Let me explain this using an example.
> 
> Object A exports a register/unregister API for other objects.
> Object B calls A's register/unregister API to register/unregister B's callback.

Sorry, I have to say "object" rather than "module" here, because a single module may contain several objects that need the same handling.

> It's likely that object B will hold lock_of_B around unregister/register when
> object B is destroyed/created, and lock_of_B is likely also used inside the
> callback.
> So when object A holds lock_of_A around the callback invocation, it leads to
> a deadlock, since:
> 1. the locking order for the register/unregister side will be: lock(lock_of_B),
>    lock(lock_of_A)
> 2. the locking order for the callback side will be: lock(lock_of_A),
>    lock(lock_of_B)
> They are in the reversed order!

I think this example is not quite correct.
There is another aspect of the unregister implementation, which is the intent of this patch:
no callback may be running or initiated after "unregister"; we can call this a "flush" requirement.
Inside the callback, lock_of_B should not be held if "flush" is required.
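
For readers following along, the lock ordering in the quoted example can be reproduced in user space with the rough sketch below. It is single-threaded on purpose: pthread_mutex_trylock() is used so that the program reports the inversion instead of hanging. The function name lock_order_inverted() is made up for this illustration, and the mutexes are assumed to be of the default (non-recursive) type.

```c
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t lock_of_A = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_of_B = PTHREAD_MUTEX_INITIALIZER;

/*
 * Returns 1 when each side is found holding the lock the other side
 * needs, i.e. the state in which two real threads would deadlock.
 */
static int lock_order_inverted(void)
{
	int b_blocked, a_blocked;

	/* Unregister side: B holds lock_of_B and is about to call into A. */
	pthread_mutex_lock(&lock_of_B);

	/* Callback side: A holds lock_of_A around the callback ... */
	pthread_mutex_lock(&lock_of_A);

	/* ... and B's callback needs lock_of_B, which is already taken. */
	b_blocked = (pthread_mutex_trylock(&lock_of_B) == EBUSY);

	/* Meanwhile A's unregister API needs lock_of_A, also taken. */
	a_blocked = (pthread_mutex_trylock(&lock_of_A) == EBUSY);

	pthread_mutex_unlock(&lock_of_A);
	pthread_mutex_unlock(&lock_of_B);

	return b_blocked && a_blocked;
}
```
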

> 
> IMO, Linux may need to introduce __callback, __api as decorators for the
> functions, and use sparse to enforce this rule, sparse knows if a callback is
> invoked under some locks.
> 
> In the case of ACPICA space_handlers, as you may know, when an ACPI
> operation region handler is invoked, there will be no lock held inside ACPICA
> (interpreter lock must be freed before executing operation region handlers).
> So the likelihood of a deadlock is pretty high here!

I need to mention another requirement of the operation region handlers.
Multiple operation region handlers must be able to execute at the same time; otherwise the IO operations invoked by the BIOS ASL code will be serialized.
IMO, IO operations invoked by the BIOS ASL code need to run in parallel.
Thus a mutex is not useful for implementing the protection here.

So the mutex is unlocked before executing the handler; IMO, reference counting is what meets this requirement.
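
A minimal user-space sketch of this pattern follows, assuming pthreads and C11 atomics in place of the kernel primitives. The names struct region, region_register(), region_invoke(), region_unregister() and add_ctx() are hypothetical; the sketch only mirrors the idea of the patch (snapshot under the lock, call with no lock held, flush on unregister) and is not the patch itself.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*region_handler)(int value, void *context);

/* Hypothetical user-space stand-in for one region slot. */
struct region {
	pthread_mutex_t lock;
	region_handler handler;
	void *context;
	atomic_int refcnt;	/* 1 while registered, +1 per running call */
};

static struct region rgn = { .lock = PTHREAD_MUTEX_INITIALIZER };

static void region_register(region_handler h, void *ctx)
{
	pthread_mutex_lock(&rgn.lock);
	rgn.handler = h;
	rgn.context = ctx;
	atomic_store(&rgn.refcnt, 1);
	pthread_mutex_unlock(&rgn.lock);
}

/* Returns the handler's result, or -1 if no handler is registered. */
static int region_invoke(int value)
{
	region_handler h;
	void *ctx;
	int ret;

	pthread_mutex_lock(&rgn.lock);
	if (!rgn.handler) {
		pthread_mutex_unlock(&rgn.lock);
		return -1;
	}
	atomic_fetch_add(&rgn.refcnt, 1);	/* pin the registration */
	h = rgn.handler;			/* snapshot under the lock */
	ctx = rgn.context;
	pthread_mutex_unlock(&rgn.lock);

	ret = h(value, ctx);			/* runs with no lock held */
	atomic_fetch_sub(&rgn.refcnt, 1);
	return ret;
}

static void region_unregister(void)
{
	pthread_mutex_lock(&rgn.lock);
	rgn.handler = NULL;			/* no new callers from here on */
	rgn.context = NULL;
	pthread_mutex_unlock(&rgn.lock);

	while (atomic_load(&rgn.refcnt) > 1)	/* flush running callbacks */
		sched_yield();
	atomic_fetch_sub(&rgn.refcnt, 1);
}

/* Example handler: adds the integer stored in the context. */
static int add_ctx(int value, void *context)
{
	return value + *(int *)context;
}
```

Because the callback only ever sees the local copies taken under the lock, clearing the fields in region_unregister() cannot pull the handler or context out from under a call that is already in flight, and handlers for different callers can still run concurrently.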

> >
> > What exactly prevents context from becoming NULL before the call above?

I think my answers above did not address this question directly.

Sorry, it is not clear to me what you are asking here.  Let me just try to be practical.

The code is here:
>
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!acpi_region_callable(rgn) || !rgn->handler) {

> > > +	handler = rgn->handler;
> > > +	context = rgn->context;
> > > +	mutex_unlock(&acpi_mutex_region);

The handler is guaranteed not to be NULL, because it is checked while the mutex is held.

Thanks for commenting.

Best regards
-Lv

> >
> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	return status;
> > > +}
> > > +
> > > +static acpi_status
> > > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > > > +			  void *handler_context, void **region_context)
> > > > +{
> > > +	acpi_adr_space_setup setup;
> > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > +	void *context;
> > > +	acpi_status status = AE_OK;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!acpi_region_callable(rgn) || !rgn->setup) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return status;
> > > +	}
> > > +
> > > +	atomic_inc(&rgn->refcnt);
> > > +	setup = rgn->setup;
> > > +	context = rgn->context;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	status = setup(handle, function, context, region_context);
> >
> > Can setup drop rgn->refcnt ?
> 
> The reason is same as the handler, as a setup is also a callback.
> 
> >
> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	return status;
> > > +}
> > > +
> > > +static int __acpi_install_region(struct acpi_region *rgn,
> > > +				 acpi_adr_space_type space_id)
> > > +{
> > > +	int res = 0;
> > > +	acpi_status status;
> > > +	int installing = 0;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (rgn->flags & ACPI_REGION_INSTALLED)
> > > +		goto out_lock;
> > > +	if (rgn->flags & ACPI_REGION_INSTALLING) {
> > > +		res = -EBUSY;
> > > +		goto out_lock;
> > > +	}
> > > +
> > > +	installing = 1;
> > > +	rgn->flags |= ACPI_REGION_INSTALLING;
> > > > +	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT, space_id,
> > > +						    acpi_region_default_handler,
> > > +						    acpi_region_default_setup,
> > > +						    rgn);
> > > +	rgn->flags &= ~ACPI_REGION_INSTALLING;
> > > +	if (ACPI_FAILURE(status))
> > > +		res = -EINVAL;
> > > +	else
> > > +		rgn->flags |= ACPI_REGION_INSTALLED;
> > > +
> > > +out_lock:
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +	if (installing) {
> > > +		if (res)
> > > +			pr_err("Failed to install region %d\n", space_id);
> > > +		else
> > > +			pr_info("Region %d installed\n", space_id);
> > > +	}
> > > +	return res;
> > > +}
> > > +
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > +			 acpi_adr_space_handler handler,
> > > > +			 acpi_adr_space_setup setup, void *context)
> > > > +{
> > > +	int res;
> > > +	struct acpi_region *rgn;
> > > +
> > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > +		return -EINVAL;
> > > +
> > > +	rgn = &acpi_regions[space_id];
> > > +	if (!acpi_region_managed(rgn))
> > > +		return -EINVAL;
> > > +
> > > +	res = __acpi_install_region(rgn, space_id);
> > > +	if (res)
> > > +		return res;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (rgn->flags & ACPI_REGION_REGISTERED) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	rgn->handler = handler;
> > > +	rgn->setup = setup;
> > > +	rgn->context = context;
> > > +	rgn->flags |= ACPI_REGION_REGISTERED;
> > > +	atomic_set(&rgn->refcnt, 1);
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	pr_info("Region %d registered\n", space_id);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > > +
> > > > +void acpi_unregister_region(acpi_adr_space_type space_id)
> > > > +{
> > > +	struct acpi_region *rgn;
> > > +
> > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > +		return;
> > > +
> > > +	rgn = &acpi_regions[space_id];
> > > +	if (!acpi_region_managed(rgn))
> > > +		return;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return;
> > > +	}
> > > +	if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return;
> >
> > What about
> >
> > 	if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> > 	    || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> > 		mutex_unlock(&acpi_mutex_region);
> > 		return;
> > 	}
> >
> 
> OK.
> 
> > > +	}
> > > +
> > > +	rgn->flags |= ACPI_REGION_UNREGISTERING;
> > > +	rgn->handler = NULL;
> > > +	rgn->setup = NULL;
> > > +	rgn->context = NULL;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	while (atomic_read(&rgn->refcnt) > 1)
> > > +		schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> >
> > Wouldn't it be better to use a wait queue here?
> 
> Yes, I'll try.
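
As a rough user-space analogue of the wait-queue idea, the sketch below replaces the polling loop with a condition variable: the last put wakes the flusher instead of the flusher waking up periodically. All names (struct flush_ref, flush_ref_get(), flush_ref_put(), flush_ref_wait()) are hypothetical, and pthread_cond_wait() merely stands in for the kernel's wait_event() family.

```c
#include <pthread.h>

/* Hypothetical user-space stand-in for a flushable reference count. */
struct flush_ref {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when only the base ref is left */
	int refcnt;
};

static struct flush_ref fr = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.idle = PTHREAD_COND_INITIALIZER,
	.refcnt = 1,		/* the registration itself holds one reference */
};

static void flush_ref_get(struct flush_ref *r)
{
	pthread_mutex_lock(&r->lock);
	r->refcnt++;
	pthread_mutex_unlock(&r->lock);
}

static void flush_ref_put(struct flush_ref *r)
{
	pthread_mutex_lock(&r->lock);
	if (--r->refcnt == 1)
		pthread_cond_broadcast(&r->idle);	/* wake the flusher */
	pthread_mutex_unlock(&r->lock);
}

/*
 * Sleep until every in-flight user has dropped its reference.
 * Returns the remaining count, which is the base reference.
 */
static int flush_ref_wait(struct flush_ref *r)
{
	int left;

	pthread_mutex_lock(&r->lock);
	while (r->refcnt > 1)
		pthread_cond_wait(&r->idle, &r->lock);
	left = r->refcnt;
	pthread_mutex_unlock(&r->lock);
	return left;
}
```
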
> 
> >
> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > > +	rgn->flags &= ~(ACPI_REGION_REGISTERED | ACPI_REGION_UNREGISTERING);
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	pr_info("Region %d unregistered\n", space_id);
> > > > +}
> > > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> > > > index a2c2fbb..15fad0d 100644
> > > > --- a/include/acpi/acpi_bus.h
> > > > +++ b/include/acpi/acpi_bus.h
> > > > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void *bus) { return 0; }
> > >
> > >  #endif				/* CONFIG_ACPI */
> > >
> > > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > > +			 acpi_adr_space_handler handler,
> > > > +			 acpi_adr_space_setup setup, void *context);
> > > > +void acpi_unregister_region(acpi_adr_space_type space_id);
> > > +
> > >  #endif /*__ACPI_BUS_H__*/
> >
> > Thanks,
> > Rafael
> 
> Thanks
> -Lv
> 
> >
> >
> > --
> > I speak only for myself.
> > Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-26  1:54         ` Zheng, Lv
@ 2013-07-26  8:15           ` Zheng, Lv
  -1 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-26  8:15 UTC (permalink / raw)
  To: Zheng, Lv, Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, linux-kernel, linux-acpi

> From: linux-acpi-owner@vger.kernel.org
> [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Zheng, Lv
> Sent: Friday, July 26, 2013 9:54 AM
> To: Rafael J. Wysocki
> Cc: Wysocki, Rafael J; Brown, Len; linux-kernel@vger.kernel.org;
> linux-acpi@vger.kernel.org
> Subject: RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI
> operation region handlers
> 
> > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > Sent: Friday, July 26, 2013 5:29 AM
> >
> > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region handlers
> > > to fix races caused by the ACPICA address space callback invocations.
> > >
> > > ACPICA address space callback invocation is not suitable for Linux
> > > CONFIG_MODULE=y execution environment.
> >
> > Actually, can you please explain to me what *exactly* the problem is?
> 
> OK.  I'll add race explanations in the next revision.
> 
> The problem is there is no "lock" held inside ACPICA for invoking operation
> region handlers.
> Thus races happen between the acpi_remove/install_address_space_handler
> and the handler/setup callbacks.

This does not seem to be a good explanation of the intent of this patch.
I think the intent is stated in the patch description:

1. It acts as a barrier for operation region callbacks - no callback will
   happen after acpi_unregister_region().
2. acpi_unregister_region() is safe to be called in module->exit()
   functions.

Hmm, maybe I need to re-order the patch description for this patch.

Thanks for commenting.

Best regards
-Lv

> 
> This is correct per ACPI specification.
> If interpreter locks were held while invoking operation region handlers, a
> timeout implemented inside an operation region handler would make all locking
> facilities (Acquire or Sleep, ...) time out.
> Please refer to ACPI specification "5.5.2 Control Method Execution":
> Interpretation of a Control Method is not preemptive, but it can block. When a
> control method does block, OSPM can initiate or continue the execution of a
> different control method. A control method can only assume that access to
> global objects is exclusive for any period the control method does not block.
> 
> So it is quite likely that ACPI IO transfers take locks inside the operation
> region callback implementations.
> Using a locking facility to protect the callback invocation risks deadlocks.
> 
> Thanks
> -Lv
> 
> > Rafael

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members
  2013-07-26  1:25         ` Zheng, Lv
@ 2013-07-26 13:38           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-26 13:38 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, linux-acpi, openipmi-developer

On Friday, July 26, 2013 01:25:12 AM Zheng, Lv wrote:
> > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > Sent: Friday, July 26, 2013 6:26 AM
> > 
> > On Tuesday, July 23, 2013 04:10:06 PM Lv Zheng wrote:
> > > This is a trivial patch:
> > > 1. Deletes a member of the acpi_ipmi_device - smi_data which is not
> > >    actually used.
> > > 2. Updates a member of the acpi_ipmi_device - pnp_dev which is only used
> > >    by dev_warn() invocations, so changes it to struct device.
> > >
> > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > ---
> > >  drivers/acpi/acpi_ipmi.c |   30 ++++++++++++++----------------
> > >  1 file changed, 14 insertions(+), 16 deletions(-)
> > >
> > > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > > index 0ee1ea6..7f93ffd 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -63,11 +63,10 @@ struct acpi_ipmi_device {
> > >  	struct list_head tx_msg_list;
> > >  	spinlock_t	tx_msg_lock;
> > >  	acpi_handle handle;
> > > -	struct pnp_dev *pnp_dev;
> > > +	struct device *dev;
> > >  	ipmi_user_t	user_interface;
> > >  	int ipmi_ifnum; /* IPMI interface number */
> > >  	long curr_msgid;
> > > -	struct ipmi_smi_info smi_data;
> > >  	atomic_t refcnt;
> > >  };
> > >
> > > @@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = {  };
> > >
> > >  static struct acpi_ipmi_device *
> > > > -ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
> > > +ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)
> > 
> > Why is the second arg called pdev?
> 
> OK, I will change it to dev.

OK, thanks.

> > 
> > >  {
> > >  	struct acpi_ipmi_device *ipmi_device;
> > >  	int err;
> > > @@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info
> > *smi_data, acpi_handle handle)
> > >  	spin_lock_init(&ipmi_device->tx_msg_lock);
> > >
> > >  	ipmi_device->handle = handle;
> > > -	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> > > > -	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
> > > +	ipmi_device->dev = get_device(pdev);
> > >  	ipmi_device->ipmi_ifnum = iface;
> > >
> > >  	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> > >  			       ipmi_device, &user);
> > >  	if (err) {
> > > -		put_device(smi_data->dev);
> > > +		put_device(pdev);
> > >  		kfree(ipmi_device);
> > >  		return NULL;
> > >  	}
> > > @@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device
> > > *ipmi_device)  static void ipmi_dev_release(struct acpi_ipmi_device
> > > *ipmi_device)  {
> > >  	ipmi_destroy_user(ipmi_device->user_interface);
> > > -	put_device(ipmi_device->smi_data.dev);
> > > +	put_device(ipmi_device->dev);
> > >  	kfree(ipmi_device);
> > >  }
> > >
> > > @@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct
> > acpi_ipmi_msg *tx_msg,
> > >  	buffer = (struct acpi_ipmi_buffer *)value;
> > >  	/* copy the tx message data */
> > >  	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
> > > -		dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
> > > +		dev_WARN_ONCE(tx_msg->device->dev, true,
> > >  			      "Unexpected request (msg len %d).\n",
> > >  			      buffer->length);
> > >  		return -EINVAL;
> > > @@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct
> > ipmi_recv_msg *msg, void *user_msg_data)
> > >  	struct acpi_ipmi_device *ipmi_device = user_msg_data;
> > >  	int msg_found = 0;
> > >  	struct acpi_ipmi_msg *tx_msg;
> > > -	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
> > > +	struct device *dev = ipmi_device->dev;
> > >  	unsigned long flags;
> > >
> > >  	if (msg->user != ipmi_device->user_interface) {
> > > -		dev_warn(&pnp_dev->dev,
> > > +		dev_warn(dev,
> > > >  			 "Unexpected response is returned. returned user %p, expected user %p\n",
> > >  			 msg->user, ipmi_device->user_interface);
> > >  		goto out_msg;
> > > @@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg
> > *msg, void *user_msg_data)
> > >  	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > >
> > >  	if (!msg_found) {
> > > -		dev_warn(&pnp_dev->dev,
> > > +		dev_warn(dev,
> > >  			 "Unexpected response (msg id %ld) is returned.\n",
> > >  			 msg->msgid);
> > >  		goto out_msg;
> > > @@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg
> > > *msg, void *user_msg_data)
> > >
> > >  	/* copy the response data to Rx_data buffer */
> > >  	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
> > > -		dev_WARN_ONCE(&pnp_dev->dev, true,
> > > +		dev_WARN_ONCE(dev, true,
> > >  			      "Unexpected response (msg len %d).\n",
> > >  			      msg->msg.data_len);
> > >  		goto out_comp;
> > > @@ -431,7 +429,7 @@ out_msg:
> > > >  static void ipmi_register_bmc(int iface, struct device *dev)
> > > >  {
> > >  	struct acpi_ipmi_device *ipmi_device, *temp;
> > > -	struct pnp_dev *pnp_dev;
> > > +	struct device *pdev;
> > 
> > And here?
> 
> dev is already the name of ipmi_register_bmc()'s parameter, so it is not possible to name the device taken from "struct ipmi_smi_info" dev here in this quick fix.

Right.  What about smi_dev?  Or just use smi_data.dev directly?  It's just two
places and shouldn't cause any line wraps to happen.

Rafael


> > >  	int err;
> > >  	struct ipmi_smi_info smi_data;
> > >  	acpi_handle handle;
> > > @@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct
> > device *dev)
> > >  	handle = smi_data.addr_info.acpi_info.acpi_handle;
> > >  	if (!handle)
> > >  		goto err_ref;
> > > -	pnp_dev = to_pnp_dev(smi_data.dev);
> > > +	pdev = smi_data.dev;
> > >
> > > -	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> > > +	ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
> > >  	if (!ipmi_device) {
> > > -		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> > > +		dev_warn(pdev, "Can't create IPMI user interface\n");
> > >  		goto err_ref;
> > >  	}
> > >
> > >
> > --
> > I speak only for myself.
> > Rafael J. Wysocki, Intel Open Source Technology Center.
> 
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 07/13] ACPI/IPMI: Add reference counting for ACPI IPMI transfers
  2013-07-26  1:21         ` Zheng, Lv
@ 2013-07-26 13:41           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-26 13:41 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, linux-acpi, openipmi-developer

On Friday, July 26, 2013 01:21:18 AM Zheng, Lv wrote:
> > From: linux-acpi-owner@vger.kernel.org
> > [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> > Sent: Friday, July 26, 2013 6:23 AM
> > 
> > On Tuesday, July 23, 2013 04:09:54 PM Lv Zheng wrote:
> > > This patch adds reference counting for ACPI IPMI transfers to tune the
> > > locking granularity of tx_msg_lock.
> > >
> > > The acpi_ipmi_msg handling is re-designed using reference counting.
> > > 1. tx_msg is always unlinked before complete(), so that:
> > >    1.1. it is safe to put complete() outside of tx_msg_lock;
> > >    1.2. complete() can only happen once, thus smp_wmb() is not required.
> > > 2. Increasing the reference of tx_msg before calling
> > >    ipmi_request_settime() and introducing tx_msg_lock protected
> > >    ipmi_cancel_tx_msg() so that a complete() can happen in parallel with
> > >    tx_msg unlinking in the failure cases.
> > > 3. tx_msg holds the reference of acpi_ipmi_device so that it can be flushed
> > >    and freed in the contexts other than acpi_ipmi_space_handler().
> > >
> > > The lockdep_chains shows all acpi_ipmi locks are leaf locks after the
> > > tuning:
> > > 1. ipmi_lock is always leaf:
> > >    irq_context: 0
> > >    [ffffffff81a943f8] smi_watchers_mutex
> > >    [ffffffffa06eca60] driver_data.ipmi_lock
> > >    irq_context: 0
> > >    [ffffffff82767b40] &buffer->mutex
> > >    [ffffffffa00a6678] s_active#103
> > >    [ffffffffa06eca60] driver_data.ipmi_lock
> > > 2. without this patch applied, lock used by complete() is held after
> > >    holding tx_msg_lock:
> > >    irq_context: 0
> > >    [ffffffff82767b40] &buffer->mutex
> > >    [ffffffffa00a6678] s_active#103
> > >    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > >    irq_context: 1
> > >    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > >    irq_context: 1
> > >    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > >    [ffffffffa06eccf0] &x->wait#25
> > >    irq_context: 1
> > >    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > >    [ffffffffa06eccf0] &x->wait#25
> > >    [ffffffff81e36620] &p->pi_lock
> > >    irq_context: 1
> > >    [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
> > >    [ffffffffa06eccf0] &x->wait#25
> > >    [ffffffff81e36620] &p->pi_lock
> > >    [ffffffff81e5d0a8] &rq->lock
> > > 3. with this patch applied, tx_msg_lock is always leaf:
> > >    irq_context: 0
> > >    [ffffffff82767b40] &buffer->mutex
> > >    [ffffffffa00a66d8] s_active#107
> > >    [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
> > >    irq_context: 1
> > >    [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
> > >
> > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > Cc: Zhao Yakui <yakui.zhao@intel.com>
> > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > ---
> > >  drivers/acpi/acpi_ipmi.c |  107
> > +++++++++++++++++++++++++++++++++-------------
> > >  1 file changed, 77 insertions(+), 30 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > index 2a09156..0ee1ea6 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -105,6 +105,7 @@ struct acpi_ipmi_msg {
> > >  	u8	data[ACPI_IPMI_MAX_MSG_LENGTH];
> > >  	u8	rx_len;
> > >  	struct acpi_ipmi_device *device;
> > > +	atomic_t	refcnt;
> > 
> > Again: kref, please?
> 
> Please see the concerns in another email.
> 
> > 
> > >  };
> > >
> > >  /* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
> > > @@ -195,22 +196,47 @@ static struct acpi_ipmi_device
> > *acpi_ipmi_get_selected_smi(void)
> > >  	return ipmi_device;
> > >  }
> > >
> > > -static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device
> > *ipmi)
> > > +static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
> > >  {
> > > +	struct acpi_ipmi_device *ipmi;
> > >  	struct acpi_ipmi_msg *ipmi_msg;
> > > -	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
> > >
> > > +	ipmi = acpi_ipmi_get_selected_smi();
> > > +	if (!ipmi)
> > > +		return NULL;
> > >  	ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
> > > -	if (!ipmi_msg)	{
> > > -		dev_warn(&pnp_dev->dev, "Can't allocate memory for ipmi_msg\n");
> > > +	if (!ipmi_msg) {
> > > +		acpi_ipmi_dev_put(ipmi);
> > >  		return NULL;
> > >  	}
> > > +	atomic_set(&ipmi_msg->refcnt, 1);
> > >  	init_completion(&ipmi_msg->tx_complete);
> > >  	INIT_LIST_HEAD(&ipmi_msg->head);
> > >  	ipmi_msg->device = ipmi;
> > > +
> > >  	return ipmi_msg;
> > >  }
> > >
> > > +static struct acpi_ipmi_msg *
> > > +acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
> > > +{
> > > +	if (tx_msg)
> > > +		atomic_inc(&tx_msg->refcnt);
> > > +	return tx_msg;
> > > +}
> > > +
> > > +static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
> > > +{
> > > +	acpi_ipmi_dev_put(tx_msg->device);
> > > +	kfree(tx_msg);
> > > +}
> > > +
> > > +static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
> > > +{
> > > +	if (tx_msg && atomic_dec_and_test(&tx_msg->refcnt))
> > > +		ipmi_msg_release(tx_msg);
> > > +}
> > > +
> > >  #define		IPMI_OP_RGN_NETFN(offset)	((offset >> 8) & 0xff)
> > >  #define		IPMI_OP_RGN_CMD(offset)		(offset & 0xff)
> > >  static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
> > > @@ -300,7 +326,7 @@ static void acpi_format_ipmi_response(struct
> > acpi_ipmi_msg *msg,
> > >
> > >  static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
> > >  {
> > > -	struct acpi_ipmi_msg *tx_msg, *temp;
> > > +	struct acpi_ipmi_msg *tx_msg;
> > >  	unsigned long flags;
> > >
> > >  	/*
> > > @@ -311,16 +337,46 @@ static void ipmi_flush_tx_msg(struct
> > acpi_ipmi_device *ipmi)
> > >  	 */
> > >  	while (atomic_read(&ipmi->refcnt) > 1) {
> > >  		spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > > -		list_for_each_entry_safe(tx_msg, temp,
> > > -					 &ipmi->tx_msg_list, head) {
> > > +		while (!list_empty(&ipmi->tx_msg_list)) {
> > > +			tx_msg = list_first_entry(&ipmi->tx_msg_list,
> > > +						  struct acpi_ipmi_msg,
> > > +						  head);
> > > +			list_del(&tx_msg->head);
> > > +			spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > > +
> > >  			/* wake up the sleep thread on the Tx msg */
> > >  			complete(&tx_msg->tx_complete);
> > > +			acpi_ipmi_msg_put(tx_msg);
> > > +			spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > >  		}
> > >  		spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
> > > +
> > >  		schedule_timeout_uninterruptible(msecs_to_jiffies(1));
> > >  	}
> > >  }
> > >
> > > +static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
> > > +			       struct acpi_ipmi_msg *msg)
> > > +{
> > > +	struct acpi_ipmi_msg *tx_msg;
> > > +	int msg_found = 0;
> > 
> > Use bool?
> 
> OK.
> There are other int flags in the original code (e.g. dev_found); do I need to clean up all of them?

Not in this patch, but in general it wouldn't hurt.

> > > +	unsigned long flags;
> > > +
> > > +	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
> > > +	list_for_each_entry(tx_msg, &ipmi->tx_msg_list, head) {
> > > +		if (msg == tx_msg) {
> > > +			msg_found = 1;
> > > +			break;
> > > +		}
> > > +	}
> > > +	if (msg_found)
> > > +		list_del(&tx_msg->head);
> > 
> > The list_del() can be done when you set msg_found.
> 
> Please see my concerns in another email.

OK, I'll reply there.
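
The suggestion to unlink at the point where the match is found can be sketched in userspace C as follows. This is a minimal illustration under stated assumptions, not the driver code: struct list_node, struct tx_msg, list_del_node() and cancel_tx_msg() are hypothetical stand-ins for the kernel's struct list_head, struct acpi_ipmi_msg, list_del() and ipmi_cancel_tx_msg(), and the tx_msg_lock that would surround the walk is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (illustrative only). */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
	n->prev = n->next = n;
}

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical message type; head is the first member on purpose. */
struct tx_msg {
	struct list_node head;
	int id;
};

/*
 * Unlink msg if it is still queued.  Returns true when the caller now
 * owns the reference that the queue was holding.
 */
static bool cancel_tx_msg(struct list_node *queue, struct tx_msg *msg)
{
	struct list_node *pos;

	/* In the driver this walk would run under tx_msg_lock. */
	for (pos = queue->next; pos != queue; pos = pos->next) {
		/* Valid cast because head is the first member of tx_msg. */
		if ((struct tx_msg *)pos == msg) {
			list_del_node(pos);	/* unlink where the match is found */
			return true;
		}
	}
	return false;
}
```

Returning a bool from the helper removes the separate msg_found flag, and a second cancel of the same message simply reports false because the entry is no longer on the list.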

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-26  0:47         ` Zheng, Lv
@ 2013-07-26 14:00           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-26 14:00 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

On Friday, July 26, 2013 12:47:44 AM Zheng, Lv wrote:
> 
> > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > Sent: Friday, July 26, 2013 4:27 AM
> > 
> > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > This patch adds reference counting for ACPI operation region handlers
> > > to fix races caused by the ACPICA address space callback invocations.
> > >
> > > ACPICA address space callback invocation is not suitable for Linux
> > > CONFIG_MODULE=y execution environment.  This patch tries to protect
> > > the address space callbacks by invoking them under a module safe
> > environment.
> > > The IPMI address space handler is also upgraded in this patch.
> > > The acpi_unregister_region() is designed to meet the following
> > > requirements:
> > > 1. It acts as a barrier for operation region callbacks - no callback will
> > >    happen after acpi_unregister_region().
> > > 2. acpi_unregister_region() is safe to be called in module->exit()
> > >    functions.
> > > Using reference counting rather than module referencing allows such
> > > benefits to be achieved even when acpi_unregister_region() is called
> > > in the environments other than module->exit().
> > > The header file of include/acpi/acpi_bus.h should contain the
> > > declarations that have references to some ACPICA defined types.
> > >
> > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > ---
> > >  drivers/acpi/acpi_ipmi.c |   16 ++--
> > >  drivers/acpi/osl.c       |  224
> > ++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/acpi/acpi_bus.h  |    5 ++
> > >  3 files changed, 235 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > index 5f8f495..2a09156 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -539,20 +539,18 @@ out_ref:
> > >  static int __init acpi_ipmi_init(void)
> > >  {
> > >  	int result = 0;
> > > -	acpi_status status;
> > >
> > >  	if (acpi_disabled)
> > >  		return result;
> > >
> > >  	mutex_init(&driver_data.ipmi_lock);
> > >
> > > -	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > -						    ACPI_ADR_SPACE_IPMI,
> > > -						    &acpi_ipmi_space_handler,
> > > -						    NULL, NULL);
> > > -	if (ACPI_FAILURE(status)) {
> > > +	result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > > +				      &acpi_ipmi_space_handler,
> > > +				      NULL, NULL);
> > > +	if (result) {
> > >  		pr_warn("Can't register IPMI opregion space handle\n");
> > > -		return -EINVAL;
> > > +		return result;
> > >  	}
> > >
> > >  	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > >  	}
> > >  	mutex_unlock(&driver_data.ipmi_lock);
> > >
> > > -	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > > -					  ACPI_ADR_SPACE_IPMI,
> > > -					  &acpi_ipmi_space_handler);
> > > +	acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > >  }
> > >
> > >  module_init(acpi_ipmi_init);
> > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> > > index 6ab2c35..8398e51 100644
> > > --- a/drivers/acpi/osl.c
> > > +++ b/drivers/acpi/osl.c
> > > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
> > >  static struct workqueue_struct *kacpi_notify_wq;
> > >  static struct workqueue_struct *kacpi_hotplug_wq;
> > >
> > > +struct acpi_region {
> > > +	unsigned long flags;
> > > +#define ACPI_REGION_DEFAULT		0x01
> > > +#define ACPI_REGION_INSTALLED		0x02
> > > +#define ACPI_REGION_REGISTERED		0x04
> > > +#define ACPI_REGION_UNREGISTERING	0x08
> > > +#define ACPI_REGION_INSTALLING		0x10
> > 
> > What about (1UL << 1), (1UL << 2) etc.?
> > 
> > Also please remove the #defines out of the struct definition.
> 
> OK.
> 
> > 
> > > +	/*
> > > +	 * NOTE: Upgrading All Region Handlers
> > > +	 * This flag is only used during the period where not all of the
> > > +	 * region handlers are upgraded to the new interfaces.
> > > +	 */
> > > +#define ACPI_REGION_MANAGED		0x80
> > > +	acpi_adr_space_handler handler;
> > > +	acpi_adr_space_setup setup;
> > > +	void *context;
> > > +	/* Invoking references */
> > > +	atomic_t refcnt;
> > 
> > Actually, why don't you use krefs?
> 
> If you look at the rest of my code, you'll find there are two reasons:
> 
> 1. I'm using while (atomic_read() > 1) to implement the objects' flushing and there is no kref API to do so.

No, there's not any, but you can read kref.refcount directly, can't you?

Moreover, it is not entirely clear to me that doing the while (atomic_read() > 1)
is actually correct.

>   I just think it is not suitable for me to introduce such an API into kref.h and start another argument around kref designs in this bug fix patch. :-)
>   I'll start a discussion about kref design using another thread.

You don't need to do that at all.

> 2. I'm using ipmi_dev|msg_release() as the counterpart of ipmi_dev|msg_alloc(), which fits the atomic_t coding style.
>   If atomic_t is changed to struct kref, I will need two functions: a __ipmi_dev_release() that takes a struct kref as its parameter and calls ipmi_dev_release() inside it.
>   By not using kref, I needn't write code to implement such a wrapper.

I'm not following you, sorry.

Please just use krefs for reference counting, the same way as you use
struct list_head for implementing lists.  This is the way everyone does
that in the kernel and that's for a reason.

Unless you do your reference counting under a lock, in which case using
atomic_t isn't necessary at all and you can use a non-atomic counter.
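
The kref pattern being asked for can be sketched in userspace C as below. This is only an illustration of the shape, under the assumption that C11 atomics stand in for the kernel primitives: struct kref_sketch, kref_sketch_init(), kref_sketch_get(), kref_sketch_put(), container_of_sketch() and the msg_sketch type are hypothetical names, not the kernel's kref API or the driver's structures. It shows that the release callback receives the embedded counter and recovers the containing object from it, so the extra wrapper amounts to a few lines.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct kref (illustrative only). */
struct kref_sketch {
	atomic_int refcount;
};

static void kref_sketch_init(struct kref_sketch *k)
{
	atomic_init(&k->refcount, 1);
}

static void kref_sketch_get(struct kref_sketch *k)
{
	atomic_fetch_add(&k->refcount, 1);
}

/* Returns 1 if this call dropped the last reference and ran release(). */
static int kref_sketch_put(struct kref_sketch *k,
			   void (*release)(struct kref_sketch *k))
{
	if (atomic_fetch_sub(&k->refcount, 1) == 1) {
		release(k);
		return 1;
	}
	return 0;
}

#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical reference-counted message. */
struct msg_sketch {
	int id;
	struct kref_sketch kref;
};

static int released_count;	/* lets a caller observe the release */

/* Release callback: recover the message from its embedded counter. */
static void msg_sketch_release(struct kref_sketch *k)
{
	struct msg_sketch *msg = container_of_sketch(k, struct msg_sketch, kref);

	released_count++;
	free(msg);
}

static struct msg_sketch *msg_sketch_alloc(int id)
{
	struct msg_sketch *msg = calloc(1, sizeof(*msg));

	if (!msg)
		return NULL;
	msg->id = id;
	kref_sketch_init(&msg->kref);
	return msg;
}

static void msg_sketch_get(struct msg_sketch *msg)
{
	kref_sketch_get(&msg->kref);
}

static int msg_sketch_put(struct msg_sketch *msg)
{
	return kref_sketch_put(&msg->kref, msg_sketch_release);
}
```

A caller pairs msg_sketch_alloc() with a final msg_sketch_put(); any additional msg_sketch_get() needs its own put, and only the last put frees the object.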

> > > +};
> > > +
> > > +static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS]
> > = {
> > > +	[ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_SYSTEM_IO] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_PCI_CONFIG] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_IPMI] = {
> > > +		.flags = ACPI_REGION_MANAGED,
> > > +	},
> > > +};
> > > +static DEFINE_MUTEX(acpi_mutex_region);
> > > +
> > >  /*
> > >   * This list of permanent mappings is for memory that may be accessed
> > from
> > >   * interrupt context, where we can't do the ioremap().
> > > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle,
> > u32 type, void *context,
> > >  		kfree(hp_work);
> > >  }
> > >  EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > > +
> > > +static bool acpi_region_managed(struct acpi_region *rgn)
> > > +{
> > > +	/*
> > > +	 * NOTE: Default and Managed
> > > +	 * We only need to avoid region management on the regions managed
> > > +	 * by ACPICA (ACPI_REGION_DEFAULT).  Currently, we need additional
> > > +	 * check as many operation region handlers are not upgraded, so
> > > +	 * only those known to be safe are managed (ACPI_REGION_MANAGED).
> > > +	 */
> > > +	return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > > +	       (rgn->flags & ACPI_REGION_MANAGED);
> > > +}
> > > +
> > > +static bool acpi_region_callable(struct acpi_region *rgn)
> > > +{
> > > +	return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > > +	       !(rgn->flags & ACPI_REGION_UNREGISTERING);
> > > +}
> > > +
> > > +static acpi_status
> > > +acpi_region_default_handler(u32 function,
> > > +			    acpi_physical_address address,
> > > +			    u32 bit_width, u64 *value,
> > > +			    void *handler_context, void *region_context)
> > > +{
> > > +	acpi_adr_space_handler handler;
> > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > +	void *context;
> > > +	acpi_status status = AE_NOT_EXIST;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!acpi_region_callable(rgn) || !rgn->handler) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return status;
> > > +	}
> > > +
> > > +	atomic_inc(&rgn->refcnt);
> > > +	handler = rgn->handler;
> > > +	context = rgn->context;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	status = handler(function, address, bit_width, value, context,
> > > +			 region_context);
> > 
> > Why don't we call the handler under the mutex?
> > 
> > What exactly prevents context from becoming NULL before the call above?
> 
> It's a programming-style concern.
> IMO, holding locks around a callback invocation is a buggy programming style that can lead to deadlocks.
> Let me explain this using an example.
> 
> Object A exports a register/unregister API for other objects.
> Object B calls A's register/unregister API to register/unregister B's callback.
> Object B will likely hold lock_of_B around unregister/register when it is destroyed/created, and lock_of_B is likely also used inside the callback.

Why is it likely to be used inside the callback?  Clearly, if a callback is
executed under a lock, that lock can't be acquired by that callback.

> So when object A holds lock_of_A around the callback invocation, it leads to a deadlock, since:
> 1. the locking order on the register/unregister side will be: lock(lock_of_B), lock(lock_of_A)
> 2. the locking order on the callback side will be: lock(lock_of_A), lock(lock_of_B)
> These are in reversed order!
> 
> IMO, Linux may need to introduce __callback and __api as decorators for functions, and use sparse to enforce this rule; sparse knows whether a callback is invoked under some locks.

Oh, dear.  Yes, sparse knows such things, and so what?

> In the case of ACPICA space_handlers, as you may know, when an ACPI operation region handler is invoked, no lock is held inside ACPICA (the interpreter lock must be released before executing operation region handlers).
> So the likelihood of a deadlock is pretty high here!

Sorry, what are you talking about?

Please let me rephrase my question: What *practical* problems would it lead to
if we executed this particular callback under this particular mutex?

Not *theoretical* in the general theory of everything, *practical* in this
particular piece of code.

And we are talking about a *global* mutex here, not something object-specific.
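
For reference, the pattern the quoted patch uses (copy the handler and context under the mutex, bump a counter, then invoke with the mutex dropped) can be sketched in userspace C as follows. This is a simplified illustration under stated assumptions: struct region_sketch, region_register(), region_clear(), region_dispatch() and add_handler() are hypothetical names, a pthread mutex stands in for acpi_mutex_region, and the part where unregistration waits for in_flight to drain is left out.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef int (*region_handler_t)(int value, void *context);

/* Hypothetical registration slot guarded by one global mutex. */
struct region_sketch {
	pthread_mutex_t lock;
	region_handler_t handler;
	void *context;
	atomic_int in_flight;	/* callbacks currently running */
};

static struct region_sketch region = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
};

static void region_register(region_handler_t handler, void *context)
{
	pthread_mutex_lock(&region.lock);
	region.handler = handler;
	region.context = context;
	pthread_mutex_unlock(&region.lock);
}

static void region_clear(void)
{
	pthread_mutex_lock(&region.lock);
	region.handler = NULL;
	region.context = NULL;
	pthread_mutex_unlock(&region.lock);
}

/* Returns -1 when no handler is registered. */
static int region_dispatch(int value)
{
	region_handler_t handler;
	void *context;
	int ret;

	pthread_mutex_lock(&region.lock);
	handler = region.handler;
	context = region.context;	/* local copy survives a later clear */
	if (!handler) {
		pthread_mutex_unlock(&region.lock);
		return -1;
	}
	atomic_fetch_add(&region.in_flight, 1);
	pthread_mutex_unlock(&region.lock);

	ret = handler(value, context);	/* runs with the lock dropped */

	atomic_fetch_sub(&region.in_flight, 1);
	return ret;
}

/* Example handler: adds the value to an int the context points at. */
static int add_handler(int value, void *context)
{
	return value + *(int *)context;
}
```

The local copies only keep the pointer values alive. Whatever context points at remains valid only if the unregister path waits for in_flight to reach zero before the owner frees it, which is the part the review questions.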

> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	return status;
> > > +}
> > > +
> > > +static acpi_status
> > > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > > +			  void *handler_context, void **region_context)
> > > +{
> > > +	acpi_adr_space_setup setup;
> > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > +	void *context;
> > > +	acpi_status status = AE_OK;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!acpi_region_callable(rgn) || !rgn->setup) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return status;
> > > +	}
> > > +
> > > +	atomic_inc(&rgn->refcnt);
> > > +	setup = rgn->setup;
> > > +	context = rgn->context;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	status = setup(handle, function, context, region_context);
> > 
> > Can setup drop rgn->refcnt ?
> 
> The reason is same as the handler, as a setup is also a callback.

Let me rephrase: Is it legitimate for setup to modify rgn->refcnt?
If so, then why?

> > 
> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	return status;
> > > +}
> > > +
> > > +static int __acpi_install_region(struct acpi_region *rgn,
> > > +				 acpi_adr_space_type space_id)
> > > +{
> > > +	int res = 0;
> > > +	acpi_status status;
> > > +	int installing = 0;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (rgn->flags & ACPI_REGION_INSTALLED)
> > > +		goto out_lock;
> > > +	if (rgn->flags & ACPI_REGION_INSTALLING) {
> > > +		res = -EBUSY;
> > > +		goto out_lock;
> > > +	}
> > > +
> > > +	installing = 1;
> > > +	rgn->flags |= ACPI_REGION_INSTALLING;
> > > +	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > space_id,
> > > +						    acpi_region_default_handler,
> > > +						    acpi_region_default_setup,
> > > +						    rgn);
> > > +	rgn->flags &= ~ACPI_REGION_INSTALLING;
> > > +	if (ACPI_FAILURE(status))
> > > +		res = -EINVAL;
> > > +	else
> > > +		rgn->flags |= ACPI_REGION_INSTALLED;
> > > +
> > > +out_lock:
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +	if (installing) {
> > > +		if (res)
> > > +			pr_err("Failed to install region %d\n", space_id);
> > > +		else
> > > +			pr_info("Region %d installed\n", space_id);
> > > +	}
> > > +	return res;
> > > +}
> > > +
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > +			 acpi_adr_space_handler handler,
> > > +			 acpi_adr_space_setup setup, void *context)
> > > +{
> > > +	int res;
> > > +	struct acpi_region *rgn;
> > > +
> > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > +		return -EINVAL;
> > > +
> > > +	rgn = &acpi_regions[space_id];
> > > +	if (!acpi_region_managed(rgn))
> > > +		return -EINVAL;
> > > +
> > > +	res = __acpi_install_region(rgn, space_id);
> > > +	if (res)
> > > +		return res;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (rgn->flags & ACPI_REGION_REGISTERED) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	rgn->handler = handler;
> > > +	rgn->setup = setup;
> > > +	rgn->context = context;
> > > +	rgn->flags |= ACPI_REGION_REGISTERED;
> > > +	atomic_set(&rgn->refcnt, 1);
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	pr_info("Region %d registered\n", space_id);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > > +
> > > +void acpi_unregister_region(acpi_adr_space_type space_id)
> > > +{
> > > +	struct acpi_region *rgn;
> > > +
> > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > +		return;
> > > +
> > > +	rgn = &acpi_regions[space_id];
> > > +	if (!acpi_region_managed(rgn))
> > > +		return;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return;
> > > +	}
> > > +	if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return;
> > 
> > What about
> > 
> > 	if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> > 	    || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> > 		mutex_unlock(&acpi_mutex_region);
> > 		return;
> > 	}
> > 
> 
> OK.
> 
> > > +	}
> > > +
> > > +	rgn->flags |= ACPI_REGION_UNREGISTERING;
> > > +	rgn->handler = NULL;
> > > +	rgn->setup = NULL;
> > > +	rgn->context = NULL;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	while (atomic_read(&rgn->refcnt) > 1)
> > > +		schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> > 
> > Wouldn't it be better to use a wait queue here?
> 
> Yes, I'll try.

By the way, why do we need to do that?
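
If the wait is kept, the wait-queue idea maps onto a condition variable in userspace terms. The sketch below assumes a pthread mutex and condition variable in place of a kernel wait queue, and all names (struct users_sketch, tracker, users_get(), users_put(), users_wait_idle()) are hypothetical. Because the counter is only touched under the lock, it is a plain int rather than an atomic, in line with the remark above about counting under a lock.

```c
#include <pthread.h>

/* Hypothetical user tracker: the count is protected by the mutex. */
struct users_sketch {
	pthread_mutex_t lock;
	pthread_cond_t idle;	/* signalled when users drops to zero */
	int users;		/* plain int: always accessed under lock */
};

static struct users_sketch tracker = {
	.lock = PTHREAD_MUTEX_INITIALIZER,
	.idle = PTHREAD_COND_INITIALIZER,
};

static void users_get(struct users_sketch *t)
{
	pthread_mutex_lock(&t->lock);
	t->users++;
	pthread_mutex_unlock(&t->lock);
}

static void users_put(struct users_sketch *t)
{
	pthread_mutex_lock(&t->lock);
	if (--t->users == 0)
		pthread_cond_broadcast(&t->idle);	/* wake any waiter */
	pthread_mutex_unlock(&t->lock);
}

/* Blocks until no user is left; returns the count observed (always 0). */
static int users_wait_idle(struct users_sketch *t)
{
	int seen;

	pthread_mutex_lock(&t->lock);
	while (t->users > 0)
		pthread_cond_wait(&t->idle, &t->lock);	/* sleeps, no polling */
	seen = t->users;
	pthread_mutex_unlock(&t->lock);
	return seen;
}
```

Compared with a loop around schedule_timeout_uninterruptible(), the waiter sleeps until the last user leaves instead of waking on a timer to re-read the counter.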

> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	rgn->flags &= ~(ACPI_REGION_REGISTERED |
> > ACPI_REGION_UNREGISTERING);
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	pr_info("Region %d unregistered\n", space_id); }
> > > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index
> > > a2c2fbb..15fad0d 100644
> > > --- a/include/acpi/acpi_bus.h
> > > +++ b/include/acpi/acpi_bus.h
> > > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void
> > > *bus) { return 0; }
> > >
> > >  #endif				/* CONFIG_ACPI */
> > >
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > +			 acpi_adr_space_handler handler,
> > > +			 acpi_adr_space_setup setup, void *context); void
> > > +acpi_unregister_region(acpi_adr_space_type space_id);
> > > +
> > >  #endif /*__ACPI_BUS_H__*/

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
@ 2013-07-26 14:00           ` Rafael J. Wysocki
  0 siblings, 0 replies; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-26 14:00 UTC (permalink / raw)
  To: Zheng, Lv
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

On Friday, July 26, 2013 12:47:44 AM Zheng, Lv wrote:
> 
> > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > Sent: Friday, July 26, 2013 4:27 AM
> > 
> > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region handlers
> > > to fix races caused by the ACPICA address space callback invocations.
> > >
> > > ACPICA address space callback invocation is not suitable for Linux
> > > CONFIG_MODULE=y execution environment.  This patch tries to protect
> > > the address space callbacks by invoking them under a module safe
> > environment.
> > > The IPMI address space handler is also upgraded in this patch.
> > > The acpi_unregister_region() is designed to meet the following
> > > requirements:
> > > 1. It acts as a barrier for operation region callbacks - no callback will
> > >    happen after acpi_unregister_region().
> > > > 2. acpi_unregister_region() is safe to be called in module->exit()
> > >    functions.
> > > Using reference counting rather than module referencing allows such
> > > benefits to be achieved even when acpi_unregister_region() is called
> > > in the environments other than module->exit().
> > > The header file of include/acpi/acpi_bus.h should contain the
> > > declarations that have references to some ACPICA defined types.
> > >
> > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > ---
> > >  drivers/acpi/acpi_ipmi.c |   16 ++--
> > >  drivers/acpi/osl.c       |  224
> > ++++++++++++++++++++++++++++++++++++++++++++++
> > >  include/acpi/acpi_bus.h  |    5 ++
> > >  3 files changed, 235 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > > 5f8f495..2a09156 100644
> > > --- a/drivers/acpi/acpi_ipmi.c
> > > +++ b/drivers/acpi/acpi_ipmi.c
> > > @@ -539,20 +539,18 @@ out_ref:
> > >  static int __init acpi_ipmi_init(void)  {
> > >  	int result = 0;
> > > -	acpi_status status;
> > >
> > >  	if (acpi_disabled)
> > >  		return result;
> > >
> > >  	mutex_init(&driver_data.ipmi_lock);
> > >
> > > -	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > -						    ACPI_ADR_SPACE_IPMI,
> > > -						    &acpi_ipmi_space_handler,
> > > -						    NULL, NULL);
> > > -	if (ACPI_FAILURE(status)) {
> > > +	result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > > +				      &acpi_ipmi_space_handler,
> > > +				      NULL, NULL);
> > > +	if (result) {
> > >  		pr_warn("Can't register IPMI opregion space handle\n");
> > > -		return -EINVAL;
> > > +		return result;
> > >  	}
> > >
> > >  	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > >  	}
> > >  	mutex_unlock(&driver_data.ipmi_lock);
> > >
> > > -	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > > -					  ACPI_ADR_SPACE_IPMI,
> > > -					  &acpi_ipmi_space_handler);
> > > +	acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > >  }
> > >
> > >  module_init(acpi_ipmi_init);
> > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index
> > > 6ab2c35..8398e51 100644
> > > --- a/drivers/acpi/osl.c
> > > +++ b/drivers/acpi/osl.c
> > > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;  static
> > > struct workqueue_struct *kacpi_notify_wq;  static struct
> > > workqueue_struct *kacpi_hotplug_wq;
> > >
> > > +struct acpi_region {
> > > +	unsigned long flags;
> > > +#define ACPI_REGION_DEFAULT		0x01
> > > +#define ACPI_REGION_INSTALLED		0x02
> > > +#define ACPI_REGION_REGISTERED		0x04
> > > +#define ACPI_REGION_UNREGISTERING	0x08
> > > +#define ACPI_REGION_INSTALLING		0x10
> > 
> > What about (1UL << 1), (1UL << 2) etc.?
> > 
> > Also please remove the #defines out of the struct definition.
> 
> OK.
> 
> > 
> > > +	/*
> > > +	 * NOTE: Upgrading All Region Handlers
> > > +	 * This flag is only used during the period where not all of the
> > > +	 * region handers are upgraded to the new interfaces.
> > > +	 */
> > > +#define ACPI_REGION_MANAGED		0x80
> > > +	acpi_adr_space_handler handler;
> > > +	acpi_adr_space_setup setup;
> > > +	void *context;
> > > +	/* Invoking references */
> > > +	atomic_t refcnt;
> > 
> > Actually, why don't you use krefs?
> 
> If you take a look at other piece of my codes, you'll find there are two reasons:
> 
> 1. I'm using while (atomic_read() > 1) to implement the objects' flushing and there is no kref API to do so.

No, there's not any, but you can read kref.refcount directly, can't you?

Moreover, it is not entirely clear to me that doing the while (atomic_read() > 1)
is actually correct.

>   I just think it is not suitable for me to introduce such an API into kref.h and start another argument around kref designs in this bug fix patch. :-)
>   I'll start a discussion about kref design using another thread.

You don't need to do that at all.

> 2. I'm using ipmi_dev|msg_release() as a pair of ipmi_dev|msg_alloc(), it's kind of atomic_t coding style.
>   If atomic_t is changed to struct kref, I will need to implement two API, __ipmi_dev_release() to take a struct kref as parameter and call ipmi_dev_release inside it.
>   By not using kref, I needn't write codes to implement such API.

I'm not following you, sorry.

Please just use krefs for reference counting, the same way as you use
struct list_head for implementing lists.  This is the way everyone does
that in the kernel and that's for a reason.

Unless you do your reference counting under a lock, in which case using
atomic_t isn't necessary at all and you can use a non-atomic counter.
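
To illustrate the get/put pattern that kref packages up, here is a rough
userspace sketch using C11 atomics.  It is only an analogue of what
kref_get()/kref_put() provide, not the kernel implementation; struct ref,
ref_init(), ref_get(), ref_put(), struct region and region_release() are
hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a kref-style counter. */
struct ref {
	atomic_int count;
};

/* A new object starts with one reference, owned by its creator. */
static void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);
}

/* Take an extra reference; the caller must already hold one. */
static void ref_get(struct ref *r)
{
	assert(atomic_load(&r->count) > 0);
	atomic_fetch_add(&r->count, 1);
}

/*
 * Drop a reference.  The release callback runs exactly once, when the
 * last reference goes away.  Returns 1 if the object was released.
 */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Hypothetical object embedding the counter, as one would embed a kref. */
struct region {
	struct ref ref;
	int released;
};

/* Recover the containing object from the embedded counter. */
static void region_release(struct ref *r)
{
	struct region *rgn =
		(struct region *)((char *)r - offsetof(struct region, ref));

	rgn->released = 1;
}
```

The point of the pattern is that nobody polls the counter: whoever drops
the last reference runs the release callback.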

> > > +};
> > > +
> > > +static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS]
> > = {
> > > +	[ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_SYSTEM_IO] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_PCI_CONFIG] = {
> > > +		.flags = ACPI_REGION_DEFAULT,
> > > +	},
> > > +	[ACPI_ADR_SPACE_IPMI] = {
> > > +		.flags = ACPI_REGION_MANAGED,
> > > +	},
> > > +};
> > > +static DEFINE_MUTEX(acpi_mutex_region);
> > > +
> > >  /*
> > >   * This list of permanent mappings is for memory that may be accessed
> > from
> > >   * interrupt context, where we can't do the ioremap().
> > > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle,
> > u32 type, void *context,
> > >  		kfree(hp_work);
> > >  }
> > >  EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > > +
> > > +static bool acpi_region_managed(struct acpi_region *rgn) {
> > > +	/*
> > > +	 * NOTE: Default and Managed
> > > +	 * We only need to avoid region management on the regions managed
> > > +	 * by ACPICA (ACPI_REGION_DEFAULT).  Currently, we need additional
> > > +	 * check as many operation region handlers are not upgraded, so
> > > +	 * only those known to be safe are managed (ACPI_REGION_MANAGED).
> > > +	 */
> > > +	return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > > +	       (rgn->flags & ACPI_REGION_MANAGED); }
> > > +
> > > +static bool acpi_region_callable(struct acpi_region *rgn) {
> > > +	return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > > +	       !(rgn->flags & ACPI_REGION_UNREGISTERING); }
> > > +
> > > +static acpi_status
> > > +acpi_region_default_handler(u32 function,
> > > +			    acpi_physical_address address,
> > > +			    u32 bit_width, u64 *value,
> > > +			    void *handler_context, void *region_context) {
> > > +	acpi_adr_space_handler handler;
> > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > +	void *context;
> > > +	acpi_status status = AE_NOT_EXIST;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!acpi_region_callable(rgn) || !rgn->handler) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return status;
> > > +	}
> > > +
> > > +	atomic_inc(&rgn->refcnt);
> > > +	handler = rgn->handler;
> > > +	context = rgn->context;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	status = handler(function, address, bit_width, value, context,
> > > +			 region_context);
> > 
> > Why don't we call the handler under the mutex?
> > 
> > What exactly prevents context from becoming NULL before the call above?
> 
> It's a kind of programming style related concern.
> IMO, using locks around callback function is a buggy programming style that could lead to dead locks.
> Let me explain this using an example.
> 
> Object A exports a register/unregister API for other objects.
> Object B calls A's register/unregister API to register/unregister B's callback.
> It's likely that object B will hold lock_of_B around unregister/register when object B is destroyed/created, the lock_of_B is likely also used inside the callback.

Why is it likely to be used inside the callback?  Clearly, if a callback is
executed under a lock, that lock can't be acquired by that callback.

> So when object A holds the lock_of_A around the callback invocation, it leads to dead lock since:
> 1. the locking order for the register/unregister side will be: lock(lock_of_B), lock(lock_of_A)
> 2. the locking order for the callback side will be: lock(lock_of_A), lock(lock_of_B)
> They are in the reversed order!
> 
> IMO, Linux may need to introduce __callback, __api as decorators for the functions, and use sparse to enforce this rule; sparse knows whether a callback is invoked under some lock.

Oh, dear.  Yes, sparse knows such things, and so what?

> In the case of ACPICA space_handlers, as you may know, when an ACPI operation region handler is invoked, there will be no lock held inside ACPICA (interpreter lock must be freed before executing operation region handlers).
> So the likelihood of the dead lock is pretty much high here!

Sorry, what are you talking about?

Please let me rephrase my question: What *practical* problems would it lead to
if we executed this particular callback under this particular mutex?

Not *theoretical* in the general theory of everything, *practical* in this
particular piece of code.

And we are talking about a *global* mutex here, not something object-specific.

> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	return status;
> > > +}
> > > +
> > > +static acpi_status
> > > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > > +			  void *handler_context, void **region_context) {
> > > +	acpi_adr_space_setup setup;
> > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > +	void *context;
> > > +	acpi_status status = AE_OK;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!acpi_region_callable(rgn) || !rgn->setup) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return status;
> > > +	}
> > > +
> > > +	atomic_inc(&rgn->refcnt);
> > > +	setup = rgn->setup;
> > > +	context = rgn->context;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	status = setup(handle, function, context, region_context);
> > 
> > Can setup drop rgn->refcnt ?
> 
> The reason is same as the handler, as a setup is also a callback.

Let me rephrase: Is it legitimate for setup to modify rgn->refcnt?
If so, then why?

> > 
> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	return status;
> > > +}
> > > +
> > > +static int __acpi_install_region(struct acpi_region *rgn,
> > > +				 acpi_adr_space_type space_id)
> > > +{
> > > +	int res = 0;
> > > +	acpi_status status;
> > > +	int installing = 0;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (rgn->flags & ACPI_REGION_INSTALLED)
> > > +		goto out_lock;
> > > +	if (rgn->flags & ACPI_REGION_INSTALLING) {
> > > +		res = -EBUSY;
> > > +		goto out_lock;
> > > +	}
> > > +
> > > +	installing = 1;
> > > +	rgn->flags |= ACPI_REGION_INSTALLING;
> > > +	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > space_id,
> > > +						    acpi_region_default_handler,
> > > +						    acpi_region_default_setup,
> > > +						    rgn);
> > > +	rgn->flags &= ~ACPI_REGION_INSTALLING;
> > > +	if (ACPI_FAILURE(status))
> > > +		res = -EINVAL;
> > > +	else
> > > +		rgn->flags |= ACPI_REGION_INSTALLED;
> > > +
> > > +out_lock:
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +	if (installing) {
> > > +		if (res)
> > > +			pr_err("Failed to install region %d\n", space_id);
> > > +		else
> > > +			pr_info("Region %d installed\n", space_id);
> > > +	}
> > > +	return res;
> > > +}
> > > +
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > +			 acpi_adr_space_handler handler,
> > > +			 acpi_adr_space_setup setup, void *context) {
> > > +	int res;
> > > +	struct acpi_region *rgn;
> > > +
> > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > +		return -EINVAL;
> > > +
> > > +	rgn = &acpi_regions[space_id];
> > > +	if (!acpi_region_managed(rgn))
> > > +		return -EINVAL;
> > > +
> > > +	res = __acpi_install_region(rgn, space_id);
> > > +	if (res)
> > > +		return res;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (rgn->flags & ACPI_REGION_REGISTERED) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	rgn->handler = handler;
> > > +	rgn->setup = setup;
> > > +	rgn->context = context;
> > > +	rgn->flags |= ACPI_REGION_REGISTERED;
> > > +	atomic_set(&rgn->refcnt, 1);
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	pr_info("Region %d registered\n", space_id);
> > > +
> > > +	return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > > +
> > > +void acpi_unregister_region(acpi_adr_space_type space_id) {
> > > +	struct acpi_region *rgn;
> > > +
> > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > +		return;
> > > +
> > > +	rgn = &acpi_regions[space_id];
> > > +	if (!acpi_region_managed(rgn))
> > > +		return;
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return;
> > > +	}
> > > +	if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > > +		mutex_unlock(&acpi_mutex_region);
> > > +		return;
> > 
> > What about
> > 
> > 	if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> > 	    || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> > 		mutex_unlock(&acpi_mutex_region);
> > 		return;
> > 	}
> > 
> 
> OK.
> 
> > > +	}
> > > +
> > > +	rgn->flags |= ACPI_REGION_UNREGISTERING;
> > > +	rgn->handler = NULL;
> > > +	rgn->setup = NULL;
> > > +	rgn->context = NULL;
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	while (atomic_read(&rgn->refcnt) > 1)
> > > +		schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> > 
> > Wouldn't it be better to use a wait queue here?
> 
> Yes, I'll try.

By the way, why do we need to do that?

> > > +	atomic_dec(&rgn->refcnt);
> > > +
> > > +	mutex_lock(&acpi_mutex_region);
> > > +	rgn->flags &= ~(ACPI_REGION_REGISTERED |
> > ACPI_REGION_UNREGISTERING);
> > > +	mutex_unlock(&acpi_mutex_region);
> > > +
> > > +	pr_info("Region %d unregistered\n", space_id); }
> > > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h index
> > > a2c2fbb..15fad0d 100644
> > > --- a/include/acpi/acpi_bus.h
> > > +++ b/include/acpi/acpi_bus.h
> > > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void
> > > *bus) { return 0; }
> > >
> > >  #endif				/* CONFIG_ACPI */
> > >
> > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > +			 acpi_adr_space_handler handler,
> > > +			 acpi_adr_space_setup setup, void *context); void
> > > +acpi_unregister_region(acpi_adr_space_type space_id);
> > > +
> > >  #endif /*__ACPI_BUS_H__*/

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* Re: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-26  1:54         ` Zheng, Lv
  (?)
  (?)
@ 2013-07-26 14:49         ` Rafael J. Wysocki
  2013-07-29  1:56             ` Zheng, Lv
  -1 siblings, 1 reply; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-07-26 14:49 UTC (permalink / raw)
  To: Zheng, Lv; +Cc: Wysocki, Rafael J, Brown, Len, linux-kernel, linux-acpi

On Friday, July 26, 2013 01:54:00 AM Zheng, Lv wrote:
> > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > Sent: Friday, July 26, 2013 5:29 AM
> > 
> > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region handlers to fix
> > > races caused by the ACPICA address space callback invocations.
> > >
> > > ACPICA address space callback invocation is not suitable for Linux
> > > CONFIG_MODULE=y execution environment.
> > 
> > Actually, can you please explain to me what *exactly* the problem is?
> 
> OK.  I'll add race explanations in the next revision.
> 
> The problem is there is no "lock" held inside ACPICA for invoking operation
> region handlers.
> Thus races happen between acpi_remove/install_address_space_handler() and
> the handler/setup callbacks.

I see.  Now you're trying to introduce something that would prevent those
races from happening, right?

> This is correct per ACPI specification.
> If interpreter locks were held while invoking operation region handlers,
> the timeouts implemented inside the operation region handlers would make all
> locking facilities (Acquire, Sleep, ...) time out.
> Please refer to ACPI specification "5.5.2 Control Method Execution":
> Interpretation of a Control Method is not preemptive, but it can block. When
> a control method does block, OSPM can initiate or continue the execution of
> a different control method. A control method can only assume that access to
> global objects is exclusive for any period the control method does not block.
> 
> So it is pretty much likely that ACPI IO transfers are locked inside the
> operation region callback implementations.
> Using locking facility to protect the callback invocation will risk dead locks.

No.  If you use a single global lock around all invocations of operation region
handlers, it won't deadlock, but it will *serialize* things.  This means that
there won't be two handlers executing in parallel.  That may or may not be bad
depending on what those handlers actually do.

Your concern seems to be that if one address space handler is buggy and it
blocks indefinitely, executing it under such a lock would affect the other
address space handlers and in my opinion this is a valid concern.

So the idea seems to be to add wrappers around acpi_install_address_space_handler()
and acpi_remove_address_space_handler() (but I don't see where the latter is called
after the change?), such that they will know when it is safe to unregister the
handler.  That is simple enough.

However, I'm not sure it is needed in the context of IPMI.  Your address space
handler's context is NULL, so even if it is executed after
acpi_remove_address_space_handler() has been called for it (or in parallel),
it doesn't depend on anything passed by the caller, so I don't see why the
issue can't be addressed by a proper synchronization between
acpi_ipmi_exit() and acpi_ipmi_space_handler().

Clearly, acpi_ipmi_exit() should wait for all already running instances of
acpi_ipmi_space_handler() to complete and all acpi_ipmi_space_handler()
instances started after acpi_ipmi_exit() has been called must return
immediately.

I would imagine an algorithm like this:

acpi_ipmi_exit()
 1. Take "address space handler lock".
 2. Set "unregistering address space handler" flag.
 3. Check if "count of currently running handlers" is 0.  If so,
    call acpi_remove_address_space_handler(), drop the lock (possibly clear the
    flag) and return.
 4. Otherwise drop the lock and go to sleep in "address space handler wait queue".
 5. When woken up, take "address space handler lock" and go to 3.

acpi_ipmi_space_handler()
 1. Take "address space handler lock".
 2. Check "unregistering address space handler" flag.  If set, drop the lock
    and return.
 3. Increment "count of currently running handlers".
 4. Drop the lock.
 5. Do your work.
 6. Take "address space handler lock".
 7. Decrement "count of currently running handlers" and if 0, signal the
    tasks waiting on it to wake up.
 8. Drop the lock.
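
The two sequences above can be sketched roughly as follows.  This is a
userspace approximation only, assuming pthreads: a pthread mutex stands in
for the "address space handler lock" and a condition variable for the wait
queue.  All names (handler_lock, handler_idle, space_handler(),
handler_exit(), handler_installed) are hypothetical, and clearing
handler_installed merely stands in for the call to
acpi_remove_address_space_handler().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t handler_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t handler_idle = PTHREAD_COND_INITIALIZER;
static bool unregistering;		/* "unregistering handler" flag */
static int running_handlers;		/* count of running handlers */
static bool handler_installed = true;	/* stand-in for ACPICA state */

/* Handler side: returns 0 if the work ran, -1 if it was refused. */
static int space_handler(void (*work)(void *), void *arg)
{
	pthread_mutex_lock(&handler_lock);
	if (unregistering) {
		/* Started after exit began: return immediately. */
		pthread_mutex_unlock(&handler_lock);
		return -1;
	}
	running_handlers++;
	pthread_mutex_unlock(&handler_lock);

	/* The actual work runs without the lock held. */
	if (work)
		work(arg);

	pthread_mutex_lock(&handler_lock);
	assert(running_handlers > 0);
	if (--running_handlers == 0)
		pthread_cond_broadcast(&handler_idle);	/* wake the waiter */
	pthread_mutex_unlock(&handler_lock);
	return 0;
}

/* Exit side: blocks until no handler instance is running. */
static void handler_exit(void)
{
	pthread_mutex_lock(&handler_lock);
	unregistering = true;
	/* pthread_cond_wait() drops the lock while sleeping and retakes it. */
	while (running_handlers > 0)
		pthread_cond_wait(&handler_idle, &handler_lock);
	handler_installed = false;	/* remove the handler here */
	pthread_mutex_unlock(&handler_lock);
}
```

The counter is only touched under the lock, so a plain int is enough here
and no atomic type is needed.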

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members
  2013-07-26 13:38           ` Rafael J. Wysocki
@ 2013-07-29  1:12             ` Zheng, Lv
  -1 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-29  1:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, linux-acpi, openipmi-developer

> From: linux-acpi-owner@vger.kernel.org
> [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> 
> On Friday, July 26, 2013 01:25:12 AM Zheng, Lv wrote:
> > > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > > Sent: Friday, July 26, 2013 6:26 AM
> > >
> > > On Tuesday, July 23, 2013 04:10:06 PM Lv Zheng wrote:
> > > > This is a trivial patch:
> > > > 1. Deletes a member of the acpi_ipmi_device - smi_data which is not
> > > >    actually used.
> > > > 2. Updates a member of the acpi_ipmi_device - pnp_dev which is only
> used
> > > >    by dev_warn() invocations, so changes it to struct device.
> > > >
> > > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > > ---
> > > >  drivers/acpi/acpi_ipmi.c |   30 ++++++++++++++----------------
> > > >  1 file changed, 14 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > > > 0ee1ea6..7f93ffd 100644
> > > > --- a/drivers/acpi/acpi_ipmi.c
> > > > +++ b/drivers/acpi/acpi_ipmi.c
> > > > @@ -63,11 +63,10 @@ struct acpi_ipmi_device {
> > > >  	struct list_head tx_msg_list;
> > > >  	spinlock_t	tx_msg_lock;
> > > >  	acpi_handle handle;
> > > > -	struct pnp_dev *pnp_dev;
> > > > +	struct device *dev;
> > > >  	ipmi_user_t	user_interface;
> > > >  	int ipmi_ifnum; /* IPMI interface number */
> > > >  	long curr_msgid;
> > > > -	struct ipmi_smi_info smi_data;
> > > >  	atomic_t refcnt;
> > > >  };
> > > >
> > > > @@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = {  };
> > > >
> > > >  static struct acpi_ipmi_device *
> > > > -ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle
> > > > handle)
> > > > +ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)
> > >
> > > Why is the second arg called pdev?
> >
> > OK, I will change it to dev.
> 
> OK, thanks.
> 
> > >
> > > >  {
> > > >  	struct acpi_ipmi_device *ipmi_device;
> > > >  	int err;
> > > > @@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info
> > > *smi_data, acpi_handle handle)
> > > >  	spin_lock_init(&ipmi_device->tx_msg_lock);
> > > >
> > > >  	ipmi_device->handle = handle;
> > > > -	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> > > > -	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct
> > > ipmi_smi_info));
> > > > +	ipmi_device->dev = get_device(pdev);
> > > >  	ipmi_device->ipmi_ifnum = iface;
> > > >
> > > >  	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> > > >  			       ipmi_device, &user);
> > > >  	if (err) {
> > > > -		put_device(smi_data->dev);
> > > > +		put_device(pdev);
> > > >  		kfree(ipmi_device);
> > > >  		return NULL;
> > > >  	}
> > > > @@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device
> > > > *ipmi_device)  static void ipmi_dev_release(struct acpi_ipmi_device
> > > > *ipmi_device)  {
> > > >  	ipmi_destroy_user(ipmi_device->user_interface);
> > > > -	put_device(ipmi_device->smi_data.dev);
> > > > +	put_device(ipmi_device->dev);
> > > >  	kfree(ipmi_device);
> > > >  }
> > > >
> > > > @@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct
> > > acpi_ipmi_msg *tx_msg,
> > > >  	buffer = (struct acpi_ipmi_buffer *)value;
> > > >  	/* copy the tx message data */
> > > >  	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
> > > > -		dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
> > > > +		dev_WARN_ONCE(tx_msg->device->dev, true,
> > > >  			      "Unexpected request (msg len %d).\n",
> > > >  			      buffer->length);
> > > >  		return -EINVAL;
> > > > @@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct
> > > ipmi_recv_msg *msg, void *user_msg_data)
> > > >  	struct acpi_ipmi_device *ipmi_device = user_msg_data;
> > > >  	int msg_found = 0;
> > > >  	struct acpi_ipmi_msg *tx_msg;
> > > > -	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
> > > > +	struct device *dev = ipmi_device->dev;
> > > >  	unsigned long flags;
> > > >
> > > >  	if (msg->user != ipmi_device->user_interface) {
> > > > -		dev_warn(&pnp_dev->dev,
> > > > +		dev_warn(dev,
> > > >  			 "Unexpected response is returned. returned user %p,
> expected
> > > user %p\n",
> > > >  			 msg->user, ipmi_device->user_interface);
> > > >  		goto out_msg;
> > > > @@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct
> ipmi_recv_msg
> > > *msg, void *user_msg_data)
> > > >  	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > > >
> > > >  	if (!msg_found) {
> > > > -		dev_warn(&pnp_dev->dev,
> > > > +		dev_warn(dev,
> > > >  			 "Unexpected response (msg id %ld) is returned.\n",
> > > >  			 msg->msgid);
> > > >  		goto out_msg;
> > > > @@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct
> ipmi_recv_msg
> > > > *msg, void *user_msg_data)
> > > >
> > > >  	/* copy the response data to Rx_data buffer */
> > > >  	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
> > > > -		dev_WARN_ONCE(&pnp_dev->dev, true,
> > > > +		dev_WARN_ONCE(dev, true,
> > > >  			      "Unexpected response (msg len %d).\n",
> > > >  			      msg->msg.data_len);
> > > >  		goto out_comp;
> > > > @@ -431,7 +429,7 @@ out_msg:
> > > >  static void ipmi_register_bmc(int iface, struct device *dev)  {
> > > >  	struct acpi_ipmi_device *ipmi_device, *temp;
> > > > -	struct pnp_dev *pnp_dev;
> > > > +	struct device *pdev;
> > >
> > > And here?
> >
> > The dev is the parameter of the ipmi_register_bmc(), it is not possible to
> name the "struct ipmi_smi_info " as dev here for this quick fix.
> 
> Right.  What about smi_dev?  Or just use smi_data.dev directly?  It's just
> two
> places and shouldn't cause any line wraps to happen.

Sounds good, I'll take your advice. :-)

Thanks and best regards
-Lv

> 
> Rafael
> 
> 
> > > >  	int err;
> > > >  	struct ipmi_smi_info smi_data;
> > > >  	acpi_handle handle;
> > > > @@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct
> > > device *dev)
> > > >  	handle = smi_data.addr_info.acpi_info.acpi_handle;
> > > >  	if (!handle)
> > > >  		goto err_ref;
> > > > -	pnp_dev = to_pnp_dev(smi_data.dev);
> > > > +	pdev = smi_data.dev;
> > > >
> > > > -	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> > > > +	ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
> > > >  	if (!ipmi_device) {
> > > > -		dev_warn(&pnp_dev->dev, "Can't create IPMI user
> interface\n");
> > > > +		dev_warn(pdev, "Can't create IPMI user interface\n");
> > > >  		goto err_ref;
> > > >  	}
> > > >
> > > >
> > > --
> > > I speak only for myself.
> > > Rafael J. Wysocki, Intel Open Source Technology Center.
> >
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members
@ 2013-07-29  1:12             ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-29  1:12 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, linux-acpi, openipmi-developer

> From: linux-acpi-owner@vger.kernel.org
> [mailto:linux-acpi-owner@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> 
> On Friday, July 26, 2013 01:25:12 AM Zheng, Lv wrote:
> > > From: Rafael J. Wysocki [mailto:rjw@sisk.pl]
> > > Sent: Friday, July 26, 2013 6:26 AM
> > >
> > > On Tuesday, July 23, 2013 04:10:06 PM Lv Zheng wrote:
> > > > This is a trivial patch:
> > > > 1. Deletes a member of the acpi_ipmi_device - smi_data which is not
> > > >    actually used.
> > > > 2. Updates a member of the acpi_ipmi_device - pnp_dev which is only
> used
> > > >    by dev_warn() invocations, so changes it to struct device.
> > > >
> > > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > > ---
> > > >  drivers/acpi/acpi_ipmi.c |   30 ++++++++++++++----------------
> > > >  1 file changed, 14 insertions(+), 16 deletions(-)
> > > >
> > > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > > > 0ee1ea6..7f93ffd 100644
> > > > --- a/drivers/acpi/acpi_ipmi.c
> > > > +++ b/drivers/acpi/acpi_ipmi.c
> > > > @@ -63,11 +63,10 @@ struct acpi_ipmi_device {
> > > >  	struct list_head tx_msg_list;
> > > >  	spinlock_t	tx_msg_lock;
> > > >  	acpi_handle handle;
> > > > -	struct pnp_dev *pnp_dev;
> > > > +	struct device *dev;
> > > >  	ipmi_user_t	user_interface;
> > > >  	int ipmi_ifnum; /* IPMI interface number */
> > > >  	long curr_msgid;
> > > > -	struct ipmi_smi_info smi_data;
> > > >  	atomic_t refcnt;
> > > >  };
> > > >
> > > > @@ -132,7 +131,7 @@ static struct ipmi_driver_data driver_data = {  };
> > > >
> > > >  static struct acpi_ipmi_device *
> > > > -ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle
> > > > handle)
> > > > +ipmi_dev_alloc(int iface, struct device *pdev, acpi_handle handle)
> > >
> > > Why is the second arg called pdev?
> >
> > OK, I will change it to dev.
> 
> OK, thanks.
> 
> > >
> > > >  {
> > > >  	struct acpi_ipmi_device *ipmi_device;
> > > >  	int err;
> > > > @@ -148,14 +147,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info
> > > *smi_data, acpi_handle handle)
> > > >  	spin_lock_init(&ipmi_device->tx_msg_lock);
> > > >
> > > >  	ipmi_device->handle = handle;
> > > > -	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
> > > > -	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct
> > > ipmi_smi_info));
> > > > +	ipmi_device->dev = get_device(pdev);
> > > >  	ipmi_device->ipmi_ifnum = iface;
> > > >
> > > >  	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
> > > >  			       ipmi_device, &user);
> > > >  	if (err) {
> > > > -		put_device(smi_data->dev);
> > > > +		put_device(pdev);
> > > >  		kfree(ipmi_device);
> > > >  		return NULL;
> > > >  	}
> > > > @@ -175,7 +173,7 @@ acpi_ipmi_dev_get(struct acpi_ipmi_device *ipmi_device)
> > > >  static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
> > > >  {
> > > >  	ipmi_destroy_user(ipmi_device->user_interface);
> > > > -	put_device(ipmi_device->smi_data.dev);
> > > > +	put_device(ipmi_device->dev);
> > > >  	kfree(ipmi_device);
> > > >  }
> > > >
> > > > @@ -263,7 +261,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
> > > >  	buffer = (struct acpi_ipmi_buffer *)value;
> > > >  	/* copy the tx message data */
> > > >  	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
> > > > -		dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
> > > > +		dev_WARN_ONCE(tx_msg->device->dev, true,
> > > >  			      "Unexpected request (msg len %d).\n",
> > > >  			      buffer->length);
> > > >  		return -EINVAL;
> > > > @@ -382,11 +380,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> > > >  	struct acpi_ipmi_device *ipmi_device = user_msg_data;
> > > >  	int msg_found = 0;
> > > >  	struct acpi_ipmi_msg *tx_msg;
> > > > -	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
> > > > +	struct device *dev = ipmi_device->dev;
> > > >  	unsigned long flags;
> > > >
> > > >  	if (msg->user != ipmi_device->user_interface) {
> > > > -		dev_warn(&pnp_dev->dev,
> > > > +		dev_warn(dev,
> > > >  			 "Unexpected response is returned. returned user %p, expected user %p\n",
> > > >  			 msg->user, ipmi_device->user_interface);
> > > >  		goto out_msg;
> > > > @@ -404,7 +402,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> > > >  	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
> > > >
> > > >  	if (!msg_found) {
> > > > -		dev_warn(&pnp_dev->dev,
> > > > +		dev_warn(dev,
> > > >  			 "Unexpected response (msg id %ld) is returned.\n",
> > > >  			 msg->msgid);
> > > >  		goto out_msg;
> > > > @@ -412,7 +410,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> > > >
> > > >  	/* copy the response data to Rx_data buffer */
> > > >  	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
> > > > -		dev_WARN_ONCE(&pnp_dev->dev, true,
> > > > +		dev_WARN_ONCE(dev, true,
> > > >  			      "Unexpected response (msg len %d).\n",
> > > >  			      msg->msg.data_len);
> > > >  		goto out_comp;
> > > > @@ -431,7 +429,7 @@ out_msg:
> > > >  static void ipmi_register_bmc(int iface, struct device *dev)
> > > >  {
> > > >  	struct acpi_ipmi_device *ipmi_device, *temp;
> > > > -	struct pnp_dev *pnp_dev;
> > > > +	struct device *pdev;
> > >
> > > And here?
> >
> > dev is already the parameter of ipmi_register_bmc(), so it is not possible
> > to name the device taken from "struct ipmi_smi_info" dev here in this quick fix.
> 
> Right.  What about smi_dev?  Or just use smi_data.dev directly?  It's just two
> places and shouldn't cause any line wraps to happen.

Sounds good, I'll take your advice. :-)

Thanks and best regards
-Lv

> 
> Rafael
> 
> 
> > > >  	int err;
> > > >  	struct ipmi_smi_info smi_data;
> > > >  	acpi_handle handle;
> > > > @@ -445,11 +443,11 @@ static void ipmi_register_bmc(int iface, struct device *dev)
> > > >  	handle = smi_data.addr_info.acpi_info.acpi_handle;
> > > >  	if (!handle)
> > > >  		goto err_ref;
> > > > -	pnp_dev = to_pnp_dev(smi_data.dev);
> > > > +	pdev = smi_data.dev;
> > > >
> > > > -	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
> > > > +	ipmi_device = ipmi_dev_alloc(iface, pdev, handle);
> > > >  	if (!ipmi_device) {
> > > > -		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
> > > > +		dev_warn(pdev, "Can't create IPMI user interface\n");
> > > >  		goto err_ref;
> > > >  	}
> > > >
> > > >
> > > --
> > > I speak only for myself.
> > > Rafael J. Wysocki, Intel Open Source Technology Center.
> >
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-26 14:00           ` Rafael J. Wysocki
@ 2013-07-29  1:43             ` Zheng, Lv
  -1 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-29  1:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

> On Friday, July 26, 2013 10:01 PM Rafael J. Wysocki wrote:
> > On Friday, July 26, 2013 12:47:44 AM Zheng, Lv wrote:
> >
> > > On Friday, July 26, 2013 4:27 AM Rafael J. Wysocki wrote:
> > >
> > > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region handlers
> > > > to fix races caused by the ACPICA address space callback invocations.
> > > >
> > > > ACPICA address space callback invocation is not suitable for Linux
> > > > CONFIG_MODULE=y execution environment.  This patch tries to protect
> > > > the address space callbacks by invoking them under a module safe
> > > > environment.
> > > > The IPMI address space handler is also upgraded in this patch.
> > > > The acpi_unregister_region() is designed to meet the following
> > > > requirements:
> > > > 1. It acts as a barrier for operation region callbacks - no callback will
> > > >    happen after acpi_unregister_region().
> > > > 2. acpi_unregister_region() is safe to be called in module->exit()
> > > >    functions.
> > > > Using reference counting rather than module referencing allows such
> > > > benefits to be achieved even when acpi_unregister_region() is called
> > > > in the environments other than module->exit().
> > > > The header file of include/acpi/acpi_bus.h should contain the
> > > > declarations that have references to some ACPICA defined types.
> > > >
> > > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > > ---
> > > >  drivers/acpi/acpi_ipmi.c |   16 ++--
> > > >  drivers/acpi/osl.c       |  224 ++++++++++++++++++++++++++++++++++++++++++++++
> > > >  include/acpi/acpi_bus.h  |    5 ++
> > > >  3 files changed, 235 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
> > > > index 5f8f495..2a09156 100644
> > > > --- a/drivers/acpi/acpi_ipmi.c
> > > > +++ b/drivers/acpi/acpi_ipmi.c
> > > > @@ -539,20 +539,18 @@ out_ref:
> > > >  static int __init acpi_ipmi_init(void)
> > > >  {
> > > >  	int result = 0;
> > > > -	acpi_status status;
> > > >
> > > >  	if (acpi_disabled)
> > > >  		return result;
> > > >
> > > >  	mutex_init(&driver_data.ipmi_lock);
> > > >
> > > > -	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > > -						    ACPI_ADR_SPACE_IPMI,
> > > > -						    &acpi_ipmi_space_handler,
> > > > -						    NULL, NULL);
> > > > -	if (ACPI_FAILURE(status)) {
> > > > +	result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > > > +				      &acpi_ipmi_space_handler,
> > > > +				      NULL, NULL);
> > > > +	if (result) {
> > > >  		pr_warn("Can't register IPMI opregion space handle\n");
> > > > -		return -EINVAL;
> > > > +		return result;
> > > >  	}
> > > >
> > > >  	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > > > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > > >  	}
> > > >  	mutex_unlock(&driver_data.ipmi_lock);
> > > >
> > > > -	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > > > -					  ACPI_ADR_SPACE_IPMI,
> > > > -					  &acpi_ipmi_space_handler);
> > > > +	acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > > >  }
> > > >
> > > >  module_init(acpi_ipmi_init);
> > > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c
> > > > index 6ab2c35..8398e51 100644
> > > > --- a/drivers/acpi/osl.c
> > > > +++ b/drivers/acpi/osl.c
> > > > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
> > > >  static struct workqueue_struct *kacpi_notify_wq;
> > > >  static struct workqueue_struct *kacpi_hotplug_wq;
> > > >
> > > > +struct acpi_region {
> > > > +	unsigned long flags;
> > > > +#define ACPI_REGION_DEFAULT		0x01
> > > > +#define ACPI_REGION_INSTALLED		0x02
> > > > +#define ACPI_REGION_REGISTERED		0x04
> > > > +#define ACPI_REGION_UNREGISTERING	0x08
> > > > +#define ACPI_REGION_INSTALLING		0x10
> > >
> > > What about (1UL << 1), (1UL << 2) etc.?
> > >
> > > Also please remove the #defines out of the struct definition.
> >
> > OK.
> >
> > >
> > > > +	/*
> > > > +	 * NOTE: Upgrading All Region Handlers
> > > > +	 * This flag is only used during the period where not all of the
> > > > +	 * region handlers are upgraded to the new interfaces.
> > > > +	 */
> > > > +#define ACPI_REGION_MANAGED		0x80
> > > > +	acpi_adr_space_handler handler;
> > > > +	acpi_adr_space_setup setup;
> > > > +	void *context;
> > > > +	/* Invoking references */
> > > > +	atomic_t refcnt;
> > >
> > > Actually, why don't you use krefs?
> >
> > If you take a look at the rest of my code, you'll find there are two
> > reasons:
> >
> > 1. I'm using while (atomic_read() > 1) to implement flushing of the objects,
> >    and there is no kref API to do so.
> 
> No, there's not any, but you can read kref.refcount directly, can't you?
> 
> Moreover, it is not entirely clear to me that doing the while (atomic_read() > 1)
> is actually correct.
> 
> >   I just think it is not suitable for me to introduce such an API into kref.h
> >   and start another argument around kref designs in this bug fix patch. :-)
> >   I'll start a discussion about kref design using another thread.
> 
> You don't need to do that at all.
> 
> > 2. I'm using ipmi_dev|msg_release() as a pair of ipmi_dev|msg_alloc(), it's
> >    kind of atomic_t coding style.
> >   If atomic_t is changed to struct kref, I will need to implement two APIs,
> >   __ipmi_dev_release() to take a struct kref as parameter and call
> >   ipmi_dev_release() inside it.
> >   By not using kref, I needn't write code to implement such an API.
> 
> I'm not following you, sorry.
> 
> Please just use krefs for reference counting, the same way as you use
> struct list_head for implementing lists.  This is the way everyone does
> that in the kernel and that's for a reason.
> 
> Unless you do your reference counting under a lock, in which case using
> atomic_t isn't necessary at all and you can use a non-atomic counter.

I'll follow your suggestion and use kref.
You will find what I meant by concern 2 in the next revision.
It's trivial.
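
For readers following concern 2, here is a minimal userspace sketch of the extra wrapper a kref-style put needs. It is an illustration under stated assumptions, not the code of the next revision: struct ref, ref_put(), struct demo_device and the demo_dev_*() helpers are hypothetical stand-ins built on C11 atomics, not the kernel's kref API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for an embedded reference count (hypothetical). */
struct ref {
	atomic_int count;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void ref_init(struct ref *r)
{
	atomic_init(&r->count, 1);
}

static void ref_get(struct ref *r)
{
	/* Taking a reference is only valid while one is already held. */
	assert(atomic_load(&r->count) > 0);
	atomic_fetch_add(&r->count, 1);
}

/* Returns 1 if this call dropped the last reference and ran release(). */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
	if (atomic_fetch_sub(&r->count, 1) == 1) {
		release(r);
		return 1;
	}
	return 0;
}

/* Hypothetical object, loosely modeled on struct acpi_ipmi_device. */
struct demo_device {
	int ifnum;
	struct ref ref;
};

static int released_count;	/* only used to observe the demo */

/* Release routine that takes the object itself, paired with alloc. */
static void demo_dev_release(struct demo_device *dev)
{
	released_count++;
	free(dev);
}

/* The additional wrapper: it receives the embedded counter instead. */
static void demo_dev_release_ref(struct ref *r)
{
	demo_dev_release(container_of(r, struct demo_device, ref));
}

static struct demo_device *demo_dev_alloc(int ifnum)
{
	struct demo_device *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->ifnum = ifnum;
	ref_init(&dev->ref);
	return dev;
}

static void demo_dev_get(struct demo_device *dev)
{
	ref_get(&dev->ref);
}

static int demo_dev_put(struct demo_device *dev)
{
	return ref_put(&dev->ref, demo_dev_release_ref);
}
```

The only cost compared with a bare counter is the small demo_dev_release_ref() trampoline; the alloc/release pairing stays as it was.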

> 
> > > > +};
> > > > +
> > > > +static struct acpi_region acpi_regions[ACPI_NUM_PREDEFINED_REGIONS] = {
> > > > +	[ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > > > +		.flags = ACPI_REGION_DEFAULT,
> > > > +	},
> > > > +	[ACPI_ADR_SPACE_SYSTEM_IO] = {
> > > > +		.flags = ACPI_REGION_DEFAULT,
> > > > +	},
> > > > +	[ACPI_ADR_SPACE_PCI_CONFIG] = {
> > > > +		.flags = ACPI_REGION_DEFAULT,
> > > > +	},
> > > > +	[ACPI_ADR_SPACE_IPMI] = {
> > > > +		.flags = ACPI_REGION_MANAGED,
> > > > +	},
> > > > +};
> > > > +static DEFINE_MUTEX(acpi_mutex_region);
> > > > +
> > > >  /*
> > > >   * This list of permanent mappings is for memory that may be accessed from
> > > >   * interrupt context, where we can't do the ioremap().
> > > > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle handle, u32 type, void *context,
> > > >  		kfree(hp_work);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > > > +
> > > > +static bool acpi_region_managed(struct acpi_region *rgn)
> > > > +{
> > > > +	/*
> > > > +	 * NOTE: Default and Managed
> > > > +	 * We only need to avoid region management on the regions managed
> > > > +	 * by ACPICA (ACPI_REGION_DEFAULT).  Currently, we need additional
> > > > +	 * check as many operation region handlers are not upgraded, so
> > > > +	 * only those known to be safe are managed (ACPI_REGION_MANAGED).
> > > > +	 */
> > > > +	return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > > > +	       (rgn->flags & ACPI_REGION_MANAGED);
> > > > +}
> > > > +
> > > > +static bool acpi_region_callable(struct acpi_region *rgn)
> > > > +{
> > > > +	return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > > > +	       !(rgn->flags & ACPI_REGION_UNREGISTERING);
> > > > +}
> > > > +
> > > > +static acpi_status
> > > > +acpi_region_default_handler(u32 function,
> > > > +			    acpi_physical_address address,
> > > > +			    u32 bit_width, u64 *value,
> > > > +			    void *handler_context, void *region_context)
> > > > +{
> > > > +	acpi_adr_space_handler handler;
> > > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > > +	void *context;
> > > > +	acpi_status status = AE_NOT_EXIST;
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	if (!acpi_region_callable(rgn) || !rgn->handler) {
> > > > +		mutex_unlock(&acpi_mutex_region);
> > > > +		return status;
> > > > +	}
> > > > +
> > > > +	atomic_inc(&rgn->refcnt);
> > > > +	handler = rgn->handler;
> > > > +	context = rgn->context;
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	status = handler(function, address, bit_width, value, context,
> > > > +			 region_context);
> > >
> > > Why don't we call the handler under the mutex?
> > >
> > > What exactly prevents context from becoming NULL before the call above?
> >
> > It's a kind of programming style related concern.
> > IMO, holding locks around a callback invocation is a buggy programming style
> > that could lead to deadlocks.
> > Let me explain this using an example.
> >
> > Object A exports a register/unregister API for other objects.
> > Object B calls A's register/unregister API to register/unregister B's callback.
> > It's likely that object B will hold lock_of_B around unregister/register when
> > object B is destroyed/created, and lock_of_B is likely also used inside the
> > callback.
> 
> Why is it likely to be used inside the callback?  Clearly, if a callback is
> executed under a lock, that lock can't be acquired by that callback.

I think this is unrelated to the real reason why we must not hold a lock in this situation.
So let's ignore this paragraph.

> 
> > So when object A holds lock_of_A around the callback invocation, it leads
> > to a deadlock since:
> > 1. the locking order for the register/unregister side will be: lock(lock_of_B),
> >    lock(lock_of_A)
> > 2. the locking order for the callback side will be: lock(lock_of_A),
> >    lock(lock_of_B)
> > They are in the reversed order!
> >
> > IMO, Linux may need to introduce __callback and __api as decorators for
> > functions, and use sparse to enforce this rule; sparse knows if a callback is
> > invoked under some locks.
> 
> Oh, dear.  Yes, sparse knows such things, and so what?

I was thinking sparse could warn on invocations of __api marked functions where the __acquire count is not 0; this might be mandatory for high quality code.
Sparse could also warn on invocations of __callback marked functions where the __acquire count is not 0; this should be optional.
But since this is unrelated to our topic, let's ignore this paragraph.

> 
> > In the case of ACPICA space_handlers, as you may know, when an ACPI
> > operation region handler is invoked, there will be no lock held inside ACPICA
> > (the interpreter lock must be freed before executing operation region handlers).
> > So the likelihood of a deadlock is pretty high here!
> 
> Sorry, what are you talking about?
> 
> Please let me rephrase my question: What *practical* problems would it lead to
> if we executed this particular callback under this particular mutex?
> 
> Not *theoretical* in the general theory of everything, *practical* in this
> particular piece of code.
> 
> And we are talking about a *global* mutex here, not something object-specific.

I think you have additional replies on this in another email.
Let me reply to you there.

> 
> > > > +	atomic_dec(&rgn->refcnt);
> > > > +
> > > > +	return status;
> > > > +}
> > > > +
> > > > +static acpi_status
> > > > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > > > +			  void *handler_context, void **region_context)
> > > > +{
> > > > +	acpi_adr_space_setup setup;
> > > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > > +	void *context;
> > > > +	acpi_status status = AE_OK;
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	if (!acpi_region_callable(rgn) || !rgn->setup) {
> > > > +		mutex_unlock(&acpi_mutex_region);
> > > > +		return status;
> > > > +	}
> > > > +
> > > > +	atomic_inc(&rgn->refcnt);
> > > > +	setup = rgn->setup;
> > > > +	context = rgn->context;
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	status = setup(handle, function, context, region_context);
> > >
> > > Can setup drop rgn->refcnt ?
> >
> > The reason is same as the handler, as a setup is also a callback.
> 
> Let me rephrase: Is it legitimate for setup to modify rgn->refcnt?
> If so, then why?

Yes, the race is the same as for the handler.
While ACPICA is executing the text segment of the setup function, the module that owns the setup function can be unloaded, as no lock is held while setup is invoked. Note that ExitInter also happens for setup invocations.
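
The invocation pattern under discussion can be sketched in userspace as follows. This is a simplified illustration, assuming pthreads and C11 atomics in place of the kernel primitives; struct region, region_invoke() and the other names are hypothetical, and the counter here starts at 0 rather than 1 as in the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define REGION_NOT_EXIST (-1)
#define REGION_BUSY      (-2)

typedef int (*region_handler)(int value, void *context);

struct region {
	int registered;		/* protected by region_lock */
	region_handler handler;	/* protected by region_lock */
	void *context;		/* protected by region_lock */
	atomic_int refcnt;	/* number of in-flight invocations */
};

static pthread_mutex_t region_lock = PTHREAD_MUTEX_INITIALIZER;
static struct region demo_region;

static int region_register(struct region *rgn, region_handler handler,
			   void *context)
{
	int ret = 0;

	assert(handler != NULL);
	pthread_mutex_lock(&region_lock);
	if (rgn->registered) {
		ret = REGION_BUSY;
	} else {
		rgn->handler = handler;
		rgn->context = context;
		rgn->registered = 1;
	}
	pthread_mutex_unlock(&region_lock);
	return ret;
}

/* Snapshot handler/context under the lock, then call without the lock. */
static int region_invoke(struct region *rgn, int value)
{
	region_handler handler;
	void *context;
	int status;

	pthread_mutex_lock(&region_lock);
	if (!rgn->registered || !rgn->handler) {
		pthread_mutex_unlock(&region_lock);
		return REGION_NOT_EXIST;
	}
	atomic_fetch_add(&rgn->refcnt, 1);
	handler = rgn->handler;
	context = rgn->context;
	pthread_mutex_unlock(&region_lock);

	status = handler(value, context);
	atomic_fetch_sub(&rgn->refcnt, 1);
	return status;
}

/* Acts as a barrier: returns only after in-flight invocations finish. */
static void region_unregister(struct region *rgn)
{
	pthread_mutex_lock(&region_lock);
	rgn->registered = 0;
	rgn->handler = NULL;
	rgn->context = NULL;
	pthread_mutex_unlock(&region_lock);

	while (atomic_load(&rgn->refcnt) > 0)
		sched_yield();
}

/* Example callback: adds the integer that context points to. */
static int add_context(int value, void *context)
{
	return value + *(int *)context;
}
```

Because the local copies of handler and context are taken before the lock is dropped, clearing the fields in region_unregister() cannot turn them into NULL in the middle of a call; the counter is what keeps the callback's owner alive until the call returns.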

> 
> > >
> > > > +	atomic_dec(&rgn->refcnt);
> > > > +
> > > > +	return status;
> > > > +}
> > > > +
> > > > +static int __acpi_install_region(struct acpi_region *rgn,
> > > > +				 acpi_adr_space_type space_id)
> > > > +{
> > > > +	int res = 0;
> > > > +	acpi_status status;
> > > > +	int installing = 0;
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	if (rgn->flags & ACPI_REGION_INSTALLED)
> > > > +		goto out_lock;
> > > > +	if (rgn->flags & ACPI_REGION_INSTALLING) {
> > > > +		res = -EBUSY;
> > > > +		goto out_lock;
> > > > +	}
> > > > +
> > > > +	installing = 1;
> > > > +	rgn->flags |= ACPI_REGION_INSTALLING;
> > > > +	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT, space_id,
> > > > +						    acpi_region_default_handler,
> > > > +						    acpi_region_default_setup,
> > > > +						    rgn);
> > > > +	rgn->flags &= ~ACPI_REGION_INSTALLING;
> > > > +	if (ACPI_FAILURE(status))
> > > > +		res = -EINVAL;
> > > > +	else
> > > > +		rgn->flags |= ACPI_REGION_INSTALLED;
> > > > +
> > > > +out_lock:
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +	if (installing) {
> > > > +		if (res)
> > > > +			pr_err("Failed to install region %d\n", space_id);
> > > > +		else
> > > > +			pr_info("Region %d installed\n", space_id);
> > > > +	}
> > > > +	return res;
> > > > +}
> > > > +
> > > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > > +			 acpi_adr_space_handler handler,
> > > > +			 acpi_adr_space_setup setup, void *context)
> > > > +{
> > > > +	int res;
> > > > +	struct acpi_region *rgn;
> > > > +
> > > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > > +		return -EINVAL;
> > > > +
> > > > +	rgn = &acpi_regions[space_id];
> > > > +	if (!acpi_region_managed(rgn))
> > > > +		return -EINVAL;
> > > > +
> > > > +	res = __acpi_install_region(rgn, space_id);
> > > > +	if (res)
> > > > +		return res;
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	if (rgn->flags & ACPI_REGION_REGISTERED) {
> > > > +		mutex_unlock(&acpi_mutex_region);
> > > > +		return -EBUSY;
> > > > +	}
> > > > +
> > > > +	rgn->handler = handler;
> > > > +	rgn->setup = setup;
> > > > +	rgn->context = context;
> > > > +	rgn->flags |= ACPI_REGION_REGISTERED;
> > > > +	atomic_set(&rgn->refcnt, 1);
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	pr_info("Region %d registered\n", space_id);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > > > +
> > > > +void acpi_unregister_region(acpi_adr_space_type space_id)
> > > > +{
> > > > +	struct acpi_region *rgn;
> > > > +
> > > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > > +		return;
> > > > +
> > > > +	rgn = &acpi_regions[space_id];
> > > > +	if (!acpi_region_managed(rgn))
> > > > +		return;
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > > +		mutex_unlock(&acpi_mutex_region);
> > > > +		return;
> > > > +	}
> > > > +	if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > > > +		mutex_unlock(&acpi_mutex_region);
> > > > +		return;
> > >
> > > What about
> > >
> > > 	if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> > > 	    || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > 		mutex_unlock(&acpi_mutex_region);
> > > 		return;
> > > 	}
> > >
> >
> > OK.
> >
> > > > +	}
> > > > +
> > > > +	rgn->flags |= ACPI_REGION_UNREGISTERING;
> > > > +	rgn->handler = NULL;
> > > > +	rgn->setup = NULL;
> > > > +	rgn->context = NULL;
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	while (atomic_read(&rgn->refcnt) > 1)
> > > > +		schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> > >
> > > Wouldn't it be better to use a wait queue here?
> >
> > Yes, I'll try.
> 
> By the way, why do we need to do that?

I think you have additional replies on this in another email.
Let me reply to you there.
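
As an illustration of the wait-queue suggestion, here is a minimal userspace sketch that sleeps until the in-flight count drains instead of polling with a timeout. It assumes a pthread condition variable as a stand-in for a kernel wait queue; struct inflight and the inflight_*() helpers are hypothetical names. Since the counter is only touched under the lock, a plain int is enough.

```c
#include <assert.h>
#include <pthread.h>

struct inflight {
	pthread_mutex_t lock;
	pthread_cond_t drained;	/* signalled when count reaches 0 */
	int count;		/* protected by lock */
};

static void inflight_init(struct inflight *f)
{
	pthread_mutex_init(&f->lock, NULL);
	pthread_cond_init(&f->drained, NULL);
	f->count = 0;
}

/* Called before invoking a callback. */
static void inflight_get(struct inflight *f)
{
	pthread_mutex_lock(&f->lock);
	f->count++;
	pthread_mutex_unlock(&f->lock);
}

/* Called after the callback returns; wakes waiters on the last put. */
static void inflight_put(struct inflight *f)
{
	pthread_mutex_lock(&f->lock);
	assert(f->count > 0);
	if (--f->count == 0)
		pthread_cond_broadcast(&f->drained);
	pthread_mutex_unlock(&f->lock);
}

/* Called by the unregister path; sleeps until no callback is running. */
static void inflight_drain(struct inflight *f)
{
	pthread_mutex_lock(&f->lock);
	while (f->count > 0)
		pthread_cond_wait(&f->drained, &f->lock);
	pthread_mutex_unlock(&f->lock);
}

static int inflight_count(struct inflight *f)
{
	int n;

	pthread_mutex_lock(&f->lock);
	n = f->count;
	pthread_mutex_unlock(&f->lock);
	return n;
}
```

The waiter is woken exactly when the last invocation finishes, so there is no fixed polling interval to tune.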

Thanks for commenting.

Best regards
-Lv

> 
> > > > +	atomic_dec(&rgn->refcnt);
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	rgn->flags &= ~(ACPI_REGION_REGISTERED | ACPI_REGION_UNREGISTERING);
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	pr_info("Region %d unregistered\n", space_id);
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> > > > index a2c2fbb..15fad0d 100644
> > > > --- a/include/acpi/acpi_bus.h
> > > > +++ b/include/acpi/acpi_bus.h
> > > > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void *bus) { return 0; }
> > > >
> > > >  #endif				/* CONFIG_ACPI */
> > > >
> > > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > > +			 acpi_adr_space_handler handler,
> > > > +			 acpi_adr_space_setup setup, void *context);
> > > > +void acpi_unregister_region(acpi_adr_space_type space_id);
> > > > +
> > > >  #endif /*__ACPI_BUS_H__*/
> 
> Thanks,
> Rafael
> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
@ 2013-07-29  1:43             ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-29  1:43 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-kernel, stable, linux-acpi, openipmi-developer

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 17763 bytes --]

> On Friday, July 26, 2013 10:01 PM Rafael J. Wysocki wrote:
> > On Friday, July 26, 2013 12:47:44 AM Zheng, Lv wrote:
> >
> > > On Friday, July 26, 2013 4:27 AM Rafael J. Wysocki wrote:
> > >
> > > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region handlers
> > > > to fix races caused by the ACPICA address space callback invocations.
> > > >
> > > > ACPICA address space callback invocation is not suitable for Linux
> > > > CONFIG_MODULE=y execution environment.  This patch tries to protect
> > > > the address space callbacks by invoking them under a module safe
> > > environment.
> > > > The IPMI address space handler is also upgraded in this patch.
> > > > The acpi_unregister_region() is designed to meet the following
> > > > requirements:
> > > > 1. It acts as a barrier for operation region callbacks - no callback will
> > > >    happen after acpi_unregister_region().
> > > > 2. acpi_unregister_region() is safe to be called in module->exit()
> > > >    functions.
> > > > Using reference counting rather than module referencing allows such
> > > > benefits to be achieved even when acpi_unregister_region() is called
> > > > in the environments other than module->exit().
> > > > The header file of include/acpi/acpi_bus.h should contain the
> > > > declarations that have references to some ACPICA defined types.
> > > >
> > > > Signed-off-by: Lv Zheng <lv.zheng@intel.com>
> > > > Reviewed-by: Huang Ying <ying.huang@intel.com>
> > > > ---
> > > >  drivers/acpi/acpi_ipmi.c |   16 ++--
> > > >  drivers/acpi/osl.c       |  224
> > > ++++++++++++++++++++++++++++++++++++++++++++++
> > > >  include/acpi/acpi_bus.h  |    5 ++
> > > >  3 files changed, 235 insertions(+), 10 deletions(-)
> > > >
> > > > diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index
> > > > 5f8f495..2a09156 100644
> > > > --- a/drivers/acpi/acpi_ipmi.c
> > > > +++ b/drivers/acpi/acpi_ipmi.c
> > > > @@ -539,20 +539,18 @@ out_ref:
> > > >  static int __init acpi_ipmi_init(void)  {
> > > >  	int result = 0;
> > > > -	acpi_status status;
> > > >
> > > >  	if (acpi_disabled)
> > > >  		return result;
> > > >
> > > >  	mutex_init(&driver_data.ipmi_lock);
> > > >
> > > > -	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > > -						    ACPI_ADR_SPACE_IPMI,
> > > > -						    &acpi_ipmi_space_handler,
> > > > -						    NULL, NULL);
> > > > -	if (ACPI_FAILURE(status)) {
> > > > +	result = acpi_register_region(ACPI_ADR_SPACE_IPMI,
> > > > +				      &acpi_ipmi_space_handler,
> > > > +				      NULL, NULL);
> > > > +	if (result) {
> > > >  		pr_warn("Can't register IPMI opregion space handle\n");
> > > > -		return -EINVAL;
> > > > +		return result;
> > > >  	}
> > > >
> > > >  	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
> > > > @@ -596,9 +594,7 @@ static void __exit acpi_ipmi_exit(void)
> > > >  	}
> > > >  	mutex_unlock(&driver_data.ipmi_lock);
> > > >
> > > > -	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
> > > > -					  ACPI_ADR_SPACE_IPMI,
> > > > -					  &acpi_ipmi_space_handler);
> > > > +	acpi_unregister_region(ACPI_ADR_SPACE_IPMI);
> > > >  }
> > > >
> > > >  module_init(acpi_ipmi_init);
> > > > diff --git a/drivers/acpi/osl.c b/drivers/acpi/osl.c index
> > > > 6ab2c35..8398e51 100644
> > > > --- a/drivers/acpi/osl.c
> > > > +++ b/drivers/acpi/osl.c
> > > > @@ -86,6 +86,42 @@ static struct workqueue_struct *kacpid_wq;
> static
> > > > struct workqueue_struct *kacpi_notify_wq;  static struct
> > > > workqueue_struct *kacpi_hotplug_wq;
> > > >
> > > > +struct acpi_region {
> > > > +	unsigned long flags;
> > > > +#define ACPI_REGION_DEFAULT		0x01
> > > > +#define ACPI_REGION_INSTALLED		0x02
> > > > +#define ACPI_REGION_REGISTERED		0x04
> > > > +#define ACPI_REGION_UNREGISTERING	0x08
> > > > +#define ACPI_REGION_INSTALLING		0x10
> > >
> > > What about (1UL << 1), (1UL << 2) etc.?
> > >
> > > Also please remove the #defines out of the struct definition.
> >
> > OK.
> >
> > >
> > > > +	/*
> > > > +	 * NOTE: Upgrading All Region Handlers
> > > > +	 * This flag is only used during the period where not all of the
> > > > +	 * region handlers are upgraded to the new interfaces.
> > > > +	 */
> > > > +#define ACPI_REGION_MANAGED		0x80
> > > > +	acpi_adr_space_handler handler;
> > > > +	acpi_adr_space_setup setup;
> > > > +	void *context;
> > > > +	/* Invoking references */
> > > > +	atomic_t refcnt;
> > >
> > > Actually, why don't you use krefs?
> >
> > If you take a look at the rest of my code, you'll find there are two
> > reasons:
> >
> > 1. I'm using while (atomic_read() > 1) to implement flushing of the objects,
> >    and there is no kref API to do so.
> 
> No, there's not any, but you can read kref.refcount directly, can't you?
> 
> Moreover, it is not entirely clear to me that doing the while (atomic_read() > 1)
> is actually correct.
> 
> >   I just think it is not suitable for me to introduce such an API into kref.h and
> start another argument around kref designs in this bug fix patch. :-)
> >   I'll start a discussion about kref design using another thread.
> 
> You don't need to do that at all.
> 
> > 2. I'm using ipmi_dev|msg_release() as a pair of ipmi_dev|msg_alloc(), it's
> kind of atomic_t coding style.
> >   If atomic_t is changed to struct kref, I will need to implement two API,
> __ipmi_dev_release() to take a struct kref as parameter and call
> ipmi_dev_release inside it.
> >   By not using kref, I needn't write codes to implement such API.
> 
> I'm not following you, sorry.
> 
> Please just use krefs for reference counting, the same way as you use
> struct list_head for implementing lists.  This is the way everyone does
> that in the kernel and that's for a reason.
> 
> Unless you do your reference counting under a lock, in which case using
> atomic_t isn't necessary at all and you can use a non-atomic counter.

I'll follow your suggestion and use kref.
You will find what I meant by concern 2 in the next revision.
It's trivial.

> 
> > > > +};
> > > > +
> > > > +static struct acpi_region
> acpi_regions[ACPI_NUM_PREDEFINED_REGIONS]
> > > = {
> > > > +	[ACPI_ADR_SPACE_SYSTEM_MEMORY] = {
> > > > +		.flags = ACPI_REGION_DEFAULT,
> > > > +	},
> > > > +	[ACPI_ADR_SPACE_SYSTEM_IO] = {
> > > > +		.flags = ACPI_REGION_DEFAULT,
> > > > +	},
> > > > +	[ACPI_ADR_SPACE_PCI_CONFIG] = {
> > > > +		.flags = ACPI_REGION_DEFAULT,
> > > > +	},
> > > > +	[ACPI_ADR_SPACE_IPMI] = {
> > > > +		.flags = ACPI_REGION_MANAGED,
> > > > +	},
> > > > +};
> > > > +static DEFINE_MUTEX(acpi_mutex_region);
> > > > +
> > > >  /*
> > > >   * This list of permanent mappings is for memory that may be accessed
> > > from
> > > >   * interrupt context, where we can't do the ioremap().
> > > > @@ -1799,3 +1835,191 @@ void alloc_acpi_hp_work(acpi_handle
> handle,
> > > u32 type, void *context,
> > > >  		kfree(hp_work);
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(alloc_acpi_hp_work);
> > > > +
> > > > +static bool acpi_region_managed(struct acpi_region *rgn) {
> > > > +	/*
> > > > +	 * NOTE: Default and Managed
> > > > +	 * We only need to avoid region management on the regions
> managed
> > > > +	 * by ACPICA (ACPI_REGION_DEFAULT).  Currently, we need
> additional
> > > > +	 * check as many operation region handlers are not upgraded, so
> > > > +	 * only those known to be safe are managed
> (ACPI_REGION_MANAGED).
> > > > +	 */
> > > > +	return !(rgn->flags & ACPI_REGION_DEFAULT) &&
> > > > +	       (rgn->flags & ACPI_REGION_MANAGED); }
> > > > +
> > > > +static bool acpi_region_callable(struct acpi_region *rgn) {
> > > > +	return (rgn->flags & ACPI_REGION_REGISTERED) &&
> > > > +	       !(rgn->flags & ACPI_REGION_UNREGISTERING); }
> > > > +
> > > > +static acpi_status
> > > > +acpi_region_default_handler(u32 function,
> > > > +			    acpi_physical_address address,
> > > > +			    u32 bit_width, u64 *value,
> > > > +			    void *handler_context, void *region_context) {
> > > > +	acpi_adr_space_handler handler;
> > > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > > +	void *context;
> > > > +	acpi_status status = AE_NOT_EXIST;
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	if (!acpi_region_callable(rgn) || !rgn->handler) {
> > > > +		mutex_unlock(&acpi_mutex_region);
> > > > +		return status;
> > > > +	}
> > > > +
> > > > +	atomic_inc(&rgn->refcnt);
> > > > +	handler = rgn->handler;
> > > > +	context = rgn->context;
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	status = handler(function, address, bit_width, value, context,
> > > > +			 region_context);
> > >
> > > Why don't we call the handler under the mutex?
> > >
> > > What exactly prevents context from becoming NULL before the call above?
> >
> > It's a kind of programming style related concern.
> > IMO, holding locks around a callback invocation is a buggy programming style
> > that could lead to deadlocks.
> > Let me explain this using an example.
> >
> > Object A exports a register/unregister API for other objects.
> > Object B calls A's register/unregister API to register/unregister B's callback.
> > It's likely that object B will hold lock_of_B around unregister/register when
> > object B is destroyed/created, and lock_of_B is likely also used inside the
> > callback.
> 
> Why is it likely to be used inside the callback?  Clearly, if a callback is
> executed under a lock, that lock can't be acquired by that callback.

I think this is not the real reason why we must not hold a lock in this situation.
So let's ignore this paragraph.

> 
> > So when object A holds lock_of_A around the callback invocation, it leads
> > to a deadlock, since:
> > 1. the locking order for the register/unregister side will be:
> >    lock(lock_of_B), lock(lock_of_A)
> > 2. the locking order for the callback side will be:
> >    lock(lock_of_A), lock(lock_of_B)
> > They are in reversed order!
> >
> > IMO, Linux may need to introduce __callback and __api as decorators for
> > functions, and use sparse to enforce this rule; sparse knows whether a
> > callback is invoked under some lock.
> 
> Oh, dear.  Yes, sparse knows such things, and so what?

I was thinking sparse could warn on invocations of __api-marked functions where the __acquire count is not 0; this might be mandatory for high-quality code.
Sparse could also warn on invocations of __callback-marked functions where the __acquire count is not 0; this should be optional.
But since this is not related to our topic, let's ignore this paragraph.

> 
> > In the case of ACPICA space_handlers, as you may know, when an ACPI
> > operation region handler is invoked, no lock is held inside ACPICA
> > (the interpreter lock must be released before executing operation region
> > handlers).
> > So the likelihood of a deadlock is pretty high here!
> 
> Sorry, what are you talking about?
> 
> Please let me rephrase my question: What *practical* problems would it lead
> to if we executed this particular callback under this particular mutex?
> 
> Not *theoretical* in the general theory of everything, *practical* in this
> particular piece of code.
> 
> And we are talking about a *global* mutex here, not something object-specific.

I think you have replied further on this in another email.
Let me reply to you there.

> 
> > > > +	atomic_dec(&rgn->refcnt);
> > > > +
> > > > +	return status;
> > > > +}
> > > > +
> > > > +static acpi_status
> > > > +acpi_region_default_setup(acpi_handle handle, u32 function,
> > > > +			  void *handler_context, void **region_context) {
> > > > +	acpi_adr_space_setup setup;
> > > > +	struct acpi_region *rgn = (struct acpi_region *)handler_context;
> > > > +	void *context;
> > > > +	acpi_status status = AE_OK;
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	if (!acpi_region_callable(rgn) || !rgn->setup) {
> > > > +		mutex_unlock(&acpi_mutex_region);
> > > > +		return status;
> > > > +	}
> > > > +
> > > > +	atomic_inc(&rgn->refcnt);
> > > > +	setup = rgn->setup;
> > > > +	context = rgn->context;
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	status = setup(handle, function, context, region_context);
> > >
> > > Can setup drop rgn->refcnt ?
> >
> > The reason is same as the handler, as a setup is also a callback.
> 
> Let me rephrase: Is it legitimate for setup to modify rgn->refcnt?
> If so, then why?

Yes, the race is the same as for the handler.
While ACPICA is executing the text segment of the setup function, the module that owns the setup function can be unloaded, as no lock is held when setup is invoked; note that ExitInter also happens for setup invocations.

> 
> > >
> > > > +	atomic_dec(&rgn->refcnt);
> > > > +
> > > > +	return status;
> > > > +}
> > > > +
> > > > +static int __acpi_install_region(struct acpi_region *rgn,
> > > > +				 acpi_adr_space_type space_id)
> > > > +{
> > > > +	int res = 0;
> > > > +	acpi_status status;
> > > > +	int installing = 0;
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	if (rgn->flags & ACPI_REGION_INSTALLED)
> > > > +		goto out_lock;
> > > > +	if (rgn->flags & ACPI_REGION_INSTALLING) {
> > > > +		res = -EBUSY;
> > > > +		goto out_lock;
> > > > +	}
> > > > +
> > > > +	installing = 1;
> > > > +	rgn->flags |= ACPI_REGION_INSTALLING;
> > > > +	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
> > > space_id,
> > > > +						    acpi_region_default_handler,
> > > > +						    acpi_region_default_setup,
> > > > +						    rgn);
> > > > +	rgn->flags &= ~ACPI_REGION_INSTALLING;
> > > > +	if (ACPI_FAILURE(status))
> > > > +		res = -EINVAL;
> > > > +	else
> > > > +		rgn->flags |= ACPI_REGION_INSTALLED;
> > > > +
> > > > +out_lock:
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +	if (installing) {
> > > > +		if (res)
> > > > +			pr_err("Failed to install region %d\n", space_id);
> > > > +		else
> > > > +			pr_info("Region %d installed\n", space_id);
> > > > +	}
> > > > +	return res;
> > > > +}
> > > > +
> > > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > > +			 acpi_adr_space_handler handler,
> > > > +			 acpi_adr_space_setup setup, void *context) {
> > > > +	int res;
> > > > +	struct acpi_region *rgn;
> > > > +
> > > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > > +		return -EINVAL;
> > > > +
> > > > +	rgn = &acpi_regions[space_id];
> > > > +	if (!acpi_region_managed(rgn))
> > > > +		return -EINVAL;
> > > > +
> > > > +	res = __acpi_install_region(rgn, space_id);
> > > > +	if (res)
> > > > +		return res;
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	if (rgn->flags & ACPI_REGION_REGISTERED) {
> > > > +		mutex_unlock(&acpi_mutex_region);
> > > > +		return -EBUSY;
> > > > +	}
> > > > +
> > > > +	rgn->handler = handler;
> > > > +	rgn->setup = setup;
> > > > +	rgn->context = context;
> > > > +	rgn->flags |= ACPI_REGION_REGISTERED;
> > > > +	atomic_set(&rgn->refcnt, 1);
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	pr_info("Region %d registered\n", space_id);
> > > > +
> > > > +	return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(acpi_register_region);
> > > > +
> > > > +void acpi_unregister_region(acpi_adr_space_type space_id) {
> > > > +	struct acpi_region *rgn;
> > > > +
> > > > +	if (space_id >= ACPI_NUM_PREDEFINED_REGIONS)
> > > > +		return;
> > > > +
> > > > +	rgn = &acpi_regions[space_id];
> > > > +	if (!acpi_region_managed(rgn))
> > > > +		return;
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	if (!(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > > +		mutex_unlock(&acpi_mutex_region);
> > > > +		return;
> > > > +	}
> > > > +	if (rgn->flags & ACPI_REGION_UNREGISTERING) {
> > > > +		mutex_unlock(&acpi_mutex_region);
> > > > +		return;
> > >
> > > What about
> > >
> > > 	if ((rgn->flags & ACPI_REGION_UNREGISTERING)
> > > 	    || !(rgn->flags & ACPI_REGION_REGISTERED)) {
> > > 		mutex_unlock(&acpi_mutex_region);
> > > 		return;
> > > 	}
> > >
> >
> > OK.
> >
> > > > +	}
> > > > +
> > > > +	rgn->flags |= ACPI_REGION_UNREGISTERING;
> > > > +	rgn->handler = NULL;
> > > > +	rgn->setup = NULL;
> > > > +	rgn->context = NULL;
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	while (atomic_read(&rgn->refcnt) > 1)
> > > > +		schedule_timeout_uninterruptible(usecs_to_jiffies(5));
> > >
> > > Wouldn't it be better to use a wait queue here?
> >
> > Yes, I'll try.
> 
> By the way, why do we need to do that?

I think you have replied further on this in another email.
Let me reply to you there.

Thanks for commenting.

Best regards
-Lv

> 
> > > > +	atomic_dec(&rgn->refcnt);
> > > > +
> > > > +	mutex_lock(&acpi_mutex_region);
> > > > +	rgn->flags &= ~(ACPI_REGION_REGISTERED |
> > > ACPI_REGION_UNREGISTERING);
> > > > +	mutex_unlock(&acpi_mutex_region);
> > > > +
> > > > +	pr_info("Region %d unregistered\n", space_id); }
> > > > +EXPORT_SYMBOL_GPL(acpi_unregister_region);
> > > > diff --git a/include/acpi/acpi_bus.h b/include/acpi/acpi_bus.h
> > > > index a2c2fbb..15fad0d 100644
> > > > --- a/include/acpi/acpi_bus.h
> > > > +++ b/include/acpi/acpi_bus.h
> > > > @@ -542,4 +542,9 @@ static inline int unregister_acpi_bus_type(void *bus) { return 0; }
> > > >
> > > >  #endif				/* CONFIG_ACPI */
> > > >
> > > > +int acpi_register_region(acpi_adr_space_type space_id,
> > > > +			 acpi_adr_space_handler handler,
> > > > +			 acpi_adr_space_setup setup, void *context);
> > > > +void acpi_unregister_region(acpi_adr_space_type space_id);
> > > > +
> > > >  #endif /*__ACPI_BUS_H__*/
> 
> Thanks,
> Rafael
> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* RE: [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers
  2013-07-26 14:49         ` Rafael J. Wysocki
@ 2013-07-29  1:56             ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-07-29  1:56 UTC (permalink / raw)
  To: Rafael J. Wysocki; +Cc: Wysocki, Rafael J, Brown, Len, linux-kernel, linux-acpi

> On Friday, July 26, 2013 10:49 PM Rafael J. Wysocki wrote:
> > On Friday, July 26, 2013 01:54:00 AM Zheng, Lv wrote:
> > > On Friday, July 26, 2013 5:29 AM Rafael J. Wysocki wrote:
> > > On Tuesday, July 23, 2013 04:09:43 PM Lv Zheng wrote:
> > > > This patch adds reference counting for ACPI operation region
> > > > handlers to fix races caused by the ACPICA address space callback
> > > > invocations.
> > > >
> > > > ACPICA address space callback invocation is not suitable for Linux
> > > > CONFIG_MODULE=y execution environment.
> > >
> > > Actually, can you please explain to me what *exactly* the problem is?
> >
> > OK.  I'll add race explanations in the next revision.
> >
> > The problem is that no "lock" is held inside ACPICA while invoking
> > operation region handlers.
> > Thus races happen between
> > acpi_remove/install_address_space_handler and the handler/setup
> > callbacks.
> 
> I see.  Now you're trying to introduce something that would prevent those
> races from happening, right?

Yes. Let me explain this later in this email.

> 
> > This is correct per the ACPI specification.
> > If the interpreter lock were held while invoking operation region
> > handlers, a timeout implemented inside an operation region handler
> > would make all locking facilities (Acquire or Sleep,...) time out.
> > Please refer to ACPI specification "5.5.2 Control Method Execution":
> > Interpretation of a Control Method is not preemptive, but it can
> > block. When a control method does block, OSPM can initiate or continue
> > the execution of a different control method. A control method can only
> > assume that access to global objects is exclusive for any period the
> > control method does not block.
> >
> > So it is quite likely that ACPI IO transfers are locked inside
> > the operation region callback implementations.
> > Using a locking facility to protect the callback invocation risks deadlocks.
> 
> No.  If you use a single global lock around all invocations of operation region
> handlers, it won't deadlock, but it will *serialize* things.  This means that
> there won't be two handlers executing in parallel.  That may or may not be
> bad depending on what those handlers actually do.
> 
> Your concern seems to be that if one address space handler is buggy and it
> blocks indefinitely, executing it under such a lock would affect the other address
> space handlers and in my opinion this is a valid concern.

It can be expressed in more detail:

According to the ACPI spec, the interpreter runs control methods in the following style.
CM1_Enter -> EnterInter -> CM1_Running -> OpRegion1 -> ExitInter -> EnterInter                            -> CM1_running -> ExitInter -> CM1_Exit
CM2_Enter -> EnterInter ->                                       -> CM2_Running -> OpRegion1 -> ExitInter -> EnterInter               -> CM2_running -> ExitInter -> CM2_Exit

EnterInter: Enter interpreter lock
ExitInter: Leave interpreter lock

Let me introduce two situations:

1. If we hold a global "mutex" before "EnterInter", then no second control method can run "NotSerialized".
If CM1 has code waiting for a hardware flag and CM2 must access other hardware IOs to trigger this flag, then nothing can make progress any longer.
This is a practical bug: we have already seen it when control methods marked "NotSerialized" are executed by the interpreter in serialized mode (kernel parameter "acpi_serialize").

2. If we hold a global "mutex" after "EnterInter" and before OpRegion1:
If we do things this way, then all IO accesses are serialized.  If something in an IPMI operation region fails due to a timeout, then any other system IOs that should happen in parallel will only happen after 5 seconds.  This is not an acceptable experience.

> 
> So the idea seems to be to add wrappers around
> acpi_install_address_space_handler() and
> acpi_remove_address_space_handler() (but I don't see where the latter is
> called after the change?), such that they will know when it is safe to
> unregister the handler.  That is simple enough.

That is an obvious bug: the acpi_remove_address_space_handler() call should be placed between the while (atomic_read() > 1) loop and the final atomic_dec().
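
For illustration only, the intended ordering can be modelled in user space roughly as follows.  This is a sketch under stated assumptions, not the patch itself: struct region_model and its two helpers are hypothetical names, C11 atomics stand in for the kernel's atomic_t, sched_yield() stands in for the timed sleep, and a boolean stands in for the handler that acpi_remove_address_space_handler() would remove.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space model of one managed region. */
struct region_model {
	atomic_int refcnt;	/* 1 while registered, +1 per running callback */
	bool handler_installed;	/* stands in for the ACPICA-side handler */
};

static void region_model_register(struct region_model *r)
{
	r->handler_installed = true;
	atomic_store(&r->refcnt, 1);	/* the registration reference */
}

static void region_model_unregister(struct region_model *r)
{
	/* Wait for in-flight callbacks, like the polling loop in the patch. */
	while (atomic_load(&r->refcnt) > 1)
		sched_yield();

	/* The step missing from the posted patch: remove the handler here... */
	r->handler_installed = false;

	/* ...and only then drop the registration reference. */
	atomic_fetch_sub(&r->refcnt, 1);
}
```

The point of the ordering is that the handler disappears only after the last running callback has returned, and the count reaches zero only after the handler is gone.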

> However, I'm not sure it is needed in the context of IPMI.

I did this only because I needed a quick fix to test the IPMI bug-fix series.
The issue is closely related to the ACPI interpreter design, and the code should be implemented inside ACPICA.
Also, there are not only ACPI_ROOT_OBJECT based address space handlers but also non-ACPI_ROOT_OBJECT based ones, and this patch can't protect the latter.

> Your address space
> handler's context is NULL, so even if it is executed after
> acpi_remove_address_space_handler() has been called for it (or in parallel), it
> doesn't depend on anything passed by the caller, so I don't see why the issue
> can't be addressed by a proper synchronization between
> acpi_ipmi_exit() and acpi_ipmi_space_handler().
> 
> Clearly, acpi_ipmi_exit() should wait for all already running instances of
> acpi_ipmi_space_handler() to complete and all acpi_ipmi_space_handler()
> instances started after acpi_ipmi_exit() has been called must return
> immediately.
> 
> I would imagine an algorithm like this:
> 
> acpi_ipmi_exit()
>  1. Take "address space handler lock".
>  2. Set "unregistering address space handler" flag.
>  3. Check if "count of currently running handlers" is 0.  If so,
>     call acpi_remove_address_space_handler(), drop the lock (possibly clear
>     the flag) and return.
>  4. Otherwise drop the lock and go to sleep in "address space handler wait
>     queue".
>  5. When woken up, take "address space handler lock" and go to 3.
> 
> acpi_ipmi_space_handler()
>  1. Take "address space handler lock".
>  2. Check "unregistering address space handler" flag.  If set, drop the lock
>     and return.
>  3. Increment "count of currently running handlers".
>  4. Drop the lock.
>  5. Do your work.
>  6. Take "address space handler lock".
>  7. Decrement "count of currently running handlers" and if 0, signal the
>     tasks waiting on it to wake up.
>  8. Drop the lock.
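
The two step lists above can be sketched in user space as below.  This is only a model under stated assumptions: struct handler_gate and the gate_*() helpers are hypothetical names, a pthread mutex and condition variable stand in for the "address space handler lock" and the wait queue, and the "removed" flag stands in for the acpi_remove_address_space_handler() call.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the state shared by exit and handler paths. */
struct handler_gate {
	pthread_mutex_t lock;	/* "address space handler lock" */
	pthread_cond_t idle;	/* "address space handler wait queue" */
	bool unregistering;	/* "unregistering address space handler" flag */
	unsigned int running;	/* "count of currently running handlers" */
	bool removed;		/* stand-in for the handler removal call */
};

#define HANDLER_GATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0, false }

/* Handler steps 1-4: returns false if the handler must return at once. */
static bool gate_enter(struct handler_gate *g)
{
	bool ok;

	pthread_mutex_lock(&g->lock);
	ok = !g->unregistering;
	if (ok)
		g->running++;
	pthread_mutex_unlock(&g->lock);
	return ok;
}

/* Handler steps 6-8: drop the count and wake waiters when it hits 0. */
static void gate_leave(struct handler_gate *g)
{
	pthread_mutex_lock(&g->lock);
	if (--g->running == 0)
		pthread_cond_broadcast(&g->idle);
	pthread_mutex_unlock(&g->lock);
}

/* Exit steps 1-5: set the flag, wait for running handlers, then remove. */
static void gate_shutdown(struct handler_gate *g)
{
	pthread_mutex_lock(&g->lock);
	g->unregistering = true;
	while (g->running != 0)
		pthread_cond_wait(&g->idle, &g->lock);	/* drops lock while asleep */
	g->removed = true;
	pthread_mutex_unlock(&g->lock);
}
```

A handler body would call gate_enter(), do its work only if that returned true, and then call gate_leave(); the module exit path would call gate_shutdown() once.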

Yes, that can also work, but then the fix goes inside IPMI.
And I agree that this code should not appear in the IPMI context, since the issue is closely related to the ACPI interpreter.
What if we stop any further work on this patch and just mark it as an RFC or a test patch for informational purposes?
It is only useful for testers.

Thanks and best regards
-Lv

> 
> Thanks,
> Rafael
> 
> 
> --
> I speak only for myself.
> Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes
       [not found] <cover.1370652213.git.lv.zheng@intel.com>
  2013-07-23  8:08   ` Lv Zheng
@ 2013-09-13  5:13 ` Lv Zheng
  2013-09-13  5:13   ` [PATCH v2 01/12] ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler() Lv Zheng
                     ` (12 more replies)
  1 sibling, 13 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:13 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This patchset tries to fix the following kernel bug:
Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=46741
This is fixed by [PATCH 06].

The bug shows IPMI operation region may appear in a device not under the
IPMI system interface device's scope, thus it's required to install the
ACPI IPMI operation region handler from the root of the ACPI namespace.

The original acpi_ipmi implementation includes several issues that break
the test process.  This patchset also includes a re-design of acpi_ipmi
module to make the test possible.
 
[PATCH 01-06] are bug-fix patches that can be applied to the kernels whose
              version is > 2.6.38.  This can be confirmed with:
              # git tag --contains e92b297c
[PATCH 07] is a tuning patch for acpi_ipmi.c.
[PATCH 08-12] are cleanup patches for acpi_ipmi.c and its Kconfig item.

v2.0
PATCH 03: Uses the timeout mechanism offered by ipmi_si.
PATCH 05: Uses kref instead of atomic_t and adds "dead" flag to kill the
          atomic_read codes in ipmi_flush_tx_msg().
PATCH 07: Uses kref instead of atomic_t.

This patchset has passed the test around a fake device accessing IPMI
operation region fields on an IPMI capable platform.  A stress test of
module(acpi_ipmi) load/unload has been performed on such platform.  No
races can be found and the IPMI operation region handler is functioning
now.  It is not possible to test module(ipmi_si) load/unload as it can't be
unloaded due to its transfer flushing implementation.

Lv Zheng (12):
  ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler()
  ACPI/IPMI: Fix potential response buffer overflow
  ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  ACPI/IPMI: Fix race caused by the timed out ACPI IPMI transfers
  ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user
  ACPI/IPMI: Fix issue caused by the per-device registration of the
    IPMI operation region handler
  ACPI/IPMI: Add reference counting for ACPI IPMI transfers
  ACPI/IPMI: Cleanup several acpi_ipmi_device members
  ACPI/IPMI: Cleanup some initialization codes
  ACPI/IPMI: Cleanup some inclusion codes
  ACPI/IPMI: Cleanup some Kconfig codes
  ACPI/IPMI: Cleanup coding styles

 drivers/acpi/Kconfig     |    3 +-
 drivers/acpi/acpi_ipmi.c |  594 ++++++++++++++++++++++++++++------------------
 2 files changed, 367 insertions(+), 230 deletions(-)

-- 
1.7.10


^ permalink raw reply	[flat|nested] 99+ messages in thread

* [PATCH v2 01/12] ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler()
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
@ 2013-09-13  5:13   ` Lv Zheng
  2013-09-13  5:13   ` [PATCH v2 02/12] ACPI/IPMI: Fix potential response buffer overflow Lv Zheng
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:13 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This patch is a quick fix for the issue, indicated by the test results, that
ipmi_msg_handler() is invoked in atomic context.

BUG: scheduling while atomic: kipmi0/18933/0x10000100
Modules linked in: ipmi_si acpi_ipmi ...
CPU: 3 PID: 18933 Comm: kipmi0 Tainted: G       AW    3.10.0-rc7+ #2
Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.0027.070120100606 07/01/2010
 ffff8838245eea00 ffff88103fc63c98 ffffffff814c4a1e ffff88103fc63ca8
 ffffffff814bfbab ffff88103fc63d28 ffffffff814c73e0 ffff88103933cbd4
 0000000000000096 ffff88103fc63ce8 ffff88102f618000 ffff881035c01fd8
Call Trace:
 <IRQ>  [<ffffffff814c4a1e>] dump_stack+0x19/0x1b
 [<ffffffff814bfbab>] __schedule_bug+0x46/0x54
 [<ffffffff814c73e0>] __schedule+0x83/0x59c
 [<ffffffff81058853>] __cond_resched+0x22/0x2d
 [<ffffffff814c794b>] _cond_resched+0x14/0x1d
 [<ffffffff814c6d82>] mutex_lock+0x11/0x32
 [<ffffffff8101e1e9>] ? __default_send_IPI_dest_field.constprop.0+0x53/0x58
 [<ffffffffa09e3f9c>] ipmi_msg_handler+0x23/0x166 [ipmi_si]
 [<ffffffff812bf6e4>] deliver_response+0x55/0x5a
 [<ffffffff812c0fd4>] handle_new_recv_msgs+0xb67/0xc65
 [<ffffffff81007ad1>] ? read_tsc+0x9/0x19
 [<ffffffff814c8620>] ? _raw_spin_lock_irq+0xa/0xc
 [<ffffffffa09e1128>] ipmi_thread+0x5c/0x146 [ipmi_si]
 ...

Known issues:
- Replacing tx_msg_lock with a spinlock is not performance friendly.
  The current solution works but does not have the best performance because
  it is better to make atomic context run as fast as possible.  Given there
  are not many IPMI messages created by ACPI, the performance of the current
  solution may be OK.  It could be improved by linking ipmi_recv_msg into an
  RX message queue and processing it in another context.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index f40acef..a6977e1 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -39,6 +39,7 @@
 #include <linux/ipmi.h>
 #include <linux/device.h>
 #include <linux/pnp.h>
+#include <linux/spinlock.h>
 
 MODULE_AUTHOR("Zhao Yakui");
 MODULE_DESCRIPTION("ACPI IPMI Opregion driver");
@@ -57,7 +58,7 @@ struct acpi_ipmi_device {
 	struct list_head head;
 	/* the IPMI request message list */
 	struct list_head tx_msg_list;
-	struct mutex	tx_msg_lock;
+	spinlock_t	tx_msg_lock;
 	acpi_handle handle;
 	struct pnp_dev *pnp_dev;
 	ipmi_user_t	user_interface;
@@ -147,6 +148,7 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
 	struct kernel_ipmi_msg *msg;
 	struct acpi_ipmi_buffer *buffer;
 	struct acpi_ipmi_device *device;
+	unsigned long flags;
 
 	msg = &tx_msg->tx_message;
 	/*
@@ -177,10 +179,10 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
 
 	/* Get the msgid */
 	device = tx_msg->device;
-	mutex_lock(&device->tx_msg_lock);
+	spin_lock_irqsave(&device->tx_msg_lock, flags);
 	device->curr_msgid++;
 	tx_msg->tx_msgid = device->curr_msgid;
-	mutex_unlock(&device->tx_msg_lock);
+	spin_unlock_irqrestore(&device->tx_msg_lock, flags);
 }
 
 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
@@ -242,6 +244,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	int msg_found = 0;
 	struct acpi_ipmi_msg *tx_msg;
 	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
+	unsigned long flags;
 
 	if (msg->user != ipmi_device->user_interface) {
 		dev_warn(&pnp_dev->dev, "Unexpected response is returned. "
@@ -250,7 +253,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 		ipmi_free_recv_msg(msg);
 		return;
 	}
-	mutex_lock(&ipmi_device->tx_msg_lock);
+	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_for_each_entry(tx_msg, &ipmi_device->tx_msg_list, head) {
 		if (msg->msgid == tx_msg->tx_msgid) {
 			msg_found = 1;
@@ -258,7 +261,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 		}
 	}
 
-	mutex_unlock(&ipmi_device->tx_msg_lock);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 	if (!msg_found) {
 		dev_warn(&pnp_dev->dev, "Unexpected response (msg id %ld) is "
 			"returned.\n", msg->msgid);
@@ -378,6 +381,7 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	struct acpi_ipmi_device *ipmi_device = handler_context;
 	int err, rem_time;
 	acpi_status status;
+	unsigned long flags;
 	/*
 	 * IPMI opregion message.
 	 * IPMI message is firstly written to the BMC and system software
@@ -395,9 +399,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 		return AE_NO_MEMORY;
 
 	acpi_format_ipmi_msg(tx_msg, address, value);
-	mutex_lock(&ipmi_device->tx_msg_lock);
+	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
-	mutex_unlock(&ipmi_device->tx_msg_lock);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 	err = ipmi_request_settime(ipmi_device->user_interface,
 					&tx_msg->addr,
 					tx_msg->tx_msgid,
@@ -413,9 +417,9 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	status = AE_OK;
 
 end_label:
-	mutex_lock(&ipmi_device->tx_msg_lock);
+	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_del(&tx_msg->head);
-	mutex_unlock(&ipmi_device->tx_msg_lock);
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 	kfree(tx_msg);
 	return status;
 }
@@ -457,7 +461,7 @@ static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)
 
 	INIT_LIST_HEAD(&ipmi_device->head);
 
-	mutex_init(&ipmi_device->tx_msg_lock);
+	spin_lock_init(&ipmi_device->tx_msg_lock);
 	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
 	ipmi_install_space_handler(ipmi_device);
 
-- 
1.7.10



* [PATCH v2 02/12] ACPI/IPMI: Fix potential response buffer overflow
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
  2013-09-13  5:13   ` [PATCH v2 01/12] ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler() Lv Zheng
@ 2013-09-13  5:13   ` Lv Zheng
  2013-09-13  5:13   ` [PATCH v2 03/12] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers Lv Zheng
                     ` (10 subsequent siblings)
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:13 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This patch adds sanity checks on the message size to avoid a potential
buffer overflow.

The kernel IPMI message size is IPMI_MAX_MSG_LENGTH (272 bytes), while the
IPMI message size defined by the ACPI specification is 64 bytes.  The
original code does not handle this difference, which may cause a crash in
the response handling code.
This patch closes the gap and also merges rx_data/tx_data into a single
data/len pair, since they need not be separated.
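
The size check described above can be sketched in user space as follows.
This is only an illustration under stated assumptions, not the driver code:
struct demo_msg, demo_copy_request() and demo_copy_response() are
hypothetical names, and only the 64-byte limit and the layout of
struct acpi_ipmi_buffer are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define ACPI_IPMI_MAX_MSG_LENGTH 64	/* ACPI-defined payload limit */

/* Layout of the buffer handed over by the AML interpreter. */
struct acpi_ipmi_buffer {
	uint8_t status;
	uint8_t length;
	uint8_t data[ACPI_IPMI_MAX_MSG_LENGTH];
};

/* Hypothetical stand-in for the driver's per-transfer message. */
struct demo_msg {
	uint8_t data[ACPI_IPMI_MAX_MSG_LENGTH];
	uint8_t len;
};

/* Copy a request out of the ACPI buffer, rejecting oversized lengths. */
static int demo_copy_request(struct demo_msg *msg,
			     const struct acpi_ipmi_buffer *buf)
{
	assert(msg && buf);
	if (buf->length > ACPI_IPMI_MAX_MSG_LENGTH)
		return -EINVAL;
	msg->len = buf->length;
	memcpy(msg->data, buf->data, msg->len);
	return 0;
}

/* Copy a response; the source may hold up to 272 bytes in the kernel. */
static int demo_copy_response(struct demo_msg *msg,
			      const uint8_t *src, size_t src_len)
{
	assert(msg && src);
	if (src_len > ACPI_IPMI_MAX_MSG_LENGTH)
		return -EINVAL;
	msg->len = (uint8_t)src_len;
	memcpy(msg->data, src, src_len);
	return 0;
}
```

Both directions are checked against the same limit before memcpy(), which
is what allows a single data/len pair to serve requests and responses.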

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   53 +++++++++++++++++++++++++++++-----------------
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index a6977e1..7397135 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -52,6 +52,7 @@ MODULE_LICENSE("GPL");
 #define ACPI_IPMI_UNKNOWN		0x07
 /* the IPMI timeout is 5s */
 #define IPMI_TIMEOUT			(5 * HZ)
+#define ACPI_IPMI_MAX_MSG_LENGTH	64
 
 struct acpi_ipmi_device {
 	/* the device list attached to driver_data.ipmi_devices */
@@ -90,11 +91,9 @@ struct acpi_ipmi_msg {
 	struct completion tx_complete;
 	struct kernel_ipmi_msg tx_message;
 	int	msg_done;
-	/* tx data . And copy it from ACPI object buffer */
-	u8	tx_data[64];
-	int	tx_len;
-	u8	rx_data[64];
-	int	rx_len;
+	/* tx/rx data . And copy it from/to ACPI object buffer */
+	u8	data[ACPI_IPMI_MAX_MSG_LENGTH];
+	u8	rx_len;
 	struct acpi_ipmi_device *device;
 };
 
@@ -102,7 +101,7 @@ struct acpi_ipmi_msg {
 struct acpi_ipmi_buffer {
 	u8 status;
 	u8 length;
-	u8 data[64];
+	u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
 };
 
 static void ipmi_register_bmc(int iface, struct device *dev);
@@ -141,7 +140,7 @@ static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
 
 #define		IPMI_OP_RGN_NETFN(offset)	((offset >> 8) & 0xff)
 #define		IPMI_OP_RGN_CMD(offset)		(offset & 0xff)
-static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
+static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 				acpi_physical_address address,
 				acpi_integer *value)
 {
@@ -157,15 +156,21 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
 	 */
 	msg->netfn = IPMI_OP_RGN_NETFN(address);
 	msg->cmd = IPMI_OP_RGN_CMD(address);
-	msg->data = tx_msg->tx_data;
+	msg->data = tx_msg->data;
 	/*
 	 * value is the parameter passed by the IPMI opregion space handler.
 	 * It points to the IPMI request message buffer
 	 */
 	buffer = (struct acpi_ipmi_buffer *)value;
 	/* copy the tx message data */
+	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
+		dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
+			      "Unexpected request (msg len %d).\n",
+			      buffer->length);
+		return -EINVAL;
+	}
 	msg->data_len = buffer->length;
-	memcpy(tx_msg->tx_data, buffer->data, msg->data_len);
+	memcpy(tx_msg->data, buffer->data, msg->data_len);
 	/*
 	 * now the default type is SYSTEM_INTERFACE and channel type is BMC.
 	 * If the netfn is APP_REQUEST and the cmd is SEND_MESSAGE,
@@ -183,6 +188,7 @@ static void acpi_format_ipmi_msg(struct acpi_ipmi_msg *tx_msg,
 	device->curr_msgid++;
 	tx_msg->tx_msgid = device->curr_msgid;
 	spin_unlock_irqrestore(&device->tx_msg_lock, flags);
+	return 0;
 }
 
 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
@@ -214,7 +220,7 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 	 */
 	buffer->status = ACPI_IPMI_OK;
 	buffer->length = msg->rx_len;
-	memcpy(buffer->data, msg->rx_data, msg->rx_len);
+	memcpy(buffer->data, msg->data, msg->rx_len);
 }
 
 static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
@@ -250,8 +256,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 		dev_warn(&pnp_dev->dev, "Unexpected response is returned. "
 			"returned user %p, expected user %p\n",
 			msg->user, ipmi_device->user_interface);
-		ipmi_free_recv_msg(msg);
-		return;
+		goto out_msg;
 	}
 	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_for_each_entry(tx_msg, &ipmi_device->tx_msg_list, head) {
@@ -265,17 +270,21 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	if (!msg_found) {
 		dev_warn(&pnp_dev->dev, "Unexpected response (msg id %ld) is "
 			"returned.\n", msg->msgid);
-		ipmi_free_recv_msg(msg);
-		return;
+		goto out_msg;
 	}
 
-	if (msg->msg.data_len) {
-		/* copy the response data to Rx_data buffer */
-		memcpy(tx_msg->rx_data, msg->msg_data, msg->msg.data_len);
+	/* copy the response data to Rx_data buffer */
+	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
+		dev_WARN_ONCE(&pnp_dev->dev, true,
+			      "Unexpected response (msg len %d).\n",
+			      msg->msg.data_len);
+	} else {
 		tx_msg->rx_len = msg->msg.data_len;
+		memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
 		tx_msg->msg_done = 1;
 	}
 	complete(&tx_msg->tx_complete);
+out_msg:
 	ipmi_free_recv_msg(msg);
 };
 
@@ -398,7 +407,10 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	if (!tx_msg)
 		return AE_NO_MEMORY;
 
-	acpi_format_ipmi_msg(tx_msg, address, value);
+	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
+		status = AE_TYPE;
+		goto out_msg;
+	}
 	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
@@ -409,17 +421,18 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 					NULL, 0, 0, 0);
 	if (err) {
 		status = AE_ERROR;
-		goto end_label;
+		goto out_list;
 	}
 	rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
 					IPMI_TIMEOUT);
 	acpi_format_ipmi_response(tx_msg, value, rem_time);
 	status = AE_OK;
 
-end_label:
+out_list:
 	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_del(&tx_msg->head);
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
+out_msg:
 	kfree(tx_msg);
 	return status;
 }
-- 
1.7.10



* [PATCH v2 03/12] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
  2013-09-13  5:13   ` [PATCH v2 01/12] ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler() Lv Zheng
  2013-09-13  5:13   ` [PATCH v2 02/12] ACPI/IPMI: Fix potential response buffer overflow Lv Zheng
@ 2013-09-13  5:13   ` Lv Zheng
  2013-09-13  5:13   ` [PATCH v2 04/12] ACPI/IPMI: Fix race caused by the timed out " Lv Zheng
                     ` (9 subsequent siblings)
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:13 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This patch fixes races caused by unprotected ACPI IPMI transfers.

The following crashes may occur:
1. tx_msg_lock is not held while iterating tx_msg_list in
   ipmi_flush_tx_msg(), while an entry can be unlinked in parallel on
   failure in acpi_ipmi_space_handler() under the protection of
   tx_msg_lock.
2. No lock is held while freeing tx_msg in acpi_ipmi_space_handler(),
   while it can be accessed in parallel in ipmi_flush_tx_msg() and
   ipmi_msg_handler().

This patch extends tx_msg_lock to protect all tx_msg accesses to solve
this issue.  tx_msg_lock is then always held around complete() and tx_msg
accesses.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 7397135..87307ba 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -228,11 +228,14 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 	struct acpi_ipmi_msg *tx_msg, *temp;
 	int count = HZ / 10;
 	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
+	unsigned long flags;
 
+	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
 	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
 		/* wake up the sleep thread on the Tx msg */
 		complete(&tx_msg->tx_complete);
 	}
+	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
 
 	/* wait for about 100ms to flush the tx message list */
 	while (count--) {
@@ -266,11 +269,10 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 		}
 	}
 
-	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 	if (!msg_found) {
 		dev_warn(&pnp_dev->dev, "Unexpected response (msg id %ld) is "
 			"returned.\n", msg->msgid);
-		goto out_msg;
+		goto out_lock;
 	}
 
 	/* copy the response data to Rx_data buffer */
@@ -284,6 +286,8 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 		tx_msg->msg_done = 1;
 	}
 	complete(&tx_msg->tx_complete);
+out_lock:
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 out_msg:
 	ipmi_free_recv_msg(msg);
 };
-- 
1.7.10



* [PATCH v2 04/12] ACPI/IPMI: Fix race caused by the timed out ACPI IPMI transfers
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
                     ` (2 preceding siblings ...)
  2013-09-13  5:13   ` [PATCH v2 03/12] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers Lv Zheng
@ 2013-09-13  5:13   ` Lv Zheng
  2013-09-13  5:13   ` [PATCH v2 05/12] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user Lv Zheng
                     ` (8 subsequent siblings)
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:13 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This patch fixes races caused by timed out ACPI IPMI transfers.

This patch uses the timeout mechanism provided by ipmi_si to avoid the race
in which the msg_done flag is set without any protection, so that the
message content can be invalid.  Thanks to Corey Minyard for the
suggestion.
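
The resulting status flow can be sketched in user space as follows.  This
is a simplified illustration, not the driver code: struct demo_msg,
demo_msg_init() and demo_handle_response() are hypothetical names, locking
and completion are left out, and the value 0xC3 for
IPMI_TIMEOUT_COMPLETION_CODE is quoted from the kernel's IPMI message
definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ACPI_IPMI_OK			0x00
#define ACPI_IPMI_TIMEOUT		0x10
#define ACPI_IPMI_UNKNOWN		0x07
#define ACPI_IPMI_MAX_MSG_LENGTH	64
/* Completion code the IPMI layer reports for a timed-out request. */
#define IPMI_TIMEOUT_COMPLETION_CODE	0xC3

/* Hypothetical stand-in for struct acpi_ipmi_msg. */
struct demo_msg {
	uint8_t msg_done;	/* holds an ACPI_IPMI_* status */
	uint8_t rx_len;
	uint8_t data[ACPI_IPMI_MAX_MSG_LENGTH];
};

/* A fresh transfer starts out as "unknown" until a response arrives. */
static void demo_msg_init(struct demo_msg *m)
{
	memset(m, 0, sizeof(*m));
	m->msg_done = ACPI_IPMI_UNKNOWN;
}

/* Mirrors the response classification added to ipmi_msg_handler(). */
static void demo_handle_response(struct demo_msg *m,
				 const uint8_t *data, size_t len)
{
	assert(m && data);
	if (len > ACPI_IPMI_MAX_MSG_LENGTH)
		return;		/* oversized: status stays UNKNOWN */
	if (len == 1) {
		/* a lone completion code is an error response */
		if (data[0] == IPMI_TIMEOUT_COMPLETION_CODE)
			m->msg_done = ACPI_IPMI_TIMEOUT;
		return;
	}
	m->rx_len = (uint8_t)len;
	memcpy(m->data, data, len);
	m->msg_done = ACPI_IPMI_OK;
}
```

Because the timeout is now delivered as an ordinary response, the waiter
only has to copy msg_done into the ACPI buffer status and no longer needs
its own timer.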

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   49 +++++++++++++++++++++++++---------------------
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 87307ba..9171a1a 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -51,7 +51,7 @@ MODULE_LICENSE("GPL");
 #define ACPI_IPMI_TIMEOUT		0x10
 #define ACPI_IPMI_UNKNOWN		0x07
 /* the IPMI timeout is 5s */
-#define IPMI_TIMEOUT			(5 * HZ)
+#define IPMI_TIMEOUT			(5000)
 #define ACPI_IPMI_MAX_MSG_LENGTH	64
 
 struct acpi_ipmi_device {
@@ -135,6 +135,7 @@ static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
 	init_completion(&ipmi_msg->tx_complete);
 	INIT_LIST_HEAD(&ipmi_msg->head);
 	ipmi_msg->device = ipmi;
+	ipmi_msg->msg_done = ACPI_IPMI_UNKNOWN;
 	return ipmi_msg;
 }
 
@@ -192,7 +193,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 }
 
 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
-		acpi_integer *value, int rem_time)
+		acpi_integer *value)
 {
 	struct acpi_ipmi_buffer *buffer;
 
@@ -201,24 +202,17 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 	 * IPMI message returned by IPMI command.
 	 */
 	buffer = (struct acpi_ipmi_buffer *)value;
-	if (!rem_time && !msg->msg_done) {
-		buffer->status = ACPI_IPMI_TIMEOUT;
-		return;
-	}
 	/*
-	 * If the flag of msg_done is not set or the recv length is zero, it
-	 * means that the IPMI command is not executed correctly.
-	 * The status code will be ACPI_IPMI_UNKNOWN.
+	 * If the flag of msg_done is not set, it means that the IPMI command is
+	 * not executed correctly.
 	 */
-	if (!msg->msg_done || !msg->rx_len) {
-		buffer->status = ACPI_IPMI_UNKNOWN;
+	buffer->status = msg->msg_done;
+	if (msg->msg_done != ACPI_IPMI_OK)
 		return;
-	}
 	/*
 	 * If the IPMI response message is obtained correctly, the status code
 	 * will be ACPI_IPMI_OK
 	 */
-	buffer->status = ACPI_IPMI_OK;
 	buffer->length = msg->rx_len;
 	memcpy(buffer->data, msg->data, msg->rx_len);
 }
@@ -280,11 +274,23 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 		dev_WARN_ONCE(&pnp_dev->dev, true,
 			      "Unexpected response (msg len %d).\n",
 			      msg->msg.data_len);
-	} else {
-		tx_msg->rx_len = msg->msg.data_len;
-		memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
-		tx_msg->msg_done = 1;
+		goto out_comp;
 	}
+	/* response msg is an error msg */
+	msg->recv_type = IPMI_RESPONSE_RECV_TYPE;
+	if (msg->recv_type == IPMI_RESPONSE_RECV_TYPE &&
+	    msg->msg.data_len == 1) {
+		if (msg->msg.data[0] == IPMI_TIMEOUT_COMPLETION_CODE) {
+			dev_WARN_ONCE(&pnp_dev->dev, true,
+				      "Unexpected response (timeout).\n");
+			tx_msg->msg_done = ACPI_IPMI_TIMEOUT;
+		}
+		goto out_comp;
+	}
+	tx_msg->rx_len = msg->msg.data_len;
+	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
+	tx_msg->msg_done = ACPI_IPMI_OK;
+out_comp:
 	complete(&tx_msg->tx_complete);
 out_lock:
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
@@ -392,7 +398,7 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 {
 	struct acpi_ipmi_msg *tx_msg;
 	struct acpi_ipmi_device *ipmi_device = handler_context;
-	int err, rem_time;
+	int err;
 	acpi_status status;
 	unsigned long flags;
 	/*
@@ -422,14 +428,13 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 					&tx_msg->addr,
 					tx_msg->tx_msgid,
 					&tx_msg->tx_message,
-					NULL, 0, 0, 0);
+					NULL, 0, 0, IPMI_TIMEOUT);
 	if (err) {
 		status = AE_ERROR;
 		goto out_list;
 	}
-	rem_time = wait_for_completion_timeout(&tx_msg->tx_complete,
-					IPMI_TIMEOUT);
-	acpi_format_ipmi_response(tx_msg, value, rem_time);
+	wait_for_completion(&tx_msg->tx_complete);
+	acpi_format_ipmi_response(tx_msg, value);
 	status = AE_OK;
 
 out_list:
-- 
1.7.10



* [PATCH v2 05/12] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
                     ` (3 preceding siblings ...)
  2013-09-13  5:13   ` [PATCH v2 04/12] ACPI/IPMI: Fix race caused by the timed out " Lv Zheng
@ 2013-09-13  5:13   ` Lv Zheng
  2013-09-13  5:14   ` [PATCH v2 06/12] ACPI/IPMI: Fix issue caused by the per-device registration of the IPMI operation region handler Lv Zheng
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:13 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This patch uses reference counting to fix the race caused by the
unprotected ACPI IPMI user.

There are two rules for using the ipmi_si APIs:
1.  In ipmi_si, ipmi_destroy_user() ensures that no ipmi_recv_msg is passed
    to ipmi_msg_handler() afterwards, but ipmi_request_settime() cannot be
    called with an invalid ipmi_user_t.  This means ipmi_si users must
    ensure that no local references to ipmi_user_t remain before invoking
    ipmi_destroy_user().
2.  In ipmi_si, the smi_gone()/new_smi() callbacks are protected by
    smi_watchers_mutex, thus their invocations are serialized.  But as a
    new smi can re-use a freed intf_num, the callback implementation must
    either not use intf_num as a means of identification, or ensure that
    all references to the previous smi are dropped before exiting the
    smi_gone() callback.

The acpi_ipmi_device->user_interface check in acpi_ipmi_space_handler()
can happen before user_interface is set to NULL, and the code after the
check can run after user_interface has become NULL, so an on-going
acpi_ipmi_space_handler() can still pass an invalid
acpi_ipmi_device->user_interface to ipmi_request_settime().  Such a race
condition is not allowed by the IPMI layer's API design, as a crash will
happen in ipmi_request_settime().

This patch follows the ipmi_devintf.c design:
1. ipmi_destroy_user() is invoked after the reference count of
   acpi_ipmi_device drops to 0.  The reference count dropping to 0 also
   means that all tx_msg related to this acpi_ipmi_device are freed.
   This matches the IPMI layer's API calling rule on ipmi_destroy_user()
   and ipmi_request_settime().
2. ipmi_flush_tx_msg() is performed so that no on-going tx_msg can still be
   running in acpi_ipmi_space_handler().  It is invoked after
   __ipmi_dev_kill(), where acpi_ipmi_device is deleted from the list with
   a "dead" flag set.  A "dead" flag check is also introduced at the point
   where a tx_msg is about to be added to tx_msg_list, so that no new
   tx_msg can be queued after __ipmi_dev_kill() returns.
3. The waiting code in ipmi_flush_tx_msg() is deleted.  It is no longer
   required because this patch ensures that no acpi_ipmi reference to
   ipmi_user_t is held before ipmi_destroy_user() is called, and
   ipmi_destroy_user() ensures that ipmi_msg_handler() is not invoked
   after it returns.
4. The flushing of tx_msg is also moved out of ipmi_lock in this patch.
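
The lifetime rules listed above can be sketched in user space as follows.
This is a single-threaded illustration only: all demo_* names are
hypothetical, a plain int stands in for struct kref, and the ipmi_lock and
tx_msg_lock protection that the real code needs is deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical user-space stand-in for struct acpi_ipmi_device. */
struct demo_dev {
	int refcount;	/* plays the role of struct kref */
	bool listed;	/* still on the global device list? */
	bool dead;	/* set once the device has been killed */
};

/* Counts simulated ipmi_destroy_user() calls. */
static int demo_users_destroyed;

static struct demo_dev *demo_dev_alloc(void)
{
	struct demo_dev *dev = calloc(1, sizeof(*dev));

	if (!dev)
		return NULL;
	dev->refcount = 1;	/* reference owned by the device list */
	dev->listed = true;
	return dev;
}

/* Lookup path: only devices still on the list hand out references. */
static struct demo_dev *demo_dev_get(struct demo_dev *dev)
{
	if (!dev->listed)
		return NULL;
	dev->refcount++;
	return dev;
}

static void demo_dev_put(struct demo_dev *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		/* ipmi_destroy_user() would run here, after the last user */
		demo_users_destroyed++;
		free(dev);
	}
}

/* Unlink first, then mark dead, in the order __ipmi_dev_kill() uses. */
static void demo_dev_kill(struct demo_dev *dev)
{
	dev->listed = false;
	dev->dead = true;
}

/* A transfer may only be queued while the device is alive. */
static bool demo_may_queue(const struct demo_dev *dev)
{
	return !dev->dead;
}
```

In this sketch an in-flight handler that took its reference before the
kill keeps the device alive, cannot queue new transfers afterwards, and
the simulated user is destroyed only when that handler drops the last
reference.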

The forthcoming IPMI operation region handler installation changes also
require acpi_ipmi_device to be handled in this style.

Authorship is also updated due to this design change.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |  249 +++++++++++++++++++++++++++++-----------------
 1 file changed, 156 insertions(+), 93 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 9171a1a..b285386 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -1,8 +1,9 @@
 /*
  *  acpi_ipmi.c - ACPI IPMI opregion
  *
- *  Copyright (C) 2010 Intel Corporation
- *  Copyright (C) 2010 Zhao Yakui <yakui.zhao@intel.com>
+ *  Copyright (C) 2010, 2013 Intel Corporation
+ *    Author: Zhao Yakui <yakui.zhao@intel.com>
+ *            Lv Zheng <lv.zheng@intel.com>
  *
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  *
@@ -67,6 +68,8 @@ struct acpi_ipmi_device {
 	long curr_msgid;
 	unsigned long flags;
 	struct ipmi_smi_info smi_data;
+	bool dead;
+	struct kref kref;
 };
 
 struct ipmi_driver_data {
@@ -107,8 +110,8 @@ struct acpi_ipmi_buffer {
 static void ipmi_register_bmc(int iface, struct device *dev);
 static void ipmi_bmc_gone(int iface);
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
-static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device);
-static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device);
+static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
+static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);
 
 static struct ipmi_driver_data driver_data = {
 	.ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
@@ -122,6 +125,88 @@ static struct ipmi_driver_data driver_data = {
 	},
 };
 
+static struct acpi_ipmi_device *
+ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
+{
+	struct acpi_ipmi_device *ipmi_device;
+	int err;
+	ipmi_user_t user;
+
+	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
+	if (!ipmi_device)
+		return NULL;
+
+	kref_init(&ipmi_device->kref);
+	INIT_LIST_HEAD(&ipmi_device->head);
+	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
+	spin_lock_init(&ipmi_device->tx_msg_lock);
+
+	ipmi_device->handle = handle;
+	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
+	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
+	ipmi_device->ipmi_ifnum = iface;
+
+	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
+			       ipmi_device, &user);
+	if (err) {
+		put_device(smi_data->dev);
+		kfree(ipmi_device);
+		return NULL;
+	}
+	ipmi_device->user_interface = user;
+	ipmi_install_space_handler(ipmi_device);
+
+	return ipmi_device;
+}
+
+static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
+{
+	ipmi_remove_space_handler(ipmi_device);
+	ipmi_destroy_user(ipmi_device->user_interface);
+	put_device(ipmi_device->smi_data.dev);
+	kfree(ipmi_device);
+}
+
+static void ipmi_dev_release_kref(struct kref *kref)
+{
+	struct acpi_ipmi_device *ipmi =
+		container_of(kref, struct acpi_ipmi_device, kref);
+
+	ipmi_dev_release(ipmi);
+}
+
+static void __ipmi_dev_kill(struct acpi_ipmi_device *ipmi_device)
+{
+	list_del(&ipmi_device->head);
+	/*
+	 * Always setting dead flag after deleting from the list or
+	 * list_for_each_entry() codes must get changed.
+	 */
+	ipmi_device->dead = true;
+}
+
+static struct acpi_ipmi_device *acpi_ipmi_dev_get(int iface)
+{
+	struct acpi_ipmi_device *temp, *ipmi_device = NULL;
+
+	mutex_lock(&driver_data.ipmi_lock);
+	list_for_each_entry(temp, &driver_data.ipmi_devices, head) {
+		if (temp->ipmi_ifnum == iface) {
+			ipmi_device = temp;
+			kref_get(&ipmi_device->kref);
+			break;
+		}
+	}
+	mutex_unlock(&driver_data.ipmi_lock);
+
+	return ipmi_device;
+}
+
+static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
+{
+	kref_put(&ipmi_device->kref, ipmi_dev_release_kref);
+}
+
 static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
 {
 	struct acpi_ipmi_msg *ipmi_msg;
@@ -220,25 +305,22 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 {
 	struct acpi_ipmi_msg *tx_msg, *temp;
-	int count = HZ / 10;
-	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
 	unsigned long flags;
 
+	/*
+	 * NOTE: On-going ipmi_recv_msg
+	 * ipmi_msg_handler() may still be invoked by ipmi_si after
+	 * flushing.  But it is safe to do a fast flushing on module_exit()
+	 * without waiting for all ipmi_recv_msg(s) to complete from
+	 * ipmi_msg_handler() as it is ensured by ipmi_si that all
+	 * ipmi_recv_msg(s) are freed after invoking ipmi_destroy_user().
+	 */
 	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
 	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
 		/* wake up the sleep thread on the Tx msg */
 		complete(&tx_msg->tx_complete);
 	}
 	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
-
-	/* wait for about 100ms to flush the tx message list */
-	while (count--) {
-		if (list_empty(&ipmi->tx_msg_list))
-			break;
-		schedule_timeout(1);
-	}
-	if (!list_empty(&ipmi->tx_msg_list))
-		dev_warn(&pnp_dev->dev, "tx msg list is not NULL\n");
 }
 
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
@@ -302,7 +384,6 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 {
 	struct acpi_ipmi_device *ipmi_device, *temp;
 	struct pnp_dev *pnp_dev;
-	ipmi_user_t		user;
 	int err;
 	struct ipmi_smi_info smi_data;
 	acpi_handle handle;
@@ -312,12 +393,18 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 	if (err)
 		return;
 
-	if (smi_data.addr_src != SI_ACPI) {
-		put_device(smi_data.dev);
-		return;
-	}
-
+	if (smi_data.addr_src != SI_ACPI)
+		goto err_ref;
 	handle = smi_data.addr_info.acpi_info.acpi_handle;
+	if (!handle)
+		goto err_ref;
+	pnp_dev = to_pnp_dev(smi_data.dev);
+
+	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
+	if (!ipmi_device) {
+		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
+		goto err_ref;
+	}
 
 	mutex_lock(&driver_data.ipmi_lock);
 	list_for_each_entry(temp, &driver_data.ipmi_devices, head) {
@@ -326,34 +413,18 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 		 * to the device list, don't add it again.
 		 */
 		if (temp->handle == handle)
-			goto out;
+			goto err_lock;
 	}
 
-	ipmi_device = kzalloc(sizeof(*ipmi_device), GFP_KERNEL);
-
-	if (!ipmi_device)
-		goto out;
-
-	pnp_dev = to_pnp_dev(smi_data.dev);
-	ipmi_device->handle = handle;
-	ipmi_device->pnp_dev = pnp_dev;
-
-	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
-					ipmi_device, &user);
-	if (err) {
-		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
-		kfree(ipmi_device);
-		goto out;
-	}
-	acpi_add_ipmi_device(ipmi_device);
-	ipmi_device->user_interface = user;
-	ipmi_device->ipmi_ifnum = iface;
+	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
 	mutex_unlock(&driver_data.ipmi_lock);
-	memcpy(&ipmi_device->smi_data, &smi_data, sizeof(struct ipmi_smi_info));
+	put_device(smi_data.dev);
 	return;
 
-out:
+err_lock:
 	mutex_unlock(&driver_data.ipmi_lock);
+	ipmi_dev_release(ipmi_device);
+err_ref:
 	put_device(smi_data.dev);
 	return;
 }
@@ -361,19 +432,22 @@ out:
 static void ipmi_bmc_gone(int iface)
 {
 	struct acpi_ipmi_device *ipmi_device, *temp;
+	bool dev_found = false;
 
 	mutex_lock(&driver_data.ipmi_lock);
 	list_for_each_entry_safe(ipmi_device, temp,
 				&driver_data.ipmi_devices, head) {
-		if (ipmi_device->ipmi_ifnum != iface)
-			continue;
-
-		acpi_remove_ipmi_device(ipmi_device);
-		put_device(ipmi_device->smi_data.dev);
-		kfree(ipmi_device);
-		break;
+		if (ipmi_device->ipmi_ifnum == iface) {
+			dev_found = true;
+			__ipmi_dev_kill(ipmi_device);
+			break;
+		}
 	}
 	mutex_unlock(&driver_data.ipmi_lock);
+	if (dev_found) {
+		ipmi_flush_tx_msg(ipmi_device);
+		acpi_ipmi_dev_put(ipmi_device);
+	}
 }
 /* --------------------------------------------------------------------------
  *			Address Space Management
@@ -397,7 +471,8 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 		      void *handler_context, void *region_context)
 {
 	struct acpi_ipmi_msg *tx_msg;
-	struct acpi_ipmi_device *ipmi_device = handler_context;
+	int iface = (long)handler_context;
+	struct acpi_ipmi_device *ipmi_device;
 	int err;
 	acpi_status status;
 	unsigned long flags;
@@ -410,20 +485,31 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	if ((function & ACPI_IO_MASK) == ACPI_READ)
 		return AE_TYPE;
 
-	if (!ipmi_device->user_interface)
+	ipmi_device = acpi_ipmi_dev_get(iface);
+	if (!ipmi_device)
 		return AE_NOT_EXIST;
 
 	tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
-	if (!tx_msg)
-		return AE_NO_MEMORY;
+	if (!tx_msg) {
+		status = AE_NO_MEMORY;
+		goto out_ref;
+	}
 
 	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
 		status = AE_TYPE;
 		goto out_msg;
 	}
+	mutex_lock(&driver_data.ipmi_lock);
+	/* Do not add a tx_msg that can not be flushed. */
+	if (ipmi_device->dead) {
+		status = AE_NOT_EXIST;
+		mutex_unlock(&driver_data.ipmi_lock);
+		goto out_msg;
+	}
 	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
+	mutex_unlock(&driver_data.ipmi_lock);
 	err = ipmi_request_settime(ipmi_device->user_interface,
 					&tx_msg->addr,
 					tx_msg->tx_msgid,
@@ -443,6 +529,8 @@ out_list:
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 out_msg:
 	kfree(tx_msg);
+out_ref:
+	acpi_ipmi_dev_put(ipmi_device);
 	return status;
 }
 
@@ -465,9 +553,8 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
 		return 0;
 
 	status = acpi_install_address_space_handler(ipmi->handle,
-						    ACPI_ADR_SPACE_IPMI,
-						    &acpi_ipmi_space_handler,
-						    NULL, ipmi);
+				ACPI_ADR_SPACE_IPMI, &acpi_ipmi_space_handler,
+				NULL, (void *)((long)ipmi->ipmi_ifnum));
 	if (ACPI_FAILURE(status)) {
 		struct pnp_dev *pnp_dev = ipmi->pnp_dev;
 		dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
@@ -478,36 +565,6 @@ static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
 	return 0;
 }
 
-static void acpi_add_ipmi_device(struct acpi_ipmi_device *ipmi_device)
-{
-
-	INIT_LIST_HEAD(&ipmi_device->head);
-
-	spin_lock_init(&ipmi_device->tx_msg_lock);
-	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
-	ipmi_install_space_handler(ipmi_device);
-
-	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
-}
-
-static void acpi_remove_ipmi_device(struct acpi_ipmi_device *ipmi_device)
-{
-	/*
-	 * If the IPMI user interface is created, it should be
-	 * destroyed.
-	 */
-	if (ipmi_device->user_interface) {
-		ipmi_destroy_user(ipmi_device->user_interface);
-		ipmi_device->user_interface = NULL;
-	}
-	/* flush the Tx_msg list */
-	if (!list_empty(&ipmi_device->tx_msg_list))
-		ipmi_flush_tx_msg(ipmi_device);
-
-	list_del(&ipmi_device->head);
-	ipmi_remove_space_handler(ipmi_device);
-}
-
 static int __init acpi_ipmi_init(void)
 {
 	int result = 0;
@@ -524,7 +581,7 @@ static int __init acpi_ipmi_init(void)
 
 static void __exit acpi_ipmi_exit(void)
 {
-	struct acpi_ipmi_device *ipmi_device, *temp;
+	struct acpi_ipmi_device *ipmi_device;
 
 	if (acpi_disabled)
 		return;
@@ -538,11 +595,17 @@ static void __exit acpi_ipmi_exit(void)
 	 * handler and free it.
 	 */
 	mutex_lock(&driver_data.ipmi_lock);
-	list_for_each_entry_safe(ipmi_device, temp,
-				&driver_data.ipmi_devices, head) {
-		acpi_remove_ipmi_device(ipmi_device);
-		put_device(ipmi_device->smi_data.dev);
-		kfree(ipmi_device);
+	while (!list_empty(&driver_data.ipmi_devices)) {
+		ipmi_device = list_first_entry(&driver_data.ipmi_devices,
+					       struct acpi_ipmi_device,
+					       head);
+		__ipmi_dev_kill(ipmi_device);
+		mutex_unlock(&driver_data.ipmi_lock);
+
+		ipmi_flush_tx_msg(ipmi_device);
+		acpi_ipmi_dev_put(ipmi_device);
+
+		mutex_lock(&driver_data.ipmi_lock);
 	}
 	mutex_unlock(&driver_data.ipmi_lock);
 }
-- 
1.7.10



* [PATCH v2 06/12] ACPI/IPMI: Fix issue caused by the per-device registration of the IPMI operation region handler
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
                     ` (4 preceding siblings ...)
  2013-09-13  5:13   ` [PATCH v2 05/12] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user Lv Zheng
@ 2013-09-13  5:14   ` Lv Zheng
  2013-09-13  5:14   ` [PATCH v2 07/12] ACPI/IPMI: Add reference counting for ACPI IPMI transfers Lv Zheng
                     ` (6 subsequent siblings)
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:14 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

On a real machine, the IPMI OperationRegions (in the ACPI000D - ACPI power
meter) were found not to be defined under the IPMI system interface device
(the IPI0001 with KCS type returned from the _IFT control method) in the
ACPI namespace:
  Device (PMI0)
  {
      Name (_HID, "ACPI000D")  // _HID: Hardware ID
      OperationRegion (SYSI, IPMI, 0x0600, 0x0100)
      Field (SYSI, BufferAcc, Lock, Preserve)
      {
          AccessAs (BufferAcc, 0x01),
          Offset (0x58),
          SCMD,   8,
          GCMD,   8
      }

      OperationRegion (POWR, IPMI, 0x3000, 0x0100)
      Field (POWR, BufferAcc, Lock, Preserve)
      {
          AccessAs (BufferAcc, 0x01),
          Offset (0xB3),
          GPMM,   8
      }
  }

  Device (PCI0)
  {
      Device (ISA)
      {
          Device (NIPM)
          {
              Name (_HID, EisaId ("IPI0001"))  // _HID: Hardware ID
              Method (_IFT, 0, NotSerialized)  // _IFT: IPMI Interface Type
              {
                  Return (0x01)
              }
          }
      }
  }
The current ACPI_IPMI code registers the IPMI operation region handler on a
per-device basis, so for the above namespace the handler is registered only
under the scope of \_SB.PCI0.ISA.NIPM.  Thus, when an IPMI operation region
field of \PMI0 is accessed, errors are reported on such a platform:
  ACPI Error: No handlers for Region [IPMI]
  ACPI Error: Region IPMI(7) has no handler
The solution is to install the IPMI operation region handler from the root
node so that every object that defines an IPMI OperationRegion can get an
address space handler registered.
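
The scoping problem can be illustrated with a toy userspace model.  This is
only a sketch and not ACPICA code: struct ns_node and region_has_handler()
are hypothetical names, and the model assumes that a handler installed on a
node serves the regions defined in that node's subtree.

```c
#include <stddef.h>

/* Toy model of a namespace node; not an ACPICA structure. */
struct ns_node {
	const char *name;
	struct ns_node *parent;
	int has_ipmi_handler;	/* 1 if a handler is installed on this node */
};

/*
 * Assumption of this model: a region is served when a handler is
 * installed on its own scope or on any ancestor of that scope.
 */
static int region_has_handler(const struct ns_node *scope)
{
	for (; scope; scope = scope->parent)
		if (scope->has_ipmi_handler)
			return 1;
	return 0;
}
```

In this model, a handler installed only on NIPM is never found when walking
up from PMI0, while a handler installed on the root is found from any scope.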

When an IPMI operation region field is accessed, the Network Function
(0x06 for SYSI and 0x30 for POWR) and the Command (SCMD, GCMD, GPMM) are
passed to the operation region handler, but no system interface is
specified by the BIOS.  The patch selects one system interface by
monitoring the system interface notifications.  IPMI messages passed from
the ACPI code are sent to this selected global IPMI system interface.
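
As a sketch of how the Network Function and the Command can be recovered
from the address seen by the handler, the following mirrors the
IPMI_OP_RGN_NETFN()/IPMI_OP_RGN_CMD() macros in acpi_ipmi.c.  The helper
names are hypothetical, and it is assumed that the handler address is the
region base plus the byte offset of the field.

```c
#include <stdint.h>

/* Same bit split as IPMI_OP_RGN_NETFN()/IPMI_OP_RGN_CMD(). */
static uint8_t op_rgn_netfn(uint64_t address)
{
	return (address >> 8) & 0xff;
}

static uint8_t op_rgn_cmd(uint64_t address)
{
	return address & 0xff;
}

/* Assumption: handler address = region base + field byte offset. */
static uint64_t field_address(uint64_t region_base, uint64_t field_offset)
{
	return region_base + field_offset;
}
```

With the namespace above, SCMD (region base 0x0600, offset 0x58) decodes to
Network Function 0x06 and Command 0x58, and GPMM (region base 0x3000, offset
0xB3) decodes to Network Function 0x30 and Command 0xB3.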

Known issues:
- How to select the IPMI system interface:
  Currently, ACPI_IPMI always selects the first registered interface that
  has an ACPI handle set (i.e., one defined in the ACPI namespace).  It is
  hard to determine the selection when there are multiple IPMI system
  interfaces defined in the ACPI namespace.
  According to the IPMI specification:
  A BMC device may make available multiple system interfaces, but only one
  management controller is allowed to be the 'active' BMC that provides BMC
  functionality for the system (in case of a 'partitioned' system, there
  can be only one active BMC per partition).  Only the system interface(s)
  for the active BMC are allowed to respond to the 'Get Device Id' command.
  According to the ipmi_si design:
  The ipmi_si registration notifications can only happen after a
  successful "Get Device ID" command.
  Thus it should be OK for non-partitioned systems to do such selection.
  But we do not have much knowledge of 'partitioned' systems.
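
The selection policy can be sketched as a toy userspace model: the first
registered interface is selected, and when the selected one goes away the
first remaining interface takes over.  struct registry and its helpers are
hypothetical names standing in for driver_data.ipmi_devices and
driver_data.selected_smi; interface numbers are assumed to be non-negative.

```c
#include <stddef.h>

#define MAX_IFACES 4

/* Toy registry; array order is registration order. */
struct registry {
	int ifnum[MAX_IFACES];
	size_t count;
	int selected;	/* interface number, or -1 if none is selected */
};

static void registry_init(struct registry *r)
{
	r->count = 0;
	r->selected = -1;
}

/* Register an interface; it is selected only if nothing is selected yet. */
static int registry_add(struct registry *r, int ifnum)
{
	if (r->count == MAX_IFACES)
		return -1;
	r->ifnum[r->count++] = ifnum;
	if (r->selected < 0)
		r->selected = ifnum;
	return 0;
}

/* Remove an interface; fall back to the first remaining one if needed. */
static void registry_remove(struct registry *r, int ifnum)
{
	size_t i, j;

	for (i = 0; i < r->count; i++) {
		if (r->ifnum[i] != ifnum)
			continue;
		for (j = i + 1; j < r->count; j++)
			r->ifnum[j - 1] = r->ifnum[j];
		r->count--;
		if (r->selected == ifnum)
			r->selected = -1;
		break;
	}
	if (r->selected < 0 && r->count > 0)
		r->selected = r->ifnum[0];
}
```

The model does not address the open question above: with several interfaces
defined in the ACPI namespace, "first registered" is only a heuristic.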

Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=46741
Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   81 +++++++++++++++++++---------------------------
 1 file changed, 34 insertions(+), 47 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index b285386..7ec4cd1 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -46,7 +46,6 @@ MODULE_AUTHOR("Zhao Yakui");
 MODULE_DESCRIPTION("ACPI IPMI Opregion driver");
 MODULE_LICENSE("GPL");
 
-#define IPMI_FLAGS_HANDLER_INSTALL	0
 
 #define ACPI_IPMI_OK			0
 #define ACPI_IPMI_TIMEOUT		0x10
@@ -66,7 +65,6 @@ struct acpi_ipmi_device {
 	ipmi_user_t	user_interface;
 	int ipmi_ifnum; /* IPMI interface number */
 	long curr_msgid;
-	unsigned long flags;
 	struct ipmi_smi_info smi_data;
 	bool dead;
 	struct kref kref;
@@ -77,6 +75,14 @@ struct ipmi_driver_data {
 	struct ipmi_smi_watcher	bmc_events;
 	struct ipmi_user_hndl	ipmi_hndlrs;
 	struct mutex		ipmi_lock;
+	/*
+	 * NOTE: IPMI System Interface Selection
+	 * There is no system interface specified by the IPMI operation
+	 * region access.  We try to select one system interface with ACPI
+	 * handle set.  IPMI messages passed from the ACPI codes are sent
+	 * to this selected global IPMI system interface.
+	 */
+	struct acpi_ipmi_device *selected_smi;
 };
 
 struct acpi_ipmi_msg {
@@ -110,8 +116,6 @@ struct acpi_ipmi_buffer {
 static void ipmi_register_bmc(int iface, struct device *dev);
 static void ipmi_bmc_gone(int iface);
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data);
-static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi);
-static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi);
 
 static struct ipmi_driver_data driver_data = {
 	.ipmi_devices = LIST_HEAD_INIT(driver_data.ipmi_devices),
@@ -154,14 +158,12 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
 		return NULL;
 	}
 	ipmi_device->user_interface = user;
-	ipmi_install_space_handler(ipmi_device);
 
 	return ipmi_device;
 }
 
 static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
 {
-	ipmi_remove_space_handler(ipmi_device);
 	ipmi_destroy_user(ipmi_device->user_interface);
 	put_device(ipmi_device->smi_data.dev);
 	kfree(ipmi_device);
@@ -178,6 +180,8 @@ static void ipmi_dev_release_kref(struct kref *kref)
 static void __ipmi_dev_kill(struct acpi_ipmi_device *ipmi_device)
 {
 	list_del(&ipmi_device->head);
+	if (driver_data.selected_smi == ipmi_device)
+		driver_data.selected_smi = NULL;
 	/*
 	 * Always setting dead flag after deleting from the list or
 	 * list_for_each_entry() codes must get changed.
@@ -185,17 +189,14 @@ static void __ipmi_dev_kill(struct acpi_ipmi_device *ipmi_device)
 	ipmi_device->dead = true;
 }
 
-static struct acpi_ipmi_device *acpi_ipmi_dev_get(int iface)
+static struct acpi_ipmi_device *acpi_ipmi_dev_get(void)
 {
-	struct acpi_ipmi_device *temp, *ipmi_device = NULL;
+	struct acpi_ipmi_device *ipmi_device = NULL;
 
 	mutex_lock(&driver_data.ipmi_lock);
-	list_for_each_entry(temp, &driver_data.ipmi_devices, head) {
-		if (temp->ipmi_ifnum == iface) {
-			ipmi_device = temp;
-			kref_get(&ipmi_device->kref);
-			break;
-		}
+	if (driver_data.selected_smi) {
+		ipmi_device = driver_data.selected_smi;
+		kref_get(&ipmi_device->kref);
 	}
 	mutex_unlock(&driver_data.ipmi_lock);
 
@@ -416,6 +417,8 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 			goto err_lock;
 	}
 
+	if (!driver_data.selected_smi)
+		driver_data.selected_smi = ipmi_device;
 	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
 	mutex_unlock(&driver_data.ipmi_lock);
 	put_device(smi_data.dev);
@@ -443,6 +446,10 @@ static void ipmi_bmc_gone(int iface)
 			break;
 		}
 	}
+	if (!driver_data.selected_smi)
+		driver_data.selected_smi = list_first_entry_or_null(
+					&driver_data.ipmi_devices,
+					struct acpi_ipmi_device, head);
 	mutex_unlock(&driver_data.ipmi_lock);
 	if (dev_found) {
 		ipmi_flush_tx_msg(ipmi_device);
@@ -471,7 +478,6 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 		      void *handler_context, void *region_context)
 {
 	struct acpi_ipmi_msg *tx_msg;
-	int iface = (long)handler_context;
 	struct acpi_ipmi_device *ipmi_device;
 	int err;
 	acpi_status status;
@@ -485,7 +491,7 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	if ((function & ACPI_IO_MASK) == ACPI_READ)
 		return AE_TYPE;
 
-	ipmi_device = acpi_ipmi_dev_get(iface);
+	ipmi_device = acpi_ipmi_dev_get();
 	if (!ipmi_device)
 		return AE_NOT_EXIST;
 
@@ -534,47 +540,26 @@ out_ref:
 	return status;
 }
 
-static void ipmi_remove_space_handler(struct acpi_ipmi_device *ipmi)
-{
-	if (!test_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags))
-		return;
-
-	acpi_remove_address_space_handler(ipmi->handle,
-				ACPI_ADR_SPACE_IPMI, &acpi_ipmi_space_handler);
-
-	clear_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags);
-}
-
-static int ipmi_install_space_handler(struct acpi_ipmi_device *ipmi)
-{
-	acpi_status status;
-
-	if (test_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags))
-		return 0;
-
-	status = acpi_install_address_space_handler(ipmi->handle,
-				ACPI_ADR_SPACE_IPMI, &acpi_ipmi_space_handler,
-				NULL, (void *)((long)ipmi->ipmi_ifnum));
-	if (ACPI_FAILURE(status)) {
-		struct pnp_dev *pnp_dev = ipmi->pnp_dev;
-		dev_warn(&pnp_dev->dev, "Can't register IPMI opregion space "
-			"handle\n");
-		return -EINVAL;
-	}
-	set_bit(IPMI_FLAGS_HANDLER_INSTALL, &ipmi->flags);
-	return 0;
-}
-
 static int __init acpi_ipmi_init(void)
 {
 	int result = 0;
+	acpi_status status;
 
 	if (acpi_disabled)
 		return result;
 
 	mutex_init(&driver_data.ipmi_lock);
 
+	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
+				ACPI_ADR_SPACE_IPMI, &acpi_ipmi_space_handler,
+				NULL, NULL);
+	if (ACPI_FAILURE(status)) {
+		pr_warn("Can't register IPMI opregion space handle\n");
+		return -EINVAL;
+	}
 	result = ipmi_smi_watcher_register(&driver_data.bmc_events);
+	if (result)
+		pr_err("Can't register IPMI system interface watcher\n");
 
 	return result;
 }
@@ -608,6 +593,8 @@ static void __exit acpi_ipmi_exit(void)
 		mutex_lock(&driver_data.ipmi_lock);
 	}
 	mutex_unlock(&driver_data.ipmi_lock);
+	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
+				ACPI_ADR_SPACE_IPMI, &acpi_ipmi_space_handler);
 }
 
 module_init(acpi_ipmi_init);
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 07/12] ACPI/IPMI: Add reference counting for ACPI IPMI transfers
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
                     ` (5 preceding siblings ...)
  2013-09-13  5:14   ` [PATCH v2 06/12] ACPI/IPMI: Fix issue caused by the per-device registration of the IPMI operation region handler Lv Zheng
@ 2013-09-13  5:14   ` Lv Zheng
  2013-09-13  5:14   ` [PATCH v2 08/12] ACPI/IPMI: Cleanup several acpi_ipmi_device members Lv Zheng
                     ` (5 subsequent siblings)
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:14 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This patch adds reference counting for ACPI IPMI transfers to tune the
locking granularity of tx_msg_lock.

This patch also makes the whole acpi_ipmi module's coding style consistent
by using reference counting for all its objects (i.e., acpi_ipmi_device and
acpi_ipmi_msg).

The acpi_ipmi_msg handling is re-designed using reference counting.
1. tx_msg is always unlinked before complete(), so that it is safe to put
   complete() outside of tx_msg_lock.
2. The reference of tx_msg is increased before calling
   ipmi_request_settime(), and a tx_msg_lock protected ipmi_cancel_tx_msg()
   is introduced, so that a complete() can happen in parallel with
   tx_msg unlinking in the failure cases.
3. tx_msg holds a reference to acpi_ipmi_device so that it can be flushed
   and freed in contexts other than acpi_ipmi_space_handler().
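
The ownership rule behind these points can be sketched with a toy,
single-threaded userspace model: the requester owns one reference, the list
owns another, and whoever unlinks the message drops the list's reference.
This is not the kernel kref API; struct msg and the msg_*() helpers are
hypothetical, and locking is deliberately left out.

```c
#include <stdlib.h>

/* Toy stand-in for acpi_ipmi_msg with a plain integer reference count. */
struct msg {
	int refcount;
	int linked;	/* 1 while on the modelled tx_msg_list */
	int *freed;	/* set to 1 on release, for observation only */
};

static struct msg *msg_alloc(int *freed)
{
	struct msg *m = calloc(1, sizeof(*m));

	if (!m)
		return NULL;
	m->refcount = 1;	/* reference owned by the requester */
	m->freed = freed;
	return m;
}

static void msg_get(struct msg *m)
{
	m->refcount++;
}

static void msg_put(struct msg *m)
{
	if (--m->refcount == 0) {
		*m->freed = 1;
		free(m);
	}
}

/* Queueing takes a second reference on behalf of the list. */
static void msg_queue(struct msg *m)
{
	msg_get(m);
	m->linked = 1;
}

/* Returns 1 if this caller unlinked the message and dropped the list ref. */
static int msg_unlink(struct msg *m)
{
	if (!m->linked)
		return 0;
	m->linked = 0;
	msg_put(m);
	return 1;
}
```

In this model, a response handler that unlinks first leaves a later cancel
with nothing to do, and the message is freed only when the requester drops
its own reference.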

The lockdep_chains output shows that all acpi_ipmi locks are leaf locks
after the tuning:
1. ipmi_lock is always leaf:
   irq_context: 0
   [ffffffff81a943f8] smi_watchers_mutex
   [ffffffffa06eca60] driver_data.ipmi_lock
   irq_context: 0
   [ffffffff82767b40] &buffer->mutex
   [ffffffffa00a6678] s_active#103
   [ffffffffa06eca60] driver_data.ipmi_lock
2. without this patch applied, the lock used by complete() is acquired
   while tx_msg_lock is held:
   irq_context: 0
   [ffffffff82767b40] &buffer->mutex
   [ffffffffa00a6678] s_active#103
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   [ffffffffa06eccf0] &x->wait#25
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   [ffffffffa06eccf0] &x->wait#25
   [ffffffff81e36620] &p->pi_lock
   irq_context: 1
   [ffffffffa06ecce8] &(&ipmi_device->tx_msg_lock)->rlock
   [ffffffffa06eccf0] &x->wait#25
   [ffffffff81e36620] &p->pi_lock
   [ffffffff81e5d0a8] &rq->lock
3. with this patch applied, tx_msg_lock is always leaf:
   irq_context: 0
   [ffffffff82767b40] &buffer->mutex
   [ffffffffa00a66d8] s_active#107
   [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock
   irq_context: 1
   [ffffffffa07ecdc8] &(&ipmi_device->tx_msg_lock)->rlock

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Cc: Zhao Yakui <yakui.zhao@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |  117 +++++++++++++++++++++++++++++++++-------------
 1 file changed, 85 insertions(+), 32 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 7ec4cd1..b9da5ef 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -104,6 +104,7 @@ struct acpi_ipmi_msg {
 	u8	data[ACPI_IPMI_MAX_MSG_LENGTH];
 	u8	rx_len;
 	struct acpi_ipmi_device *device;
+	struct kref	kref;
 };
 
 /* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
@@ -208,16 +209,20 @@ static void acpi_ipmi_dev_put(struct acpi_ipmi_device *ipmi_device)
 	kref_put(&ipmi_device->kref, ipmi_dev_release_kref);
 }
 
-static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
+static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
 {
+	struct acpi_ipmi_device *ipmi;
 	struct acpi_ipmi_msg *ipmi_msg;
-	struct pnp_dev *pnp_dev = ipmi->pnp_dev;
 
+	ipmi = acpi_ipmi_dev_get();
+	if (!ipmi)
+		return NULL;
 	ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
-	if (!ipmi_msg)	{
-		dev_warn(&pnp_dev->dev, "Can't allocate memory for ipmi_msg\n");
+	if (!ipmi_msg) {
+		acpi_ipmi_dev_put(ipmi);
 		return NULL;
 	}
+	kref_init(&ipmi_msg->kref);
 	init_completion(&ipmi_msg->tx_complete);
 	INIT_LIST_HEAD(&ipmi_msg->head);
 	ipmi_msg->device = ipmi;
@@ -225,6 +230,32 @@ static struct acpi_ipmi_msg *acpi_alloc_ipmi_msg(struct acpi_ipmi_device *ipmi)
 	return ipmi_msg;
 }
 
+static void ipmi_msg_release(struct acpi_ipmi_msg *tx_msg)
+{
+	acpi_ipmi_dev_put(tx_msg->device);
+	kfree(tx_msg);
+}
+
+static void ipmi_msg_release_kref(struct kref *kref)
+{
+	struct acpi_ipmi_msg *tx_msg =
+		container_of(kref, struct acpi_ipmi_msg, kref);
+
+	ipmi_msg_release(tx_msg);
+}
+
+static struct acpi_ipmi_msg *acpi_ipmi_msg_get(struct acpi_ipmi_msg *tx_msg)
+{
+	kref_get(&tx_msg->kref);
+
+	return tx_msg;
+}
+
+static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
+{
+	kref_put(&tx_msg->kref, ipmi_msg_release_kref);
+}
+
 #define		IPMI_OP_RGN_NETFN(offset)	((offset >> 8) & 0xff)
 #define		IPMI_OP_RGN_CMD(offset)		(offset & 0xff)
 static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
@@ -305,7 +336,7 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 
 static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 {
-	struct acpi_ipmi_msg *tx_msg, *temp;
+	struct acpi_ipmi_msg *tx_msg;
 	unsigned long flags;
 
 	/*
@@ -317,18 +348,47 @@ static void ipmi_flush_tx_msg(struct acpi_ipmi_device *ipmi)
 	 * ipmi_recv_msg(s) are freed after invoking ipmi_destroy_user().
 	 */
 	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
-	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
+	while (!list_empty(&ipmi->tx_msg_list)) {
+		tx_msg = list_first_entry(&ipmi->tx_msg_list,
+					  struct acpi_ipmi_msg,
+					  head);
+		list_del(&tx_msg->head);
+		spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
 		/* wake up the sleep thread on the Tx msg */
 		complete(&tx_msg->tx_complete);
+		acpi_ipmi_msg_put(tx_msg);
+		spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
+	}
+	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+}
+
+static void ipmi_cancel_tx_msg(struct acpi_ipmi_device *ipmi,
+			       struct acpi_ipmi_msg *msg)
+{
+	struct acpi_ipmi_msg *tx_msg, *temp;
+	bool msg_found = false;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ipmi->tx_msg_lock, flags);
+	list_for_each_entry_safe(tx_msg, temp, &ipmi->tx_msg_list, head) {
+		if (msg == tx_msg) {
+			msg_found = true;
+			list_del(&tx_msg->head);
+			break;
+		}
 	}
 	spin_unlock_irqrestore(&ipmi->tx_msg_lock, flags);
+
+	if (msg_found)
+		acpi_ipmi_msg_put(tx_msg);
 }
 
 static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 {
 	struct acpi_ipmi_device *ipmi_device = user_msg_data;
-	int msg_found = 0;
-	struct acpi_ipmi_msg *tx_msg;
+	bool msg_found = false;
+	struct acpi_ipmi_msg *tx_msg, *temp;
 	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
 	unsigned long flags;
 
@@ -339,17 +399,19 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 		goto out_msg;
 	}
 	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
-	list_for_each_entry(tx_msg, &ipmi_device->tx_msg_list, head) {
+	list_for_each_entry_safe(tx_msg, temp, &ipmi_device->tx_msg_list, head) {
 		if (msg->msgid == tx_msg->tx_msgid) {
-			msg_found = 1;
+			msg_found = true;
+			list_del(&tx_msg->head);
 			break;
 		}
 	}
+	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 
 	if (!msg_found) {
 		dev_warn(&pnp_dev->dev, "Unexpected response (msg id %ld) is "
 			"returned.\n", msg->msgid);
-		goto out_lock;
+		goto out_msg;
 	}
 
 	/* copy the response data to Rx_data buffer */
@@ -375,8 +437,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	tx_msg->msg_done = ACPI_IPMI_OK;
 out_comp:
 	complete(&tx_msg->tx_complete);
-out_lock:
-	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
+	acpi_ipmi_msg_put(tx_msg);
 out_msg:
 	ipmi_free_recv_msg(msg);
 };
@@ -491,26 +552,23 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	if ((function & ACPI_IO_MASK) == ACPI_READ)
 		return AE_TYPE;
 
-	ipmi_device = acpi_ipmi_dev_get();
-	if (!ipmi_device)
+	tx_msg = ipmi_msg_alloc();
+	if (!tx_msg)
 		return AE_NOT_EXIST;
 
-	tx_msg = acpi_alloc_ipmi_msg(ipmi_device);
-	if (!tx_msg) {
-		status = AE_NO_MEMORY;
-		goto out_ref;
-	}
+	ipmi_device = tx_msg->device;
 
 	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
-		status = AE_TYPE;
-		goto out_msg;
+		ipmi_msg_release(tx_msg);
+		return AE_TYPE;
 	}
+	acpi_ipmi_msg_get(tx_msg);
 	mutex_lock(&driver_data.ipmi_lock);
 	/* Do not add a tx_msg that can not be flushed. */
 	if (ipmi_device->dead) {
-		status = AE_NOT_EXIST;
 		mutex_unlock(&driver_data.ipmi_lock);
-		goto out_msg;
+		ipmi_msg_release(tx_msg);
+		return AE_NOT_EXIST;
 	}
 	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
@@ -523,20 +581,15 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 					NULL, 0, 0, IPMI_TIMEOUT);
 	if (err) {
 		status = AE_ERROR;
-		goto out_list;
+		goto out_msg;
 	}
 	wait_for_completion(&tx_msg->tx_complete);
 	acpi_format_ipmi_response(tx_msg, value);
 	status = AE_OK;
 
-out_list:
-	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
-	list_del(&tx_msg->head);
-	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 out_msg:
-	kfree(tx_msg);
-out_ref:
-	acpi_ipmi_dev_put(ipmi_device);
+	ipmi_cancel_tx_msg(ipmi_device, tx_msg);
+	acpi_ipmi_msg_put(tx_msg);
 	return status;
 }
 
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 08/12] ACPI/IPMI: Cleanup several acpi_ipmi_device members
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
                     ` (6 preceding siblings ...)
  2013-09-13  5:14   ` [PATCH v2 07/12] ACPI/IPMI: Add reference counting for ACPI IPMI transfers Lv Zheng
@ 2013-09-13  5:14   ` Lv Zheng
  2013-09-13  5:14   ` [PATCH v2 09/12] ACPI/IPMI: Cleanup some initialization codes Lv Zheng
                     ` (4 subsequent siblings)
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:14 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This is a trivial patch:
1. Deletes a member of acpi_ipmi_device, smi_data, which is not actually
   used.
2. Changes a member of acpi_ipmi_device, pnp_dev, to struct device, as it
   is only used by dev_warn() invocations.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   30 +++++++++++++-----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index b9da5ef..90d57c8 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -61,11 +61,10 @@ struct acpi_ipmi_device {
 	struct list_head tx_msg_list;
 	spinlock_t	tx_msg_lock;
 	acpi_handle handle;
-	struct pnp_dev *pnp_dev;
+	struct device *dev;
 	ipmi_user_t	user_interface;
 	int ipmi_ifnum; /* IPMI interface number */
 	long curr_msgid;
-	struct ipmi_smi_info smi_data;
 	bool dead;
 	struct kref kref;
 };
@@ -131,7 +130,7 @@ static struct ipmi_driver_data driver_data = {
 };
 
 static struct acpi_ipmi_device *
-ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
+ipmi_dev_alloc(int iface, struct device *dev, acpi_handle handle)
 {
 	struct acpi_ipmi_device *ipmi_device;
 	int err;
@@ -147,14 +146,13 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
 	spin_lock_init(&ipmi_device->tx_msg_lock);
 
 	ipmi_device->handle = handle;
-	ipmi_device->pnp_dev = to_pnp_dev(get_device(smi_data->dev));
-	memcpy(&ipmi_device->smi_data, smi_data, sizeof(struct ipmi_smi_info));
+	ipmi_device->dev = get_device(dev);
 	ipmi_device->ipmi_ifnum = iface;
 
 	err = ipmi_create_user(iface, &driver_data.ipmi_hndlrs,
 			       ipmi_device, &user);
 	if (err) {
-		put_device(smi_data->dev);
+		put_device(dev);
 		kfree(ipmi_device);
 		return NULL;
 	}
@@ -166,7 +164,7 @@ ipmi_dev_alloc(int iface, struct ipmi_smi_info *smi_data, acpi_handle handle)
 static void ipmi_dev_release(struct acpi_ipmi_device *ipmi_device)
 {
 	ipmi_destroy_user(ipmi_device->user_interface);
-	put_device(ipmi_device->smi_data.dev);
+	put_device(ipmi_device->dev);
 	kfree(ipmi_device);
 }
 
@@ -282,7 +280,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 	buffer = (struct acpi_ipmi_buffer *)value;
 	/* copy the tx message data */
 	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
-		dev_WARN_ONCE(&tx_msg->device->pnp_dev->dev, true,
+		dev_WARN_ONCE(tx_msg->device->dev, true,
 			      "Unexpected request (msg len %d).\n",
 			      buffer->length);
 		return -EINVAL;
@@ -389,11 +387,11 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	struct acpi_ipmi_device *ipmi_device = user_msg_data;
 	bool msg_found = false;
 	struct acpi_ipmi_msg *tx_msg, *temp;
-	struct pnp_dev *pnp_dev = ipmi_device->pnp_dev;
+	struct device *dev = ipmi_device->dev;
 	unsigned long flags;
 
 	if (msg->user != ipmi_device->user_interface) {
-		dev_warn(&pnp_dev->dev, "Unexpected response is returned. "
+		dev_warn(dev, "Unexpected response is returned. "
 			"returned user %p, expected user %p\n",
 			msg->user, ipmi_device->user_interface);
 		goto out_msg;
@@ -409,14 +407,14 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 
 	if (!msg_found) {
-		dev_warn(&pnp_dev->dev, "Unexpected response (msg id %ld) is "
+		dev_warn(dev, "Unexpected response (msg id %ld) is "
 			"returned.\n", msg->msgid);
 		goto out_msg;
 	}
 
 	/* copy the response data to Rx_data buffer */
 	if (msg->msg.data_len > ACPI_IPMI_MAX_MSG_LENGTH) {
-		dev_WARN_ONCE(&pnp_dev->dev, true,
+		dev_WARN_ONCE(dev, true,
 			      "Unexpected response (msg len %d).\n",
 			      msg->msg.data_len);
 		goto out_comp;
@@ -426,7 +424,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	if (msg->recv_type == IPMI_RESPONSE_RECV_TYPE &&
 	    msg->msg.data_len == 1) {
 		if (msg->msg.data[0] == IPMI_TIMEOUT_COMPLETION_CODE) {
-			dev_WARN_ONCE(&pnp_dev->dev, true,
+			dev_WARN_ONCE(dev, true,
 				      "Unexpected response (timeout).\n");
 			tx_msg->msg_done = ACPI_IPMI_TIMEOUT;
 		}
@@ -445,7 +443,6 @@ out_msg:
 static void ipmi_register_bmc(int iface, struct device *dev)
 {
 	struct acpi_ipmi_device *ipmi_device, *temp;
-	struct pnp_dev *pnp_dev;
 	int err;
 	struct ipmi_smi_info smi_data;
 	acpi_handle handle;
@@ -460,11 +457,10 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 	handle = smi_data.addr_info.acpi_info.acpi_handle;
 	if (!handle)
 		goto err_ref;
-	pnp_dev = to_pnp_dev(smi_data.dev);
 
-	ipmi_device = ipmi_dev_alloc(iface, &smi_data, handle);
+	ipmi_device = ipmi_dev_alloc(iface, smi_data.dev, handle);
 	if (!ipmi_device) {
-		dev_warn(&pnp_dev->dev, "Can't create IPMI user interface\n");
+		dev_warn(smi_data.dev, "Can't create IPMI user interface\n");
 		goto err_ref;
 	}
 
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 09/12] ACPI/IPMI: Cleanup some initialization codes
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
                     ` (7 preceding siblings ...)
  2013-09-13  5:14   ` [PATCH v2 08/12] ACPI/IPMI: Cleanup several acpi_ipmi_device members Lv Zheng
@ 2013-09-13  5:14   ` Lv Zheng
  2013-09-13  5:14   ` [PATCH v2 10/12] ACPI/IPMI: Cleanup some inclusion codes Lv Zheng
                     ` (3 subsequent siblings)
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:14 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This is a trivial patch.
1. Changes dynamic mutex initialization to static initialization.
2. Removes one acpi_ipmi_init() variable initialization as it is not
   needed.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |    7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 90d57c8..f7b6598 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -127,6 +127,7 @@ static struct ipmi_driver_data driver_data = {
 	.ipmi_hndlrs = {
 		.ipmi_recv_hndl = ipmi_msg_handler,
 	},
+	.ipmi_lock = __MUTEX_INITIALIZER(driver_data.ipmi_lock)
 };
 
 static struct acpi_ipmi_device *
@@ -591,13 +592,11 @@ out_msg:
 
 static int __init acpi_ipmi_init(void)
 {
-	int result = 0;
+	int result;
 	acpi_status status;
 
 	if (acpi_disabled)
-		return result;
-
-	mutex_init(&driver_data.ipmi_lock);
+		return 0;
 
 	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
 				ACPI_ADR_SPACE_IPMI, &acpi_ipmi_space_handler,
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 10/12] ACPI/IPMI: Cleanup some inclusion codes
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
                     ` (8 preceding siblings ...)
  2013-09-13  5:14   ` [PATCH v2 09/12] ACPI/IPMI: Cleanup some initialization codes Lv Zheng
@ 2013-09-13  5:14   ` Lv Zheng
  2013-09-13  5:14   ` [PATCH v2 11/12] ACPI/IPMI: Cleanup some Kconfig codes Lv Zheng
                     ` (2 subsequent siblings)
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:14 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This is a trivial patch:
1. Deletes several useless header inclusions.
2. Kernel code should always include <linux/acpi.h> instead of
   <acpi/acpi_bus.h> or <acpi/acpi_drivers.h>, as <linux/acpi.h> is where
   many conditional declarations are handled.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/acpi_ipmi.c |   15 +--------------
 1 file changed, 1 insertion(+), 14 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index f7b6598..9d187fd 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -24,22 +24,9 @@
  * ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  */
 
-#include <linux/kernel.h>
 #include <linux/module.h>
-#include <linux/init.h>
-#include <linux/types.h>
-#include <linux/delay.h>
-#include <linux/proc_fs.h>
-#include <linux/seq_file.h>
-#include <linux/interrupt.h>
-#include <linux/list.h>
-#include <linux/spinlock.h>
-#include <linux/io.h>
-#include <acpi/acpi_bus.h>
-#include <acpi/acpi_drivers.h>
+#include <linux/acpi.h>
 #include <linux/ipmi.h>
-#include <linux/device.h>
-#include <linux/pnp.h>
 #include <linux/spinlock.h>
 
 MODULE_AUTHOR("Zhao Yakui");
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 11/12] ACPI/IPMI: Cleanup some Kconfig codes
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
                     ` (9 preceding siblings ...)
  2013-09-13  5:14   ` [PATCH v2 10/12] ACPI/IPMI: Cleanup some inclusion codes Lv Zheng
@ 2013-09-13  5:14   ` Lv Zheng
  2013-09-13  5:15   ` [PATCH v2 12/12] ACPI/IPMI: Cleanup coding styles Lv Zheng
  2013-09-25 17:54   ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Rafael J. Wysocki
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:14 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: linux-acpi, openipmi-developer, Lv Zheng

This is a trivial patch:
1. Deletes a duplicate Kconfig dependency, as there is already an
   "if IPMI_HANDLER" around "IPMI_SI".

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
Reviewed-by: Huang Ying <ying.huang@intel.com>
---
 drivers/acpi/Kconfig |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
index 3278a21..d129869 100644
--- a/drivers/acpi/Kconfig
+++ b/drivers/acpi/Kconfig
@@ -181,9 +181,10 @@ config ACPI_PROCESSOR
 
 	  To compile this driver as a module, choose M here:
 	  the module will be called processor.
+
 config ACPI_IPMI
 	tristate "IPMI"
-	depends on IPMI_SI && IPMI_HANDLER
+	depends on IPMI_SI
 	default n
 	help
 	  This driver enables the ACPI to access the BMC controller. And it
-- 
1.7.10


^ permalink raw reply related	[flat|nested] 99+ messages in thread

* [PATCH v2 12/12] ACPI/IPMI: Cleanup coding styles
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
                     ` (10 preceding siblings ...)
  2013-09-13  5:14   ` [PATCH v2 11/12] ACPI/IPMI: Cleanup some Kconfig codes Lv Zheng
@ 2013-09-13  5:15   ` Lv Zheng
  2013-09-25 17:54   ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Rafael J. Wysocki
  12 siblings, 0 replies; 99+ messages in thread
From: Lv Zheng @ 2013-09-13  5:15 UTC (permalink / raw)
  To: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui
  Cc: Lv Zheng, linux-acpi, openipmi-developer

This patch only introduces indentation cleanups.  No functional changes.

Signed-off-by: Lv Zheng <lv.zheng@intel.com>
---
 drivers/acpi/acpi_ipmi.c |  105 ++++++++++++++++++++++++++++------------------
 1 file changed, 65 insertions(+), 40 deletions(-)

diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c
index 9d187fd..ac0f52f 100644
--- a/drivers/acpi/acpi_ipmi.c
+++ b/drivers/acpi/acpi_ipmi.c
@@ -33,7 +33,6 @@ MODULE_AUTHOR("Zhao Yakui");
 MODULE_DESCRIPTION("ACPI IPMI Opregion driver");
 MODULE_LICENSE("GPL");
 
-
 #define ACPI_IPMI_OK			0
 #define ACPI_IPMI_TIMEOUT		0x10
 #define ACPI_IPMI_UNKNOWN		0x07
@@ -44,12 +43,14 @@ MODULE_LICENSE("GPL");
 struct acpi_ipmi_device {
 	/* the device list attached to driver_data.ipmi_devices */
 	struct list_head head;
+
 	/* the IPMI request message list */
 	struct list_head tx_msg_list;
-	spinlock_t	tx_msg_lock;
+
+	spinlock_t tx_msg_lock;
 	acpi_handle handle;
 	struct device *dev;
-	ipmi_user_t	user_interface;
+	ipmi_user_t user_interface;
 	int ipmi_ifnum; /* IPMI interface number */
 	long curr_msgid;
 	bool dead;
@@ -57,10 +58,11 @@ struct acpi_ipmi_device {
 };
 
 struct ipmi_driver_data {
-	struct list_head	ipmi_devices;
-	struct ipmi_smi_watcher	bmc_events;
-	struct ipmi_user_hndl	ipmi_hndlrs;
-	struct mutex		ipmi_lock;
+	struct list_head ipmi_devices;
+	struct ipmi_smi_watcher bmc_events;
+	struct ipmi_user_hndl ipmi_hndlrs;
+	struct mutex ipmi_lock;
+
 	/*
 	 * NOTE: IPMI System Interface Selection
 	 * There is no system interface specified by the IPMI operation
@@ -73,6 +75,7 @@ struct ipmi_driver_data {
 
 struct acpi_ipmi_msg {
 	struct list_head head;
+
 	/*
 	 * General speaking the addr type should be SI_ADDR_TYPE. And
 	 * the addr channel should be BMC.
@@ -82,15 +85,19 @@ struct acpi_ipmi_msg {
 	 */
 	struct ipmi_addr addr;
 	long tx_msgid;
+
 	/* it is used to track whether the IPMI message is finished */
 	struct completion tx_complete;
+
 	struct kernel_ipmi_msg tx_message;
-	int	msg_done;
+	int msg_done;
+
 	/* tx/rx data . And copy it from/to ACPI object buffer */
-	u8	data[ACPI_IPMI_MAX_MSG_LENGTH];
-	u8	rx_len;
+	u8 data[ACPI_IPMI_MAX_MSG_LENGTH];
+	u8 rx_len;
+
 	struct acpi_ipmi_device *device;
-	struct kref	kref;
+	struct kref kref;
 };
 
 /* IPMI request/response buffer per ACPI 4.0, sec 5.5.2.4.3.2 */
@@ -132,7 +139,6 @@ ipmi_dev_alloc(int iface, struct device *dev, acpi_handle handle)
 	INIT_LIST_HEAD(&ipmi_device->head);
 	INIT_LIST_HEAD(&ipmi_device->tx_msg_list);
 	spin_lock_init(&ipmi_device->tx_msg_lock);
-
 	ipmi_device->handle = handle;
 	ipmi_device->dev = get_device(dev);
 	ipmi_device->ipmi_ifnum = iface;
@@ -169,6 +175,7 @@ static void __ipmi_dev_kill(struct acpi_ipmi_device *ipmi_device)
 	list_del(&ipmi_device->head);
 	if (driver_data.selected_smi == ipmi_device)
 		driver_data.selected_smi = NULL;
+
 	/*
 	 * Always setting dead flag after deleting from the list or
 	 * list_for_each_entry() codes must get changed.
@@ -203,16 +210,19 @@ static struct acpi_ipmi_msg *ipmi_msg_alloc(void)
 	ipmi = acpi_ipmi_dev_get();
 	if (!ipmi)
 		return NULL;
+
 	ipmi_msg = kzalloc(sizeof(struct acpi_ipmi_msg), GFP_KERNEL);
 	if (!ipmi_msg) {
 		acpi_ipmi_dev_put(ipmi);
 		return NULL;
 	}
+
 	kref_init(&ipmi_msg->kref);
 	init_completion(&ipmi_msg->tx_complete);
 	INIT_LIST_HEAD(&ipmi_msg->head);
 	ipmi_msg->device = ipmi;
 	ipmi_msg->msg_done = ACPI_IPMI_UNKNOWN;
+
 	return ipmi_msg;
 }
 
@@ -242,11 +252,11 @@ static void acpi_ipmi_msg_put(struct acpi_ipmi_msg *tx_msg)
 	kref_put(&tx_msg->kref, ipmi_msg_release_kref);
 }
 
-#define		IPMI_OP_RGN_NETFN(offset)	((offset >> 8) & 0xff)
-#define		IPMI_OP_RGN_CMD(offset)		(offset & 0xff)
+#define IPMI_OP_RGN_NETFN(offset)	((offset >> 8) & 0xff)
+#define IPMI_OP_RGN_CMD(offset)		(offset & 0xff)
 static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
-				acpi_physical_address address,
-				acpi_integer *value)
+				    acpi_physical_address address,
+				    acpi_integer *value)
 {
 	struct kernel_ipmi_msg *msg;
 	struct acpi_ipmi_buffer *buffer;
@@ -254,6 +264,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 	unsigned long flags;
 
 	msg = &tx_msg->tx_message;
+
 	/*
 	 * IPMI network function and command are encoded in the address
 	 * within the IPMI OpRegion; see ACPI 4.0, sec 5.5.2.4.3.
@@ -261,11 +272,13 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 	msg->netfn = IPMI_OP_RGN_NETFN(address);
 	msg->cmd = IPMI_OP_RGN_CMD(address);
 	msg->data = tx_msg->data;
+
 	/*
 	 * value is the parameter passed by the IPMI opregion space handler.
 	 * It points to the IPMI request message buffer
 	 */
 	buffer = (struct acpi_ipmi_buffer *)value;
+
 	/* copy the tx message data */
 	if (buffer->length > ACPI_IPMI_MAX_MSG_LENGTH) {
 		dev_WARN_ONCE(tx_msg->device->dev, true,
@@ -275,6 +288,7 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 	}
 	msg->data_len = buffer->length;
 	memcpy(tx_msg->data, buffer->data, msg->data_len);
+
 	/*
 	 * now the default type is SYSTEM_INTERFACE and channel type is BMC.
 	 * If the netfn is APP_REQUEST and the cmd is SEND_MESSAGE,
@@ -288,15 +302,17 @@ static int acpi_format_ipmi_request(struct acpi_ipmi_msg *tx_msg,
 
 	/* Get the msgid */
 	device = tx_msg->device;
+
 	spin_lock_irqsave(&device->tx_msg_lock, flags);
 	device->curr_msgid++;
 	tx_msg->tx_msgid = device->curr_msgid;
 	spin_unlock_irqrestore(&device->tx_msg_lock, flags);
+
 	return 0;
 }
 
 static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
-		acpi_integer *value)
+				      acpi_integer *value)
 {
 	struct acpi_ipmi_buffer *buffer;
 
@@ -305,6 +321,7 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 	 * IPMI message returned by IPMI command.
 	 */
 	buffer = (struct acpi_ipmi_buffer *)value;
+
 	/*
 	 * If the flag of msg_done is not set, it means that the IPMI command is
 	 * not executed correctly.
@@ -312,6 +329,7 @@ static void acpi_format_ipmi_response(struct acpi_ipmi_msg *msg,
 	buffer->status = msg->msg_done;
 	if (msg->msg_done != ACPI_IPMI_OK)
 		return;
+
 	/*
 	 * If the IPMI response message is obtained correctly, the status code
 	 * will be ACPI_IPMI_OK
@@ -379,11 +397,12 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	unsigned long flags;
 
 	if (msg->user != ipmi_device->user_interface) {
-		dev_warn(dev, "Unexpected response is returned. "
-			"returned user %p, expected user %p\n",
-			msg->user, ipmi_device->user_interface);
+		dev_warn(dev,
+			 "Unexpected response is returned. returned user %p, expected user %p\n",
+			 msg->user, ipmi_device->user_interface);
 		goto out_msg;
 	}
+
 	spin_lock_irqsave(&ipmi_device->tx_msg_lock, flags);
 	list_for_each_entry_safe(tx_msg, temp, &ipmi_device->tx_msg_list, head) {
 		if (msg->msgid == tx_msg->tx_msgid) {
@@ -395,8 +414,9 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 
 	if (!msg_found) {
-		dev_warn(dev, "Unexpected response (msg id %ld) is "
-			"returned.\n", msg->msgid);
+		dev_warn(dev,
+			 "Unexpected response (msg id %ld) is returned.\n",
+			 msg->msgid);
 		goto out_msg;
 	}
 
@@ -407,6 +427,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 			      msg->msg.data_len);
 		goto out_comp;
 	}
+
 	/* response msg is an error msg */
 	msg->recv_type = IPMI_RESPONSE_RECV_TYPE;
 	if (msg->recv_type == IPMI_RESPONSE_RECV_TYPE &&
@@ -418,15 +439,17 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
 		}
 		goto out_comp;
 	}
+
 	tx_msg->rx_len = msg->msg.data_len;
 	memcpy(tx_msg->data, msg->msg.data, tx_msg->rx_len);
 	tx_msg->msg_done = ACPI_IPMI_OK;
+
 out_comp:
 	complete(&tx_msg->tx_complete);
 	acpi_ipmi_msg_put(tx_msg);
 out_msg:
 	ipmi_free_recv_msg(msg);
-};
+}
 
 static void ipmi_register_bmc(int iface, struct device *dev)
 {
@@ -436,7 +459,6 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 	acpi_handle handle;
 
 	err = ipmi_get_smi_info(iface, &smi_data);
-
 	if (err)
 		return;
 
@@ -461,11 +483,11 @@ static void ipmi_register_bmc(int iface, struct device *dev)
 		if (temp->handle == handle)
 			goto err_lock;
 	}
-
 	if (!driver_data.selected_smi)
 		driver_data.selected_smi = ipmi_device;
 	list_add_tail(&ipmi_device->head, &driver_data.ipmi_devices);
 	mutex_unlock(&driver_data.ipmi_lock);
+
 	put_device(smi_data.dev);
 	return;
 
@@ -484,7 +506,7 @@ static void ipmi_bmc_gone(int iface)
 
 	mutex_lock(&driver_data.ipmi_lock);
 	list_for_each_entry_safe(ipmi_device, temp,
-				&driver_data.ipmi_devices, head) {
+				 &driver_data.ipmi_devices, head) {
 		if (ipmi_device->ipmi_ifnum != iface) {
 			dev_found = true;
 			__ipmi_dev_kill(ipmi_device);
@@ -496,14 +518,13 @@ static void ipmi_bmc_gone(int iface)
 					&driver_data.ipmi_devices,
 					struct acpi_ipmi_device, head);
 	mutex_unlock(&driver_data.ipmi_lock);
+
 	if (dev_found) {
 		ipmi_flush_tx_msg(ipmi_device);
 		acpi_ipmi_dev_put(ipmi_device);
 	}
 }
-/* --------------------------------------------------------------------------
- *			Address Space Management
- * -------------------------------------------------------------------------- */
+
 /*
  * This is the IPMI opregion space handler.
  * @function: indicates the read/write. In fact as the IPMI message is driven
@@ -516,17 +537,17 @@ static void ipmi_bmc_gone(int iface)
  *	     the response IPMI message returned by IPMI command.
  * @handler_context: IPMI device context.
  */
-
 static acpi_status
 acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
-		      u32 bits, acpi_integer *value,
-		      void *handler_context, void *region_context)
+			u32 bits, acpi_integer *value,
+			void *handler_context, void *region_context)
 {
 	struct acpi_ipmi_msg *tx_msg;
 	struct acpi_ipmi_device *ipmi_device;
 	int err;
 	acpi_status status;
 	unsigned long flags;
+
 	/*
 	 * IPMI opregion message.
 	 * IPMI message is firstly written to the BMC and system software
@@ -539,13 +560,13 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	tx_msg = ipmi_msg_alloc();
 	if (!tx_msg)
 		return AE_NOT_EXIST;
-
 	ipmi_device = tx_msg->device;
 
 	if (acpi_format_ipmi_request(tx_msg, address, value) != 0) {
 		ipmi_msg_release(tx_msg);
 		return AE_TYPE;
 	}
+
 	acpi_ipmi_msg_get(tx_msg);
 	mutex_lock(&driver_data.ipmi_lock);
 	/* Do not add a tx_msg that can not be flushed. */
@@ -558,16 +579,18 @@ acpi_ipmi_space_handler(u32 function, acpi_physical_address address,
 	list_add_tail(&tx_msg->head, &ipmi_device->tx_msg_list);
 	spin_unlock_irqrestore(&ipmi_device->tx_msg_lock, flags);
 	mutex_unlock(&driver_data.ipmi_lock);
+
 	err = ipmi_request_settime(ipmi_device->user_interface,
-					&tx_msg->addr,
-					tx_msg->tx_msgid,
-					&tx_msg->tx_message,
-					NULL, 0, 0, IPMI_TIMEOUT);
+				   &tx_msg->addr,
+				   tx_msg->tx_msgid,
+				   &tx_msg->tx_message,
+				   NULL, 0, 0, IPMI_TIMEOUT);
 	if (err) {
 		status = AE_ERROR;
 		goto out_msg;
 	}
 	wait_for_completion(&tx_msg->tx_complete);
+
 	acpi_format_ipmi_response(tx_msg, value);
 	status = AE_OK;
 
@@ -586,8 +609,9 @@ static int __init acpi_ipmi_init(void)
 		return 0;
 
 	status = acpi_install_address_space_handler(ACPI_ROOT_OBJECT,
-				ACPI_ADR_SPACE_IPMI, &acpi_ipmi_space_handler,
-				NULL, NULL);
+						    ACPI_ADR_SPACE_IPMI,
+						    &acpi_ipmi_space_handler,
+						    NULL, NULL);
 	if (ACPI_FAILURE(status)) {
 		pr_warn("Can't register IPMI opregion space handle\n");
 		return -EINVAL;
@@ -629,7 +653,8 @@ static void __exit acpi_ipmi_exit(void)
 	}
 	mutex_unlock(&driver_data.ipmi_lock);
 	acpi_remove_address_space_handler(ACPI_ROOT_OBJECT,
-				ACPI_ADR_SPACE_IPMI, &acpi_ipmi_space_handler);
+					  ACPI_ADR_SPACE_IPMI,
+					  &acpi_ipmi_space_handler);
 }
 
 module_init(acpi_ipmi_init);
-- 
1.7.10



* Re: [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes
  2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
                     ` (11 preceding siblings ...)
  2013-09-13  5:15   ` [PATCH v2 12/12] ACPI/IPMI: Cleanup coding styles Lv Zheng
@ 2013-09-25 17:54   ` Rafael J. Wysocki
  2013-09-27  7:44     ` Zheng, Lv
  12 siblings, 1 reply; 99+ messages in thread
From: Rafael J. Wysocki @ 2013-09-25 17:54 UTC (permalink / raw)
  To: Lv Zheng
  Cc: Rafael J. Wysocki, Len Brown, Corey Minyard, Zhao Yakui,
	linux-acpi, openipmi-developer

On Friday, September 13, 2013 01:13:01 PM Lv Zheng wrote:
> This patchset tries to fix the following kernel bug:
> Buglink: https://bugzilla.kernel.org/show_bug.cgi?id=46741
> This is fixed by [PATCH 06].
> 
> The bug shows IPMI operation region may appear in a device not under the
> IPMI system interface device's scope, thus it's required to install the
> ACPI IPMI operation region handler from the root of the ACPI namespace.
> 
> The original acpi_ipmi implementation includes several issues that break
> the test process.  This patchset also includes a re-design of acpi_ipmi
> module to make the test possible.
>  
> [PATCH 01-06] are bug-fix patches that can be applied to the kernels whose
>               version is > 2.6.38.  This can be confirmed with:
>               # git tag --contains e92b297c
> [PATCH 07] is a tuning patch for acpi_ipmi.c.
> [PATCH 08-12] are cleanup patches for acpi_ipmi.c and its Kconfig item.
> 
> v2.0
> PATCH 03: Uses timeout mechanism offered by ipmi_si.
> PATCH 05: Uses kref instead of atomic_t and adds "dead" flag to kill the
>           atomic_read codes in ipmi_flush_tx_msg().
> PATCH 07: Uses kref instead of atomic_t.
> 
> This patchset has passed the test around a fake device accessing IPMI
> operation region fields on an IPMI capable platform.  A stress test of
> module(acpi_ipmi) load/unload has been performed on such platform.  No
> races can be found and the IPMI operation region handler is functioning
> now.  It is not possible to test module(ipmi_si) load/unload as it can't be
> unloaded due to its transfer flushing implementation.
> 
> Lv Zheng (12):
>   ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler()
>   ACPI/IPMI: Fix potential response buffer overflow
>   ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers
>   ACPI/IPMI: Fix race caused by the timed out ACPI IPMI transfers
>   ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user
>   ACPI/IPMI: Fix issue caused by the per-device registration of the
>     IPMI operation region handler
>   ACPI/IPMI: Add reference counting for ACPI IPMI transfers
>   ACPI/IPMI: Cleanup several acpi_ipmi_device members
>   ACPI/IPMI: Cleanup some initialization codes
>   ACPI/IPMI: Cleanup some inclusion codes
>   ACPI/IPMI: Cleanup some Kconfig codes
>   ACPI/IPMI: Cleanup coding styles
> 
>  drivers/acpi/Kconfig     |    3 +-
>  drivers/acpi/acpi_ipmi.c |  594 ++++++++++++++++++++++++++++------------------
>  2 files changed, 367 insertions(+), 230 deletions(-)

Queued up for 3.13, thanks Lv!



* RE: [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes
  2013-09-25 17:54   ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Rafael J. Wysocki
@ 2013-09-27  7:44     ` Zheng, Lv
  0 siblings, 0 replies; 99+ messages in thread
From: Zheng, Lv @ 2013-09-27  7:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Wysocki, Rafael J, Brown, Len, Corey Minyard, Zhao, Yakui,
	linux-acpi, openipmi-developer

> From: Rafael J. Wysocki [mailto:rjw@rjwysocki.net]
> Sent: Thursday, September 26, 2013 1:55 AM
> 
> Queued up for 3.13, thanks Lv!

Thanks for the help.

Cheers
-Lv


Thread overview: 99+ messages
     [not found] <cover.1370652213.git.lv.zheng@intel.com>
2013-07-23  8:08 ` [PATCH 00/13] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
2013-07-23  8:08   ` Lv Zheng
2013-07-23  8:08   ` [PATCH 01/13] ACPI/IPMI: Fix potential response buffer overflow Lv Zheng
2013-07-23  8:08     ` Lv Zheng
2013-07-23 14:54     ` Greg KH
2013-07-24  0:21       ` Zheng, Lv
2013-07-24  0:21         ` Zheng, Lv
2013-07-24  0:44       ` Zheng, Lv
2013-07-24  0:44         ` Zheng, Lv
2013-07-23  8:09   ` [PATCH 02/13] ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler() Lv Zheng
2013-07-23  8:09     ` Lv Zheng
2013-07-23  8:09   ` [PATCH 03/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers Lv Zheng
2013-07-23  8:09     ` Lv Zheng
2013-07-24 23:38     ` Rafael J. Wysocki
2013-07-25  3:09       ` Zheng, Lv
2013-07-25  3:09         ` Zheng, Lv
2013-07-25 12:06         ` Rafael J. Wysocki
2013-07-25 12:06           ` Rafael J. Wysocki
2013-07-25 18:12           ` Corey Minyard
2013-07-25 18:12             ` Corey Minyard
2013-07-25 19:32             ` Rafael J. Wysocki
2013-07-25 19:32               ` Rafael J. Wysocki
2013-07-26  0:18               ` Zheng, Lv
2013-07-26  0:18                 ` Zheng, Lv
2013-07-26  0:16             ` Zheng, Lv
2013-07-26  0:16               ` Zheng, Lv
2013-07-26  0:48               ` Corey Minyard
2013-07-26  0:48                 ` Corey Minyard
2013-07-26  1:30                 ` Zheng, Lv
2013-07-26  1:30                   ` Zheng, Lv
2013-07-26  0:09           ` Zheng, Lv
2013-07-26  0:09             ` Zheng, Lv
2013-07-23  8:09   ` [PATCH 04/13] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user Lv Zheng
2013-07-23  8:09     ` Lv Zheng
2013-07-25 21:59     ` Rafael J. Wysocki
2013-07-26  1:17       ` Zheng, Lv
2013-07-26  1:17         ` Zheng, Lv
2013-07-23  8:09   ` [PATCH 05/13] ACPI/IPMI: Fix issue caused by the per-device registration of the IPMI operation region handler Lv Zheng
2013-07-23  8:09     ` Lv Zheng
2013-07-23  8:09   ` [PATCH 06/13] ACPI/IPMI: Add reference counting for ACPI operation region handlers Lv Zheng
2013-07-23  8:09     ` Lv Zheng
2013-07-25 20:27     ` Rafael J. Wysocki
2013-07-26  0:47       ` Zheng, Lv
2013-07-26  0:47         ` Zheng, Lv
2013-07-26  8:09         ` Zheng, Lv
2013-07-26  8:09           ` Zheng, Lv
2013-07-26 14:00         ` Rafael J. Wysocki
2013-07-26 14:00           ` Rafael J. Wysocki
2013-07-29  1:43           ` Zheng, Lv
2013-07-29  1:43             ` Zheng, Lv
2013-07-25 21:29     ` Rafael J. Wysocki
2013-07-26  1:54       ` Zheng, Lv
2013-07-26  1:54         ` Zheng, Lv
2013-07-26  8:15         ` Zheng, Lv
2013-07-26  8:15           ` Zheng, Lv
2013-07-26 14:49         ` Rafael J. Wysocki
2013-07-29  1:56           ` Zheng, Lv
2013-07-29  1:56             ` Zheng, Lv
2013-07-23  8:09   ` [PATCH 07/13] ACPI/IPMI: Add reference counting for ACPI IPMI transfers Lv Zheng
2013-07-23  8:09     ` Lv Zheng
2013-07-25 22:23     ` Rafael J. Wysocki
2013-07-26  1:21       ` Zheng, Lv
2013-07-26  1:21         ` Zheng, Lv
2013-07-26 13:41         ` Rafael J. Wysocki
2013-07-26 13:41           ` Rafael J. Wysocki
2013-07-23  8:10   ` [PATCH 08/13] ACPI/IPMI: Cleanup several acpi_ipmi_device members Lv Zheng
2013-07-23  8:10     ` Lv Zheng
2013-07-25 22:25     ` Rafael J. Wysocki
2013-07-26  1:25       ` Zheng, Lv
2013-07-26  1:25         ` Zheng, Lv
2013-07-26 13:38         ` Rafael J. Wysocki
2013-07-26 13:38           ` Rafael J. Wysocki
2013-07-29  1:12           ` Zheng, Lv
2013-07-29  1:12             ` Zheng, Lv
2013-07-23  8:10   ` [PATCH 09/13] ACPI/IPMI: Cleanup some initialization codes Lv Zheng
2013-07-23  8:10     ` Lv Zheng
2013-07-23  8:10   ` [PATCH 10/13] ACPI/IPMI: Cleanup some inclusion codes Lv Zheng
2013-07-23  8:10     ` Lv Zheng
2013-07-23  8:10   ` [PATCH 11/13] ACPI/IPMI: Cleanup some Kconfig codes Lv Zheng
2013-07-23  8:10     ` Lv Zheng
2013-07-23  8:10   ` [PATCH 12/13] Testing: Add module load/unload test suite Lv Zheng
2013-07-23  8:10     ` Lv Zheng
2013-07-23  8:10   ` [PATCH 13/13] ACPI/IPMI: Add IPMI operation region test device driver Lv Zheng
2013-07-23  8:10     ` Lv Zheng
2013-09-13  5:13 ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Lv Zheng
2013-09-13  5:13   ` [PATCH v2 01/12] ACPI/IPMI: Fix atomic context requirement of ipmi_msg_handler() Lv Zheng
2013-09-13  5:13   ` [PATCH v2 02/12] ACPI/IPMI: Fix potential response buffer overflow Lv Zheng
2013-09-13  5:13   ` [PATCH v2 03/12] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI transfers Lv Zheng
2013-09-13  5:13   ` [PATCH v2 04/12] ACPI/IPMI: Fix race caused by the timed out " Lv Zheng
2013-09-13  5:13   ` [PATCH v2 05/12] ACPI/IPMI: Fix race caused by the unprotected ACPI IPMI user Lv Zheng
2013-09-13  5:14   ` [PATCH v2 06/12] ACPI/IPMI: Fix issue caused by the per-device registration of the IPMI operation region handler Lv Zheng
2013-09-13  5:14   ` [PATCH v2 07/12] ACPI/IPMI: Add reference counting for ACPI IPMI transfers Lv Zheng
2013-09-13  5:14   ` [PATCH v2 08/12] ACPI/IPMI: Cleanup several acpi_ipmi_device members Lv Zheng
2013-09-13  5:14   ` [PATCH v2 09/12] ACPI/IPMI: Cleanup some initialization codes Lv Zheng
2013-09-13  5:14   ` [PATCH v2 10/12] ACPI/IPMI: Cleanup some inclusion codes Lv Zheng
2013-09-13  5:14   ` [PATCH v2 11/12] ACPI/IPMI: Cleanup some Kconfig codes Lv Zheng
2013-09-13  5:15   ` [PATCH v2 12/12] ACPI/IPMI: Cleanup coding styles Lv Zheng
2013-09-25 17:54   ` [PATCH v2 00/12] ACPI/IPMI: Fix several issues in the current codes Rafael J. Wysocki
2013-09-27  7:44     ` Zheng, Lv
