[PATCH v6 0/4] Serialize execution environment changes for MHI

linux-arm-msm.vger.kernel.org archive mirror

* [PATCH v6 0/4] Serialize execution environment changes for MHI
@ 2021-02-24 23:23 Bhaumik Bhatt
  2021-02-24 23:23 ` [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode Bhaumik Bhatt
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Bhaumik Bhatt @ 2021-02-24 23:23 UTC (permalink / raw)
  To: manivannan.sadhasivam
  Cc: linux-arm-msm, hemantk, jhugo, linux-kernel, loic.poulain,
	carl.yin, naveen.kumar, Bhaumik Bhatt

v6:
-Add patch to improve debug message
-Fix switch-case fall through warning for EE serialization patch
-Address review comments and update commit text

v5:
-Update commit text for "clear devices when moving execution environments" patch
-Added test platform details that were missed out in the cover letter
-Merged two if checks into a single one for EE serialization patch

v4:
-Addressed review comments for additional info logging for EE movements
-Updated switch case for EE handling in mhi_intvec_threaded_handler()

v3:
-Update commit text to accurately reflect changes and reasoning based on reviews

v2:
-Add patch to clear devices when moving execution environments

Note: This patchset is the first in a series of execution environment related
changes.

During full boot chain firmware download, the PM state worker downloads the AMSS
image after waiting for the SBL execution environment change, all from within
the PBL transition itself. Since the firmware load worker thread was removed,
this design needs to change: the MHI host must download the AMSS image from the
SBL transition of the PM state worker thread instead of blocking on the SBL EE
during PBL transition processing.
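
As an illustration only, the intended flow can be sketched as below. This is
not the driver code: the stw_* names and the counters are hypothetical
stand-ins, and the real work done in each transition is reduced to a comment
or a counter increment.

```c
#include <stdbool.h>

/* Hypothetical, simplified transition identifiers for this sketch. */
enum stw_transition {
	STW_TRANSITION_PBL,
	STW_TRANSITION_SBL,
	STW_TRANSITION_MISSION_MODE,
};

/* Hypothetical controller state; counters stand in for real side effects. */
struct stw_ctrl {
	bool fbc_download;	/* full boot chain download requested */
	int devices_created;	/* times client devices were created */
	int amss_downloads;	/* times the AMSS download was started */
};

/* Process one queued transition without ever blocking on a later EE. */
static void stw_process(struct stw_ctrl *c, enum stw_transition t)
{
	switch (t) {
	case STW_TRANSITION_PBL:
		/* Load the first image and return; no wait for SBL here. */
		break;
	case STW_TRANSITION_SBL:
		/* Device reported SBL: create devices, then start AMSS. */
		c->devices_created++;
		if (c->fbc_download)
			c->amss_downloads++;
		break;
	case STW_TRANSITION_MISSION_MODE:
		/* Mission mode handling is out of scope for this sketch. */
		break;
	}
}
```

The point of the sketch is that the AMSS download is triggered by the SBL
transition being queued, so the PBL transition never has to sleep waiting
for it.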

Ensure that EE changes are handled only from appropriate places and occur one
after another. Handle only PBL or RDDM EE changes as critical events directly
from the interrupt handler, so that the status callback is given to the
controller drivers promptly.
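
A rough sketch of the decision the interrupt handler makes is shown below.
It is a simplified model under assumptions, not the handler itself: the
irq_* enums and irq_classify() are hypothetical names, and the locking and
the actual callbacks are left out.

```c
#include <stdbool.h>

/* Hypothetical execution environment identifiers for this sketch. */
enum irq_ee {
	IRQ_EE_PBL,
	IRQ_EE_SBL,
	IRQ_EE_AMSS,
	IRQ_EE_RDDM,
	IRQ_EE_EDL,
	IRQ_EE_PTHRU,
};

/* What the handler would do next; names are illustrative only. */
enum irq_action {
	IRQ_ACT_NONE,			/* nothing to do from the handler */
	IRQ_ACT_RDDM_CALLBACK,		/* record RDDM and notify controller */
	IRQ_ACT_FATAL_THEN_SYS_ERR,	/* fatal callback, then SYS ERROR path */
	IRQ_ACT_SYS_ERR,		/* SYS ERROR path only, local EE kept */
};

/*
 * Decide how to react to the EE read from the device. The local EE is only
 * replaced for the RDDM and PBL-type cases; any other change is left to the
 * PM state worker so that EE updates happen one after another.
 */
static enum irq_action irq_classify(bool sys_err_detected,
				    enum irq_ee local_ee,
				    enum irq_ee device_ee,
				    bool rddm_supported, bool host_active)
{
	if (!sys_err_detected || device_ee == local_ee)
		return IRQ_ACT_NONE;

	switch (device_ee) {
	case IRQ_EE_RDDM:
		/* Skip if RDDM is unsupported or power down is in progress. */
		if (rddm_supported && host_active)
			return IRQ_ACT_RDDM_CALLBACK;
		return IRQ_ACT_NONE;
	case IRQ_EE_PBL:
	case IRQ_EE_EDL:
	case IRQ_EE_PTHRU:
		return IRQ_ACT_FATAL_THEN_SYS_ERR;
	default:
		return IRQ_ACT_SYS_ERR;
	}
}
```

Because an unchanged EE returns early, a repeated interrupt for the same
environment does not produce a second status callback in this model.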

When moving from SBL to AMSS EE, clear SBL-specific client devices by calling
the remove callbacks for them so they are not left open in a different
execution environment.
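
The per-device check can be modelled roughly as follows. This is a
standalone sketch with hypothetical eef_* names and made-up enum values; it
only mirrors the ee_mask test and omits the reference counting and device
removal the driver performs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical EE values; EEF_EE_MAX means "no filter requested". */
enum eef_ee { EEF_EE_PBL, EEF_EE_SBL, EEF_EE_AMSS, EEF_EE_MAX };

#define EEF_BIT(n) (1u << (n))

struct eef_chan {
	unsigned int ee_mask;	/* bit N set: channel is valid in EE N */
};

struct eef_dev {
	const struct eef_chan *ul_chan;	/* may be NULL */
	const struct eef_chan *dl_chan;	/* may be NULL */
};

/*
 * Return true if the device should be destroyed. With filter == NULL every
 * device qualifies (for example on power down). Otherwise a device qualifies
 * only if each channel it has lists *filter in its ee_mask.
 */
static bool eef_should_destroy(const struct eef_dev *dev,
			       const enum eef_ee *filter)
{
	enum eef_ee ee = filter ? *filter : EEF_EE_MAX;

	if (dev->ul_chan && ee != EEF_EE_MAX &&
	    !(dev->ul_chan->ee_mask & EEF_BIT(ee)))
		return false;

	if (dev->dl_chan && ee != EEF_EE_MAX &&
	    !(dev->dl_chan->ee_mask & EEF_BIT(ee)))
		return false;

	return true;
}
```

Passing the EE that is being left (SBL in the mission mode transition) as
the filter selects only the devices whose channels were started in it.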

This patchset was tested on ARM64.

Bhaumik Bhatt (4):
  bus: mhi: core: Destroy SBL devices when moving to mission mode
  bus: mhi: core: Download AMSS image from appropriate function
  bus: mhi: core: Process execution environment changes serially
  bus: mhi: core: Update debug prints to include local device state

 drivers/bus/mhi/core/boot.c     | 51 +++++++++++++--------------
 drivers/bus/mhi/core/internal.h |  1 +
 drivers/bus/mhi/core/main.c     | 76 +++++++++++++++++++++++++++--------------
 drivers/bus/mhi/core/pm.c       | 10 ++++--
 4 files changed, 83 insertions(+), 55 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


* [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode
  2021-02-24 23:23 [PATCH v6 0/4] Serialize execution environment changes for MHI Bhaumik Bhatt
@ 2021-02-24 23:23 ` Bhaumik Bhatt
  2021-02-26 22:06   ` Hemant Kumar
                     ` (2 more replies)
  2021-02-24 23:23 ` [PATCH v6 2/4] bus: mhi: core: Download AMSS image from appropriate function Bhaumik Bhatt
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 18+ messages in thread
From: Bhaumik Bhatt @ 2021-02-24 23:23 UTC (permalink / raw)
  To: manivannan.sadhasivam
  Cc: linux-arm-msm, hemantk, jhugo, linux-kernel, loic.poulain,
	carl.yin, naveen.kumar, Bhaumik Bhatt

Currently, client devices are created in SBL or AMSS (mission
mode) and only destroyed after power down or SYS ERROR. When
moving between certain execution environments, such as from SBL
to AMSS, no clean-up is performed. This presents an issue where
SBL-specific channels are left open and client drivers now run in
an execution environment where they cannot operate. Fix this by
extending mhi_destroy_device() to do an execution environment
specific clean-up if one is requested. This closes the gap by
destroying devices in such scenarios, which allows SBL client
drivers to clean up once the device enters mission mode.

Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
---
 drivers/bus/mhi/core/main.c | 29 +++++++++++++++++++++++++----
 drivers/bus/mhi/core/pm.c   |  3 +++
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index 4e0131b..7a2e98c 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -244,8 +244,10 @@ static void mhi_del_ring_element(struct mhi_controller *mhi_cntrl,
 
 int mhi_destroy_device(struct device *dev, void *data)
 {
+	struct mhi_chan *ul_chan, *dl_chan;
 	struct mhi_device *mhi_dev;
 	struct mhi_controller *mhi_cntrl;
+	enum mhi_ee_type ee = MHI_EE_MAX;
 
 	if (dev->bus != &mhi_bus_type)
 		return 0;
@@ -257,6 +259,17 @@ int mhi_destroy_device(struct device *dev, void *data)
 	if (mhi_dev->dev_type == MHI_DEVICE_CONTROLLER)
 		return 0;
 
+	ul_chan = mhi_dev->ul_chan;
+	dl_chan = mhi_dev->dl_chan;
+
+	/*
+	 * If execution environment is specified, remove only those devices that
+	 * started in them based on ee_mask for the channels as we move on to a
+	 * different execution environment
+	 */
+	if (data)
+		ee = *(enum mhi_ee_type *)data;
+
 	/*
 	 * For the suspend and resume case, this function will get called
 	 * without mhi_unregister_controller(). Hence, we need to drop the
@@ -264,11 +277,19 @@ int mhi_destroy_device(struct device *dev, void *data)
 	 * be sure that there will be no instances of mhi_dev left after
 	 * this.
 	 */
-	if (mhi_dev->ul_chan)
-		put_device(&mhi_dev->ul_chan->mhi_dev->dev);
+	if (ul_chan) {
+		if (ee != MHI_EE_MAX && !(ul_chan->ee_mask & BIT(ee)))
+			return 0;
 
-	if (mhi_dev->dl_chan)
-		put_device(&mhi_dev->dl_chan->mhi_dev->dev);
+		put_device(&ul_chan->mhi_dev->dev);
+	}
+
+	if (dl_chan) {
+		if (ee != MHI_EE_MAX && !(dl_chan->ee_mask & BIT(ee)))
+			return 0;
+
+		put_device(&dl_chan->mhi_dev->dev);
+	}
 
 	dev_dbg(&mhi_cntrl->mhi_dev->dev, "destroy device for chan:%s\n",
 		 mhi_dev->name);
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 681960c..3bd81d0 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -377,6 +377,7 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl)
 {
 	struct mhi_event *mhi_event;
 	struct device *dev = &mhi_cntrl->mhi_dev->dev;
+	enum mhi_ee_type current_ee = mhi_cntrl->ee;
 	int i, ret;
 
 	dev_dbg(dev, "Processing Mission Mode transition\n");
@@ -395,6 +396,8 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl)
 
 	wake_up_all(&mhi_cntrl->state_event);
 
+	device_for_each_child(&mhi_cntrl->mhi_dev->dev, &current_ee,
+			      mhi_destroy_device);
 	mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_MISSION_MODE);
 
 	/* Force MHI to be in M0 state before continuing */



* [PATCH v6 2/4] bus: mhi: core: Download AMSS image from appropriate function
  2021-02-24 23:23 [PATCH v6 0/4] Serialize execution environment changes for MHI Bhaumik Bhatt
  2021-02-24 23:23 ` [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode Bhaumik Bhatt
@ 2021-02-24 23:23 ` Bhaumik Bhatt
  2021-03-04  8:42   ` Loic Poulain
  2021-02-24 23:23 ` [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially Bhaumik Bhatt
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Bhaumik Bhatt @ 2021-02-24 23:23 UTC (permalink / raw)
  To: manivannan.sadhasivam
  Cc: linux-arm-msm, hemantk, jhugo, linux-kernel, loic.poulain,
	carl.yin, naveen.kumar, Bhaumik Bhatt

During full boot chain firmware download, the PM state worker
downloads the AMSS image after a blocking wait for the SBL
execution environment change, all within the PBL transition
itself. Improve this design by having the host download the AMSS
image from the SBL transition of the PM state worker thread, when
a DEV_ST_TRANSITION_SBL is queued, instead of the blocking wait.

Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/bus/mhi/core/boot.c     | 51 +++++++++++++++++++----------------------
 drivers/bus/mhi/core/internal.h |  1 +
 drivers/bus/mhi/core/pm.c       |  2 ++
 3 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/drivers/bus/mhi/core/boot.c b/drivers/bus/mhi/core/boot.c
index c2546bf..08c2874 100644
--- a/drivers/bus/mhi/core/boot.c
+++ b/drivers/bus/mhi/core/boot.c
@@ -389,7 +389,6 @@ static void mhi_firmware_copy(struct mhi_controller *mhi_cntrl,
 void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
 {
 	const struct firmware *firmware = NULL;
-	struct image_info *image_info;
 	struct device *dev = &mhi_cntrl->mhi_dev->dev;
 	const char *fw_name;
 	void *buf;
@@ -491,44 +490,42 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
 fw_load_ee_pthru:
 	/* Transitioning into MHI RESET->READY state */
 	ret = mhi_ready_state_transition(mhi_cntrl);
-
-	if (!mhi_cntrl->fbc_download)
-		return;
-
 	if (ret) {
 		dev_err(dev, "MHI did not enter READY state\n");
 		goto error_ready_state;
 	}
 
-	/* Wait for the SBL event */
-	ret = wait_event_timeout(mhi_cntrl->state_event,
-				 mhi_cntrl->ee == MHI_EE_SBL ||
-				 MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
-				 msecs_to_jiffies(mhi_cntrl->timeout_ms));
+	dev_info(dev, "Wait for device to enter SBL or Mission mode\n");
+	return;
 
-	if (!ret || MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state)) {
-		dev_err(dev, "MHI did not enter SBL\n");
-		goto error_ready_state;
+error_ready_state:
+	if (mhi_cntrl->fbc_download) {
+		mhi_free_bhie_table(mhi_cntrl, mhi_cntrl->fbc_image);
+		mhi_cntrl->fbc_image = NULL;
 	}
 
-	/* Start full firmware image download */
-	image_info = mhi_cntrl->fbc_image;
+error_fw_load:
+	mhi_cntrl->pm_state = MHI_PM_FW_DL_ERR;
+	wake_up_all(&mhi_cntrl->state_event);
+}
+
+int mhi_download_amss_image(struct mhi_controller *mhi_cntrl)
+{
+	struct image_info *image_info = mhi_cntrl->fbc_image;
+	struct device *dev = &mhi_cntrl->mhi_dev->dev;
+	int ret;
+
+	if (!image_info)
+		return -EIO;
+
 	ret = mhi_fw_load_bhie(mhi_cntrl,
 			       /* Vector table is the last entry */
 			       &image_info->mhi_buf[image_info->entries - 1]);
 	if (ret) {
-		dev_err(dev, "MHI did not load image over BHIe, ret: %d\n",
-			ret);
-		goto error_fw_load;
+		dev_err(dev, "MHI did not load AMSS, ret:%d\n", ret);
+		mhi_cntrl->pm_state = MHI_PM_FW_DL_ERR;
+		wake_up_all(&mhi_cntrl->state_event);
 	}
 
-	return;
-
-error_ready_state:
-	mhi_free_bhie_table(mhi_cntrl, mhi_cntrl->fbc_image);
-	mhi_cntrl->fbc_image = NULL;
-
-error_fw_load:
-	mhi_cntrl->pm_state = MHI_PM_FW_DL_ERR;
-	wake_up_all(&mhi_cntrl->state_event);
+	return ret;
 }
diff --git a/drivers/bus/mhi/core/internal.h b/drivers/bus/mhi/core/internal.h
index 6f80ec3..6f37439 100644
--- a/drivers/bus/mhi/core/internal.h
+++ b/drivers/bus/mhi/core/internal.h
@@ -619,6 +619,7 @@ int mhi_pm_m3_transition(struct mhi_controller *mhi_cntrl);
 int __mhi_device_get_sync(struct mhi_controller *mhi_cntrl);
 int mhi_send_cmd(struct mhi_controller *mhi_cntrl, struct mhi_chan *mhi_chan,
 		 enum mhi_cmd_type cmd);
+int mhi_download_amss_image(struct mhi_controller *mhi_cntrl);
 static inline bool mhi_is_active(struct mhi_controller *mhi_cntrl)
 {
 	return (mhi_cntrl->dev_state >= MHI_STATE_M0 &&
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 3bd81d0..c09ec13 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -758,6 +758,8 @@ void mhi_pm_st_worker(struct work_struct *work)
 			 * either SBL or AMSS states
 			 */
 			mhi_create_devices(mhi_cntrl);
+			if (mhi_cntrl->fbc_download)
+				mhi_download_amss_image(mhi_cntrl);
 			break;
 		case DEV_ST_TRANSITION_MISSION_MODE:
 			mhi_pm_mission_mode_transition(mhi_cntrl);



* [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially
  2021-02-24 23:23 [PATCH v6 0/4] Serialize execution environment changes for MHI Bhaumik Bhatt
  2021-02-24 23:23 ` [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode Bhaumik Bhatt
  2021-02-24 23:23 ` [PATCH v6 2/4] bus: mhi: core: Download AMSS image from appropriate function Bhaumik Bhatt
@ 2021-02-24 23:23 ` Bhaumik Bhatt
  2021-03-04  8:43   ` Loic Poulain
                     ` (2 more replies)
  2021-02-24 23:23 ` [PATCH v6 4/4] bus: mhi: core: Update debug prints to include local device state Bhaumik Bhatt
                   ` (2 subsequent siblings)
  5 siblings, 3 replies; 18+ messages in thread
From: Bhaumik Bhatt @ 2021-02-24 23:23 UTC (permalink / raw)
  To: manivannan.sadhasivam
  Cc: linux-arm-msm, hemantk, jhugo, linux-kernel, loic.poulain,
	carl.yin, naveen.kumar, Bhaumik Bhatt

In the current design, whenever the BHI interrupt fires, the
execution environment is updated. This can cause race conditions
and impede ongoing power up/down processing. For example, if a
power down is in progress, the MHI host updates to a local
"disabled" execution environment. If a BHI interrupt fires later,
that value gets replaced with one from the BHI EE register. This
impacts the controller, which does not expect, for example,
multiple RDDM execution environment change status callbacks.
Another issue is that the device can enter mission mode and the
execution environment is updated while device creation for SBL
channels is still going on due to a slower PM state worker thread
run, leading to multiple attempts at opening the same channel.

Ensure that EE changes are handled only from appropriate places
and occur one after another. Handle only PBL modes or RDDM EE
changes as critical events directly from the interrupt handler.
Simplify handling by waiting for SYS ERROR before handling RDDM.
This also makes sure that we use the correct execution environment
to notify the controller driver when the device resets to one of
the PBL execution environments.

Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
---
 drivers/bus/mhi/core/main.c | 40 +++++++++++++++++++++-------------------
 drivers/bus/mhi/core/pm.c   |  7 ++++---
 2 files changed, 25 insertions(+), 22 deletions(-)

diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index 7a2e98c..9715f51 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -430,7 +430,7 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
 	struct device *dev = &mhi_cntrl->mhi_dev->dev;
 	enum mhi_state state = MHI_STATE_MAX;
 	enum mhi_pm_state pm_state = 0;
-	enum mhi_ee_type ee = 0;
+	enum mhi_ee_type ee = MHI_EE_MAX;
 
 	write_lock_irq(&mhi_cntrl->pm_lock);
 	if (!MHI_REG_ACCESS_VALID(mhi_cntrl->pm_state)) {
@@ -439,8 +439,7 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
 	}
 
 	state = mhi_get_mhi_state(mhi_cntrl);
-	ee = mhi_cntrl->ee;
-	mhi_cntrl->ee = mhi_get_exec_env(mhi_cntrl);
+	ee = mhi_get_exec_env(mhi_cntrl);
 	dev_dbg(dev, "local ee:%s device ee:%s dev_state:%s\n",
 		TO_MHI_EXEC_STR(mhi_cntrl->ee), TO_MHI_EXEC_STR(ee),
 		TO_MHI_STATE_STR(state));
@@ -452,27 +451,30 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
 	}
 	write_unlock_irq(&mhi_cntrl->pm_lock);
 
-	 /* If device supports RDDM don't bother processing SYS error */
-	if (mhi_cntrl->rddm_image) {
-		/* host may be performing a device power down already */
-		if (!mhi_is_active(mhi_cntrl))
-			goto exit_intvec;
+	if (pm_state != MHI_PM_SYS_ERR_DETECT || ee == mhi_cntrl->ee)
+		goto exit_intvec;
 
-		if (mhi_cntrl->ee == MHI_EE_RDDM && mhi_cntrl->ee != ee) {
+	switch (ee) {
+	case MHI_EE_RDDM:
+		/* proceed if power down is not already in progress */
+		if (mhi_cntrl->rddm_image && mhi_is_active(mhi_cntrl)) {
 			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_RDDM);
+			mhi_cntrl->ee = ee;
 			wake_up_all(&mhi_cntrl->state_event);
 		}
-		goto exit_intvec;
-	}
-
-	if (pm_state == MHI_PM_SYS_ERR_DETECT) {
+		break;
+	case MHI_EE_PBL:
+	case MHI_EE_EDL:
+	case MHI_EE_PTHRU:
+		mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
+		mhi_cntrl->ee = ee;
 		wake_up_all(&mhi_cntrl->state_event);
-
-		/* For fatal errors, we let controller decide next step */
-		if (MHI_IN_PBL(ee))
-			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
-		else
-			mhi_pm_sys_err_handler(mhi_cntrl);
+		mhi_pm_sys_err_handler(mhi_cntrl);
+		break;
+	default:
+		wake_up_all(&mhi_cntrl->state_event);
+		mhi_pm_sys_err_handler(mhi_cntrl);
+		break;
 	}
 
 exit_intvec:
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index c09ec13..c870fa8 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -377,21 +377,22 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl)
 {
 	struct mhi_event *mhi_event;
 	struct device *dev = &mhi_cntrl->mhi_dev->dev;
-	enum mhi_ee_type current_ee = mhi_cntrl->ee;
+	enum mhi_ee_type ee = MHI_EE_MAX, current_ee = mhi_cntrl->ee;
 	int i, ret;
 
 	dev_dbg(dev, "Processing Mission Mode transition\n");
 
 	write_lock_irq(&mhi_cntrl->pm_lock);
 	if (MHI_REG_ACCESS_VALID(mhi_cntrl->pm_state))
-		mhi_cntrl->ee = mhi_get_exec_env(mhi_cntrl);
+		ee = mhi_get_exec_env(mhi_cntrl);
 
-	if (!MHI_IN_MISSION_MODE(mhi_cntrl->ee)) {
+	if (!MHI_IN_MISSION_MODE(ee)) {
 		mhi_cntrl->pm_state = MHI_PM_LD_ERR_FATAL_DETECT;
 		write_unlock_irq(&mhi_cntrl->pm_lock);
 		wake_up_all(&mhi_cntrl->state_event);
 		return -EIO;
 	}
+	mhi_cntrl->ee = ee;
 	write_unlock_irq(&mhi_cntrl->pm_lock);
 
 	wake_up_all(&mhi_cntrl->state_event);



* [PATCH v6 4/4] bus: mhi: core: Update debug prints to include local device state
  2021-02-24 23:23 [PATCH v6 0/4] Serialize execution environment changes for MHI Bhaumik Bhatt
                   ` (2 preceding siblings ...)
  2021-02-24 23:23 ` [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially Bhaumik Bhatt
@ 2021-02-24 23:23 ` Bhaumik Bhatt
  2021-03-04  8:45   ` Loic Poulain
  2021-03-10 14:07 ` [PATCH v6 0/4] Serialize execution environment changes for MHI Manivannan Sadhasivam
  2021-05-26 19:03 ` patchwork-bot+linux-arm-msm
  5 siblings, 1 reply; 18+ messages in thread
From: Bhaumik Bhatt @ 2021-02-24 23:23 UTC (permalink / raw)
  To: manivannan.sadhasivam
  Cc: linux-arm-msm, hemantk, jhugo, linux-kernel, loic.poulain,
	carl.yin, naveen.kumar, Bhaumik Bhatt

Update debug prints in the BHI interrupt handler to include the
local device state. This helps show transitions between MHI
states better.

Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---
 drivers/bus/mhi/core/main.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index 9715f51..4540f9e 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -440,9 +440,10 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
 
 	state = mhi_get_mhi_state(mhi_cntrl);
 	ee = mhi_get_exec_env(mhi_cntrl);
-	dev_dbg(dev, "local ee:%s device ee:%s dev_state:%s\n",
-		TO_MHI_EXEC_STR(mhi_cntrl->ee), TO_MHI_EXEC_STR(ee),
-		TO_MHI_STATE_STR(state));
+	dev_dbg(dev, "local ee: %s state: %s device ee: %s state: %s\n",
+		TO_MHI_EXEC_STR(mhi_cntrl->ee),
+		TO_MHI_STATE_STR(mhi_cntrl->dev_state),
+		TO_MHI_EXEC_STR(ee), TO_MHI_STATE_STR(state));
 
 	if (state == MHI_STATE_SYS_ERR) {
 		dev_dbg(dev, "System error detected\n");



* Re: [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode
  2021-02-24 23:23 ` [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode Bhaumik Bhatt
@ 2021-02-26 22:06   ` Hemant Kumar
  2021-03-04  8:41   ` Loic Poulain
  2021-03-10 14:03   ` Manivannan Sadhasivam
  2 siblings, 0 replies; 18+ messages in thread
From: Hemant Kumar @ 2021-02-26 22:06 UTC (permalink / raw)
  To: Bhaumik Bhatt, manivannan.sadhasivam
  Cc: linux-arm-msm, jhugo, linux-kernel, loic.poulain, carl.yin, naveen.kumar



On 2/24/21 3:23 PM, Bhaumik Bhatt wrote:
> Currently, client devices are created in SBL or AMSS (mission
> mode) and only destroyed after power down or SYS ERROR. When
> moving between certain execution environments, such as from SBL
> to AMSS, no clean-up is required. This presents an issue where
> SBL-specific channels are left open and client drivers now run in
> an execution environment where they cannot operate. Fix this by
> expanding the mhi_destroy_device() to do an execution environment
> specific clean-up if one is requested. Close the gap and destroy
> devices in such scenarios that allow SBL client drivers to clean
> up once device enters mission mode.
> 
> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
It makes sense to clean up previous execution env related resources.

Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode
  2021-02-24 23:23 ` [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode Bhaumik Bhatt
  2021-02-26 22:06   ` Hemant Kumar
@ 2021-03-04  8:41   ` Loic Poulain
  2021-03-10 14:03   ` Manivannan Sadhasivam
  2 siblings, 0 replies; 18+ messages in thread
From: Loic Poulain @ 2021-03-04  8:41 UTC (permalink / raw)
  To: Bhaumik Bhatt
  Cc: Manivannan Sadhasivam, linux-arm-msm, Hemant Kumar, Jeffrey Hugo,
	open list, Carl Yin(殷张成),
	Naveen Kumar

On Thu, 25 Feb 2021 at 00:23, Bhaumik Bhatt <bbhatt@codeaurora.org> wrote:
>
> Currently, client devices are created in SBL or AMSS (mission
> mode) and only destroyed after power down or SYS ERROR. When
> moving between certain execution environments, such as from SBL
> to AMSS, no clean-up is required. This presents an issue where
> SBL-specific channels are left open and client drivers now run in
> an execution environment where they cannot operate. Fix this by
> expanding the mhi_destroy_device() to do an execution environment
> specific clean-up if one is requested. Close the gap and destroy
> devices in such scenarios that allow SBL client drivers to clean
> up once device enters mission mode.
>
> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

Reviewed-by: Loic Poulain <loic.poulain@linaro.org>


* Re: [PATCH v6 2/4] bus: mhi: core: Download AMSS image from appropriate function
  2021-02-24 23:23 ` [PATCH v6 2/4] bus: mhi: core: Download AMSS image from appropriate function Bhaumik Bhatt
@ 2021-03-04  8:42   ` Loic Poulain
  0 siblings, 0 replies; 18+ messages in thread
From: Loic Poulain @ 2021-03-04  8:42 UTC (permalink / raw)
  To: Bhaumik Bhatt
  Cc: Manivannan Sadhasivam, linux-arm-msm, Hemant Kumar, Jeffrey Hugo,
	open list, Carl Yin(殷张成),
	Naveen Kumar

On Thu, 25 Feb 2021 at 00:23, Bhaumik Bhatt <bbhatt@codeaurora.org> wrote:
>
> During full boot chain firmware download, the PM state worker
> downloads the AMSS image after a blocking wait for the SBL
> execution environment change when running in PBL transition
> itself. Improve this design by having the host download the AMSS
> image from the SBL transition of PM state worker thread when a
> DEV_ST_TRANSITION_SBL is queued instead of the blocking wait.
>
> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Reviewed-by: Loic Poulain <loic.poulain@linaro.org>


* Re: [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially
  2021-02-24 23:23 ` [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially Bhaumik Bhatt
@ 2021-03-04  8:43   ` Loic Poulain
  2021-03-10 14:04   ` Manivannan Sadhasivam
  2021-08-23 18:43   ` Jeffrey Hugo
  2 siblings, 0 replies; 18+ messages in thread
From: Loic Poulain @ 2021-03-04  8:43 UTC (permalink / raw)
  To: Bhaumik Bhatt
  Cc: Manivannan Sadhasivam, linux-arm-msm, Hemant Kumar, Jeffrey Hugo,
	open list, Carl Yin(殷张成),
	Naveen Kumar

On Thu, 25 Feb 2021 at 00:23, Bhaumik Bhatt <bbhatt@codeaurora.org> wrote:
>
> In current design, whenever the BHI interrupt is fired, the
> execution environment is updated. This can cause race conditions
> and impede ongoing power up/down processing. For example, if a
> power down is in progress, MHI host updates to a local "disabled"
> execution environment. If a BHI interrupt fires later, that value
> gets replaced with one from the BHI EE register. This impacts the
> controller as it does not expect multiple RDDM execution
> environment change status callbacks as an example. Another issue
> would be that the device can enter mission mode and the execution
> environment is updated, while device creation for SBL channels is
> still going on due to slower PM state worker thread run, leading
> to multiple attempts at opening the same channel.
>
> Ensure that EE changes are handled only from appropriate places
> and occur one after another and handle only PBL modes or RDDM EE
> changes as critical events directly from the interrupt handler.
> Simplify handling by waiting for SYS ERROR before handling RDDM.
> This also makes sure that we use the correct execution environment
> to notify the controller driver when the device resets to one of
> the PBL execution environments.
>
> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

Looking good now

Reviewed-by: Loic Poulain <loic.poulain@linaro.org>


* Re: [PATCH v6 4/4] bus: mhi: core: Update debug prints to include local device state
  2021-02-24 23:23 ` [PATCH v6 4/4] bus: mhi: core: Update debug prints to include local device state Bhaumik Bhatt
@ 2021-03-04  8:45   ` Loic Poulain
  0 siblings, 0 replies; 18+ messages in thread
From: Loic Poulain @ 2021-03-04  8:45 UTC (permalink / raw)
  To: Bhaumik Bhatt
  Cc: Manivannan Sadhasivam, linux-arm-msm, Hemant Kumar, Jeffrey Hugo,
	open list, Carl Yin(殷张成),
	Naveen Kumar

On Thu, 25 Feb 2021 at 00:23, Bhaumik Bhatt <bbhatt@codeaurora.org> wrote:
>
> Update debug prints to include local device in the BHI interrupt
> handler. This helps show transitions better between MHI states.
>
> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Reviewed-by: Loic Poulain <loic.poulain@linaro.org>


* Re: [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode
  2021-02-24 23:23 ` [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode Bhaumik Bhatt
  2021-02-26 22:06   ` Hemant Kumar
  2021-03-04  8:41   ` Loic Poulain
@ 2021-03-10 14:03   ` Manivannan Sadhasivam
  2 siblings, 0 replies; 18+ messages in thread
From: Manivannan Sadhasivam @ 2021-03-10 14:03 UTC (permalink / raw)
  To: Bhaumik Bhatt
  Cc: linux-arm-msm, hemantk, jhugo, linux-kernel, loic.poulain,
	carl.yin, naveen.kumar

On Wed, Feb 24, 2021 at 03:23:02PM -0800, Bhaumik Bhatt wrote:
> Currently, client devices are created in SBL or AMSS (mission
> mode) and only destroyed after power down or SYS ERROR. When
> moving between certain execution environments, such as from SBL
> to AMSS, no clean-up is required. This presents an issue where
> SBL-specific channels are left open and client drivers now run in
> an execution environment where they cannot operate. Fix this by
> expanding the mhi_destroy_device() to do an execution environment
> specific clean-up if one is requested. Close the gap and destroy
> devices in such scenarios that allow SBL client drivers to clean
> up once device enters mission mode.
> 
> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Thanks,
Mani



* Re: [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially
  2021-02-24 23:23 ` [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially Bhaumik Bhatt
  2021-03-04  8:43   ` Loic Poulain
@ 2021-03-10 14:04   ` Manivannan Sadhasivam
  2021-08-23 18:43   ` Jeffrey Hugo
  2 siblings, 0 replies; 18+ messages in thread
From: Manivannan Sadhasivam @ 2021-03-10 14:04 UTC (permalink / raw)
  To: Bhaumik Bhatt
  Cc: linux-arm-msm, hemantk, jhugo, linux-kernel, loic.poulain,
	carl.yin, naveen.kumar

On Wed, Feb 24, 2021 at 03:23:04PM -0800, Bhaumik Bhatt wrote:
> In current design, whenever the BHI interrupt is fired, the
> execution environment is updated. This can cause race conditions
> and impede ongoing power up/down processing. For example, if a
> power down is in progress, MHI host updates to a local "disabled"
> execution environment. If a BHI interrupt fires later, that value
> gets replaced with one from the BHI EE register. This impacts the
> controller as it does not expect multiple RDDM execution
> environment change status callbacks as an example. Another issue
> would be that the device can enter mission mode and the execution
> environment is updated, while device creation for SBL channels is
> still going on due to slower PM state worker thread run, leading
> to multiple attempts at opening the same channel.
> 
> Ensure that EE changes are handled only from appropriate places
> and occur one after another and handle only PBL modes or RDDM EE
> changes as critical events directly from the interrupt handler.
> Simplify handling by waiting for SYS ERROR before handling RDDM.
> This also makes sure that we use the correct execution environment
> to notify the controller driver when the device resets to one of
> the PBL execution environments.
> 
> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>

Thanks,
Mani

> ---
>  drivers/bus/mhi/core/main.c | 40 +++++++++++++++++++++-------------------
>  drivers/bus/mhi/core/pm.c   |  7 ++++---
>  2 files changed, 25 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
> index 7a2e98c..9715f51 100644
> --- a/drivers/bus/mhi/core/main.c
> +++ b/drivers/bus/mhi/core/main.c
> @@ -430,7 +430,7 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
>  	struct device *dev = &mhi_cntrl->mhi_dev->dev;
>  	enum mhi_state state = MHI_STATE_MAX;
>  	enum mhi_pm_state pm_state = 0;
> -	enum mhi_ee_type ee = 0;
> +	enum mhi_ee_type ee = MHI_EE_MAX;
>  
>  	write_lock_irq(&mhi_cntrl->pm_lock);
>  	if (!MHI_REG_ACCESS_VALID(mhi_cntrl->pm_state)) {
> @@ -439,8 +439,7 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
>  	}
>  
>  	state = mhi_get_mhi_state(mhi_cntrl);
> -	ee = mhi_cntrl->ee;
> -	mhi_cntrl->ee = mhi_get_exec_env(mhi_cntrl);
> +	ee = mhi_get_exec_env(mhi_cntrl);
>  	dev_dbg(dev, "local ee:%s device ee:%s dev_state:%s\n",
>  		TO_MHI_EXEC_STR(mhi_cntrl->ee), TO_MHI_EXEC_STR(ee),
>  		TO_MHI_STATE_STR(state));
> @@ -452,27 +451,30 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
>  	}
>  	write_unlock_irq(&mhi_cntrl->pm_lock);
>  
> -	 /* If device supports RDDM don't bother processing SYS error */
> -	if (mhi_cntrl->rddm_image) {
> -		/* host may be performing a device power down already */
> -		if (!mhi_is_active(mhi_cntrl))
> -			goto exit_intvec;
> +	if (pm_state != MHI_PM_SYS_ERR_DETECT || ee == mhi_cntrl->ee)
> +		goto exit_intvec;
>  
> -		if (mhi_cntrl->ee == MHI_EE_RDDM && mhi_cntrl->ee != ee) {
> +	switch (ee) {
> +	case MHI_EE_RDDM:
> +		/* proceed if power down is not already in progress */
> +		if (mhi_cntrl->rddm_image && mhi_is_active(mhi_cntrl)) {
>  			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_RDDM);
> +			mhi_cntrl->ee = ee;
>  			wake_up_all(&mhi_cntrl->state_event);
>  		}
> -		goto exit_intvec;
> -	}
> -
> -	if (pm_state == MHI_PM_SYS_ERR_DETECT) {
> +		break;
> +	case MHI_EE_PBL:
> +	case MHI_EE_EDL:
> +	case MHI_EE_PTHRU:
> +		mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
> +		mhi_cntrl->ee = ee;
>  		wake_up_all(&mhi_cntrl->state_event);
> -
> -		/* For fatal errors, we let controller decide next step */
> -		if (MHI_IN_PBL(ee))
> -			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
> -		else
> -			mhi_pm_sys_err_handler(mhi_cntrl);
> +		mhi_pm_sys_err_handler(mhi_cntrl);
> +		break;
> +	default:
> +		wake_up_all(&mhi_cntrl->state_event);
> +		mhi_pm_sys_err_handler(mhi_cntrl);
> +		break;
>  	}
>  
>  exit_intvec:
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index c09ec13..c870fa8 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -377,21 +377,22 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl)
>  {
>  	struct mhi_event *mhi_event;
>  	struct device *dev = &mhi_cntrl->mhi_dev->dev;
> -	enum mhi_ee_type current_ee = mhi_cntrl->ee;
> +	enum mhi_ee_type ee = MHI_EE_MAX, current_ee = mhi_cntrl->ee;
>  	int i, ret;
>  
>  	dev_dbg(dev, "Processing Mission Mode transition\n");
>  
>  	write_lock_irq(&mhi_cntrl->pm_lock);
>  	if (MHI_REG_ACCESS_VALID(mhi_cntrl->pm_state))
> -		mhi_cntrl->ee = mhi_get_exec_env(mhi_cntrl);
> +		ee = mhi_get_exec_env(mhi_cntrl);
>  
> -	if (!MHI_IN_MISSION_MODE(mhi_cntrl->ee)) {
> +	if (!MHI_IN_MISSION_MODE(ee)) {
>  		mhi_cntrl->pm_state = MHI_PM_LD_ERR_FATAL_DETECT;
>  		write_unlock_irq(&mhi_cntrl->pm_lock);
>  		wake_up_all(&mhi_cntrl->state_event);
>  		return -EIO;
>  	}
> +	mhi_cntrl->ee = ee;
>  	write_unlock_irq(&mhi_cntrl->pm_lock);
>  
>  	wake_up_all(&mhi_cntrl->state_event);
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 


* Re: [PATCH v6 0/4] Serialize execution environment changes for MHI
  2021-02-24 23:23 [PATCH v6 0/4] Serialize execution environment changes for MHI Bhaumik Bhatt
                   ` (3 preceding siblings ...)
  2021-02-24 23:23 ` [PATCH v6 4/4] bus: mhi: core: Update debug prints to include local device state Bhaumik Bhatt
@ 2021-03-10 14:07 ` Manivannan Sadhasivam
  2021-05-26 19:03 ` patchwork-bot+linux-arm-msm
  5 siblings, 0 replies; 18+ messages in thread
From: Manivannan Sadhasivam @ 2021-03-10 14:07 UTC (permalink / raw)
  To: Bhaumik Bhatt
  Cc: linux-arm-msm, hemantk, jhugo, linux-kernel, loic.poulain,
	carl.yin, naveen.kumar

On Wed, Feb 24, 2021 at 03:23:01PM -0800, Bhaumik Bhatt wrote:
> v6:
> -Add patch to improve debug message
> -Fix switch-case fall through warning for EE serialization patch
> -Address review comments and update commit text
> 
> v5:
> -Update commit text for "clear devices when moving execution environments" patch
> -Added test platform details that were missed out in the cover letter
> -Merged two if checks in to a single one for EE serialization patch
> 
> v4:
> -Addressed review comments for additional info logging for EE movements
> -Updated switch case for EE handling in mhi_intvec_threaded_handler()
> 
> v3:
> -Update commit text to accurately reflect changes and reasoning based on reviews
> 
> v2:
> -Add patch to clear devices when moving execution environments
> 
> Note: This patch is first in series of execution environment related changes.
> 
> During full boot chain firmware download, the PM state worker downloads the AMSS
> image after waiting for the SBL execution environment change in PBL mode itself.
> Since getting rid of the firmware load worker thread, this design needs to
> change and MHI host must download the AMSS image from the SBL mode of PM state
> worker thread instead of blocking waits for SBL EE in PBL transition processing.
> 
> Ensure that EE changes are handled only from appropriate places and occur
> one after another and handle only PBL or RDDM EE changes as critical events
> directly from the interrupt handler and the status callback is given to the
> controller drivers promptly.
> 
> When moving from SBL to AMSS EE, clear SBL specific client devices by calling
> remove callbacks for them so they are not left opened in a different execution
> environment.
> 
> This patchset was tested on ARM64.
> 

Series applied to mhi-next!

Thanks,
Mani

> Bhaumik Bhatt (4):
>   bus: mhi: core: Destroy SBL devices when moving to mission mode
>   bus: mhi: core: Download AMSS image from appropriate function
>   bus: mhi: core: Process execution environment changes serially
>   bus: mhi: core: Update debug prints to include local device state
> 
>  drivers/bus/mhi/core/boot.c     | 51 +++++++++++++--------------
>  drivers/bus/mhi/core/internal.h |  1 +
>  drivers/bus/mhi/core/main.c     | 76 +++++++++++++++++++++++++++--------------
>  drivers/bus/mhi/core/pm.c       | 10 ++++--
>  4 files changed, 83 insertions(+), 55 deletions(-)
> 
> -- 
> The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 


* Re: [PATCH v6 0/4] Serialize execution environment changes for MHI
  2021-02-24 23:23 [PATCH v6 0/4] Serialize execution environment changes for MHI Bhaumik Bhatt
                   ` (4 preceding siblings ...)
  2021-03-10 14:07 ` [PATCH v6 0/4] Serialize execution environment changes for MHI Manivannan Sadhasivam
@ 2021-05-26 19:03 ` patchwork-bot+linux-arm-msm
  5 siblings, 0 replies; 18+ messages in thread
From: patchwork-bot+linux-arm-msm @ 2021-05-26 19:03 UTC (permalink / raw)
  To: Bhaumik Bhatt; +Cc: linux-arm-msm

Hello:

This series was applied to qcom/linux.git (refs/heads/for-next):

On Wed, 24 Feb 2021 15:23:01 -0800 you wrote:
> v6:
> -Add patch to improve debug message
> -Fix switch-case fall through warning for EE serialization patch
> -Address review comments and update commit text
> 
> v5:
> -Update commit text for "clear devices when moving execution environments" patch
> -Added test platform details that were missed out in the cover letter
> -Merged two if checks in to a single one for EE serialization patch
> 
> [...]

Here is the summary with links:
  - [v6,1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode
    https://git.kernel.org/qcom/c/925089c1900f
  - [v6,2/4] bus: mhi: core: Download AMSS image from appropriate function
    https://git.kernel.org/qcom/c/4884362f6977
  - [v6,3/4] bus: mhi: core: Process execution environment changes serially
    https://git.kernel.org/qcom/c/ef2126c4e2ea
  - [v6,4/4] bus: mhi: core: Update debug prints to include local device state
    https://git.kernel.org/qcom/c/aaca4233ea03

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




* Re: [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially
  2021-02-24 23:23 ` [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially Bhaumik Bhatt
  2021-03-04  8:43   ` Loic Poulain
  2021-03-10 14:04   ` Manivannan Sadhasivam
@ 2021-08-23 18:43   ` Jeffrey Hugo
  2021-08-23 19:19     ` Bhaumik Bhatt
  2 siblings, 1 reply; 18+ messages in thread
From: Jeffrey Hugo @ 2021-08-23 18:43 UTC (permalink / raw)
  To: Bhaumik Bhatt, manivannan.sadhasivam
  Cc: linux-arm-msm, hemantk, linux-kernel, loic.poulain, carl.yin,
	naveen.kumar

On 2/24/2021 4:23 PM, Bhaumik Bhatt wrote:
> In current design, whenever the BHI interrupt is fired, the
> execution environment is updated. This can cause race conditions
> and impede ongoing power up/down processing. For example, if a
> power down is in progress, MHI host updates to a local "disabled"
> execution environment. If a BHI interrupt fires later, that value
> gets replaced with one from the BHI EE register. This impacts the
> controller as it does not expect multiple RDDM execution
> environment change status callbacks as an example. Another issue
> would be that the device can enter mission mode and the execution
> environment is updated, while device creation for SBL channels is
> still going on due to slower PM state worker thread run, leading
> to multiple attempts at opening the same channel.
> 
> Ensure that EE changes are handled only from appropriate places
> and occur one after another and handle only PBL modes or RDDM EE
> changes as critical events directly from the interrupt handler.
> Simplify handling by waiting for SYS ERROR before handling RDDM.
> This also makes sure that we use the correct execution environment
> to notify the controller driver when the device resets to one of
> the PBL execution environments.
> 
> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>

<snip>

> @@ -452,27 +451,30 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
>   	}
>   	write_unlock_irq(&mhi_cntrl->pm_lock);
>   
> -	 /* If device supports RDDM don't bother processing SYS error */
> -	if (mhi_cntrl->rddm_image) {
> -		/* host may be performing a device power down already */
> -		if (!mhi_is_active(mhi_cntrl))
> -			goto exit_intvec;
> +	if (pm_state != MHI_PM_SYS_ERR_DETECT || ee == mhi_cntrl->ee)
> +		goto exit_intvec;
>   
> -		if (mhi_cntrl->ee == MHI_EE_RDDM && mhi_cntrl->ee != ee) {
> +	switch (ee) {
> +	case MHI_EE_RDDM:
> +		/* proceed if power down is not already in progress */
> +		if (mhi_cntrl->rddm_image && mhi_is_active(mhi_cntrl)) {
>   			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_RDDM);
> +			mhi_cntrl->ee = ee;
>   			wake_up_all(&mhi_cntrl->state_event);
>   		}
> -		goto exit_intvec;
> -	}
> -
> -	if (pm_state == MHI_PM_SYS_ERR_DETECT) {
> +		break;
> +	case MHI_EE_PBL:
> +	case MHI_EE_EDL:
> +	case MHI_EE_PTHRU:
> +		mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
> +		mhi_cntrl->ee = ee;
>   		wake_up_all(&mhi_cntrl->state_event);
> -
> -		/* For fatal errors, we let controller decide next step */
> -		if (MHI_IN_PBL(ee))
> -			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
> -		else
> -			mhi_pm_sys_err_handler(mhi_cntrl);
> +		mhi_pm_sys_err_handler(mhi_cntrl);
> +		break;
> +	default:
> +		wake_up_all(&mhi_cntrl->state_event);
> +		mhi_pm_sys_err_handler(mhi_cntrl);
> +		break;
>   	}

Bhaumik, can you explain the above change?  Before this patch (which is
now committed), if there was a fatal error, the controller was notified
(MHI_CB_FATAL_ERROR) and it decided on all further action.  After this
patch, the controller is notified, but the core also attempts to handle
the syserr.

This is a change in behavior, and it seems to make a mess of the
controller, possibly with the controller and the core fighting each other.

Specifically, I'm rebasing the AIC100 driver onto 5.13, which has this
change, and I'm seeing a serious regression.  I'm thinking that for the
PBL/EDL/PTHRU case, mhi_pm_sys_err_handler() should not be called.

Thoughts?
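
To make the behavior change concrete, here is a small userspace model of the
dispatch, not the kernel code itself. The names ee_model, struct actions,
dispatch_before() and dispatch_after() are made up for illustration. The
"before" model covers only a controller without an RDDM image and ignores that
the old code tested the previously cached EE rather than the freshly read one.
Both models assume the handler has already seen SYS_ERR_DETECT with a changed
EE.

```c
#include <stdbool.h>

/* Illustrative stand-ins, not the kernel's enum mhi_ee_type. */
enum ee_model { EE_PBL, EE_SBL, EE_AMSS, EE_RDDM, EE_PTHRU, EE_EDL };

/* What the interrupt handler would do for a given EE. */
struct actions {
	bool fatal_error_cb;	/* status_cb(MHI_CB_FATAL_ERROR) issued */
	bool rddm_cb;		/* status_cb(MHI_CB_EE_RDDM) issued */
	bool core_sys_err;	/* mhi_pm_sys_err_handler() invoked */
};

/* Assumed to mirror MHI_IN_PBL(): PBL, PTHRU and EDL count as PBL modes. */
static bool in_pbl(enum ee_model ee)
{
	return ee == EE_PBL || ee == EE_PTHRU || ee == EE_EDL;
}

/* Before the patch: the controller alone handles a PBL-mode fatal error. */
static struct actions dispatch_before(enum ee_model ee)
{
	struct actions a = { false, false, false };

	if (in_pbl(ee))
		a.fatal_error_cb = true;
	else
		a.core_sys_err = true;
	return a;
}

/* After the patch: PBL modes get the callback and the core handler. */
static struct actions dispatch_after(enum ee_model ee, bool rddm_image,
				     bool active)
{
	struct actions a = { false, false, false };

	switch (ee) {
	case EE_RDDM:
		/* proceed only if power down is not already in progress */
		if (rddm_image && active)
			a.rddm_cb = true;
		break;
	case EE_PBL:
	case EE_EDL:
	case EE_PTHRU:
		a.fatal_error_cb = true;
		a.core_sys_err = true;
		break;
	default:
		a.core_sys_err = true;
		break;
	}
	return a;
}
```

For EE_PBL the two models differ only in core_sys_err, which is the extra
core-side handling in question.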


* Re: [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially
  2021-08-23 18:43   ` Jeffrey Hugo
@ 2021-08-23 19:19     ` Bhaumik Bhatt
  2021-08-23 19:38       ` Jeffrey Hugo
  0 siblings, 1 reply; 18+ messages in thread
From: Bhaumik Bhatt @ 2021-08-23 19:19 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: manivannan.sadhasivam, linux-arm-msm, hemantk, linux-kernel,
	loic.poulain, carl.yin, naveen.kumar

On 2021-08-23 11:43 AM, Jeffrey Hugo wrote:
> On 2/24/2021 4:23 PM, Bhaumik Bhatt wrote:
>> In current design, whenever the BHI interrupt is fired, the
>> execution environment is updated. This can cause race conditions
>> and impede ongoing power up/down processing. For example, if a
>> power down is in progress, MHI host updates to a local "disabled"
>> execution environment. If a BHI interrupt fires later, that value
>> gets replaced with one from the BHI EE register. This impacts the
>> controller as it does not expect multiple RDDM execution
>> environment change status callbacks as an example. Another issue
>> would be that the device can enter mission mode and the execution
>> environment is updated, while device creation for SBL channels is
>> still going on due to slower PM state worker thread run, leading
>> to multiple attempts at opening the same channel.
>> 
>> Ensure that EE changes are handled only from appropriate places
>> and occur one after another and handle only PBL modes or RDDM EE
>> changes as critical events directly from the interrupt handler.
>> Simplify handling by waiting for SYS ERROR before handling RDDM.
>> This also makes sure that we use the correct execution environment
>> to notify the controller driver when the device resets to one of
>> the PBL execution environments.
>> 
>> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
> 
> <snip>
> 
>> @@ -452,27 +451,30 @@ irqreturn_t mhi_intvec_threaded_handler(int 
>> irq_number, void *priv)
>>   	}
>>   	write_unlock_irq(&mhi_cntrl->pm_lock);
>>   -	 /* If device supports RDDM don't bother processing SYS error */
>> -	if (mhi_cntrl->rddm_image) {
>> -		/* host may be performing a device power down already */
>> -		if (!mhi_is_active(mhi_cntrl))
>> -			goto exit_intvec;
>> +	if (pm_state != MHI_PM_SYS_ERR_DETECT || ee == mhi_cntrl->ee)
>> +		goto exit_intvec;
>>   -		if (mhi_cntrl->ee == MHI_EE_RDDM && mhi_cntrl->ee != ee) {
>> +	switch (ee) {
>> +	case MHI_EE_RDDM:
>> +		/* proceed if power down is not already in progress */
>> +		if (mhi_cntrl->rddm_image && mhi_is_active(mhi_cntrl)) {
>>   			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_RDDM);
>> +			mhi_cntrl->ee = ee;
>>   			wake_up_all(&mhi_cntrl->state_event);
>>   		}
>> -		goto exit_intvec;
>> -	}
>> -
>> -	if (pm_state == MHI_PM_SYS_ERR_DETECT) {
>> +		break;
>> +	case MHI_EE_PBL:
>> +	case MHI_EE_EDL:
>> +	case MHI_EE_PTHRU:
>> +		mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
>> +		mhi_cntrl->ee = ee;
>>   		wake_up_all(&mhi_cntrl->state_event);
>> -
>> -		/* For fatal errors, we let controller decide next step */
>> -		if (MHI_IN_PBL(ee))
>> -			mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
>> -		else
>> -			mhi_pm_sys_err_handler(mhi_cntrl);
>> +		mhi_pm_sys_err_handler(mhi_cntrl);
>> +		break;
>> +	default:
>> +		wake_up_all(&mhi_cntrl->state_event);
>> +		mhi_pm_sys_err_handler(mhi_cntrl);
>> +		break;
>>   	}
> 
> Bhaumik, can you explain the above change?  Before this patch (which
> is now committed), if there was a fatal error, the controller was
> notified (MHI_CB_FATAL_ERROR) and it decided all action.  After this
> patch, the controller is notified, but also the core attempts to
> handle the syserr.
> 
> This is a change in behavior, and seems to make a mess of the
> controller, and possibly the core fighting each other.
> 
> Specifically, I'm rebasing the AIC100 driver onto 5.13, which has this
> change, and I'm seeing a serious regression.  I'm thinking that for
> the PBL/EDL/PTHRU case, mhi_pm_sys_err_handler() should not be called.
> 
> Thoughts?

I see. We use this heavily for entry to EDL use-cases.

We do require mhi_pm_sys_err_handler() to be called in those cases, as any
entry to PBL/EDL means there is a need to clean up the MHI host and notify
all its clients.

We include PTHRU here because it's a SYS ERROR in PBL modes.

The premise is that the controller should not be in the business of doing
any of the clean-up that is the responsibility of the core driver. We're
using this feature set to ensure the controller is only notified.

What does AIC100 do on fatal error that makes you run into issues? I don't
think any of the other controllers do anything other than disabling runtime
PM since the device is down. Maybe there's some room for improvement.

Thanks,
Bhaumik
---
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially
  2021-08-23 19:19     ` Bhaumik Bhatt
@ 2021-08-23 19:38       ` Jeffrey Hugo
  2021-08-26 18:04         ` Bhaumik Bhatt
  0 siblings, 1 reply; 18+ messages in thread
From: Jeffrey Hugo @ 2021-08-23 19:38 UTC (permalink / raw)
  To: bbhatt
  Cc: manivannan.sadhasivam, linux-arm-msm, hemantk, linux-kernel,
	loic.poulain, carl.yin, naveen.kumar

On 8/23/2021 1:19 PM, Bhaumik Bhatt wrote:
> On 2021-08-23 11:43 AM, Jeffrey Hugo wrote:
>> On 2/24/2021 4:23 PM, Bhaumik Bhatt wrote:
>>> In current design, whenever the BHI interrupt is fired, the
>>> execution environment is updated. This can cause race conditions
>>> and impede ongoing power up/down processing. For example, if a
>>> power down is in progress, MHI host updates to a local "disabled"
>>> execution environment. If a BHI interrupt fires later, that value
>>> gets replaced with one from the BHI EE register. This impacts the
>>> controller as it does not expect multiple RDDM execution
>>> environment change status callbacks as an example. Another issue
>>> would be that the device can enter mission mode and the execution
>>> environment is updated, while device creation for SBL channels is
>>> still going on due to slower PM state worker thread run, leading
>>> to multiple attempts at opening the same channel.
>>>
>>> Ensure that EE changes are handled only from appropriate places
>>> and occur one after another and handle only PBL modes or RDDM EE
>>> changes as critical events directly from the interrupt handler.
>>> Simplify handling by waiting for SYS ERROR before handling RDDM.
>>> This also makes sure that we use the correct execution environment
>>> to notify the controller driver when the device resets to one of
>>> the PBL execution environments.
>>>
>>> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
>>
>> <snip>
>>
>>> @@ -452,27 +451,30 @@ irqreturn_t mhi_intvec_threaded_handler(int 
>>> irq_number, void *priv)
>>>       }
>>>       write_unlock_irq(&mhi_cntrl->pm_lock);
>>>   -     /* If device supports RDDM don't bother processing SYS error */
>>> -    if (mhi_cntrl->rddm_image) {
>>> -        /* host may be performing a device power down already */
>>> -        if (!mhi_is_active(mhi_cntrl))
>>> -            goto exit_intvec;
>>> +    if (pm_state != MHI_PM_SYS_ERR_DETECT || ee == mhi_cntrl->ee)
>>> +        goto exit_intvec;
>>>   -        if (mhi_cntrl->ee == MHI_EE_RDDM && mhi_cntrl->ee != ee) {
>>> +    switch (ee) {
>>> +    case MHI_EE_RDDM:
>>> +        /* proceed if power down is not already in progress */
>>> +        if (mhi_cntrl->rddm_image && mhi_is_active(mhi_cntrl)) {
>>>               mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_RDDM);
>>> +            mhi_cntrl->ee = ee;
>>>               wake_up_all(&mhi_cntrl->state_event);
>>>           }
>>> -        goto exit_intvec;
>>> -    }
>>> -
>>> -    if (pm_state == MHI_PM_SYS_ERR_DETECT) {
>>> +        break;
>>> +    case MHI_EE_PBL:
>>> +    case MHI_EE_EDL:
>>> +    case MHI_EE_PTHRU:
>>> +        mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
>>> +        mhi_cntrl->ee = ee;
>>>           wake_up_all(&mhi_cntrl->state_event);
>>> -
>>> -        /* For fatal errors, we let controller decide next step */
>>> -        if (MHI_IN_PBL(ee))
>>> -            mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
>>> -        else
>>> -            mhi_pm_sys_err_handler(mhi_cntrl);
>>> +        mhi_pm_sys_err_handler(mhi_cntrl);
>>> +        break;
>>> +    default:
>>> +        wake_up_all(&mhi_cntrl->state_event);
>>> +        mhi_pm_sys_err_handler(mhi_cntrl);
>>> +        break;
>>>       }
>>
>> Bhaumik, can you explain the above change?  Before this patch (which
>> is now committed), if there was a fatal error, the controller was
>> notified (MHI_CB_FATAL_ERROR) and it decided all action.  After this
>> patch, the controller is notified, but also the core attempts to
>> handle the syserr.
>>
>> This is a change in behavior, and seems to make a mess of the
>> controller, and possibly the core fighting each other.
>>
>> Specifically, I'm rebasing the AIC100 driver onto 5.13, which has this
>> change, and I'm seeing a serious regression.  I'm thinking that for
>> the PBL/EDL/PTHRU case, mhi_pm_sys_err_handler() should not be called.
>>
>> Thoughts?
> 
> I see. We use this heavily for entry to EDL use-cases.
> 
> We do require the mhi_pm_sys_err_handler() to be called in those cases
> as any entry to PBL/EDL means there is need to clean-up MHI host and 
> notify all
> its clients.
> 
> We include PTHRU in here because its a SYS ERROR in PBL modes.
> 
> Premise being the controller should not be in that business of doing any of
> the clean-up that is responsibility of the core driver. We're using this 
> feature
> set to ensure controller is only notified.
> 
> What does AIC100 do on fatal error that you run in to issues? I don't think
> any of the other controllers do anything other than disabling runtime PM 
> since
> device is down. Maybe there's some room for improvement.

Our usecase is PBL as a result of a full device crash.  AIC100 doesn't
exercise the EDL/PTHRU usecases.  (Just giving you some context, not
trying to imply EDL is not valuable to others, for example.)

In that case (FATAL_ERROR), our controller schedules a work item and
then returns, as we assume FATAL_ERROR is notified in atomic context.  We
need to do non-MHI cleanup first.  Then we power down MHI to clean up
the MHI core and kick off all the clients (it's been a long while, but
initially we were seeing the syserr handling in the core not
sufficiently kicking off the clients).  Then we power up MHI.  MHI will
init in PBL, still in syserr, and handle it.

I haven't fully traced everything, but we were getting into some really
bad states with the core triggering mhi_pm_sys_err_handler() per this patch.

It's important that we have control over the ordering of our cleanup vs.
the MHI cleanup.  Sadly, non-atomic context (sleeping) is a requirement
of our cleanup.

Seems like we have differing requirements here.  Hmm.  What about an API
the controller can call that does mhi_pm_sys_err_handler() (or
equivalent), so that the controller can control when the MHI core cleanup
is done but doesn't need to be concerned with the details?  Or, do you
have a suggestion?
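
As a rough illustration of the ordering described above, here is a userspace
model, not driver code. struct ctrl_model, on_fatal_error() and
recovery_work() are hypothetical names, and the "work item" is just a flag
standing in for a real deferred work mechanism. The callback only marks work
as pending; the sleeping steps run later in a fixed order.

```c
#include <stddef.h>

/* Recovery steps, in the order the controller wants them to happen. */
enum step {
	STEP_DEVICE_CLEANUP,	/* non-MHI cleanup, may sleep */
	STEP_MHI_POWER_DOWN,	/* cleans up the MHI core, kicks off clients */
	STEP_MHI_POWER_UP,	/* MHI re-initializes in PBL */
};

struct ctrl_model {
	int work_pending;	/* set from the (assumed atomic) callback */
	enum step log[3];	/* steps performed by the last work run */
	size_t nlog;
};

/* Models the FATAL_ERROR notification: no sleeping, only schedule. */
static void on_fatal_error(struct ctrl_model *c)
{
	c->work_pending = 1;
}

/* Models the deferred work item running in process context. */
static void recovery_work(struct ctrl_model *c)
{
	if (!c->work_pending)
		return;

	c->work_pending = 0;
	c->nlog = 0;
	c->log[c->nlog++] = STEP_DEVICE_CLEANUP;
	c->log[c->nlog++] = STEP_MHI_POWER_DOWN;
	c->log[c->nlog++] = STEP_MHI_POWER_UP;
}
```

The point of the model is that the controller, not the core, decides when the
MHI teardown happens relative to its own cleanup.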


* Re: [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially
  2021-08-23 19:38       ` Jeffrey Hugo
@ 2021-08-26 18:04         ` Bhaumik Bhatt
  0 siblings, 0 replies; 18+ messages in thread
From: Bhaumik Bhatt @ 2021-08-26 18:04 UTC (permalink / raw)
  To: Jeffrey Hugo
  Cc: manivannan.sadhasivam, linux-arm-msm, hemantk, linux-kernel,
	loic.poulain, carl.yin, naveen.kumar

On 2021-08-23 12:38 PM, Jeffrey Hugo wrote:
> On 8/23/2021 1:19 PM, Bhaumik Bhatt wrote:
>> On 2021-08-23 11:43 AM, Jeffrey Hugo wrote:
>>> On 2/24/2021 4:23 PM, Bhaumik Bhatt wrote:
>>>> In current design, whenever the BHI interrupt is fired, the
>>>> execution environment is updated. This can cause race conditions
>>>> and impede ongoing power up/down processing. For example, if a
>>>> power down is in progress, MHI host updates to a local "disabled"
>>>> execution environment. If a BHI interrupt fires later, that value
>>>> gets replaced with one from the BHI EE register. This impacts the
>>>> controller as it does not expect multiple RDDM execution
>>>> environment change status callbacks as an example. Another issue
>>>> would be that the device can enter mission mode and the execution
>>>> environment is updated, while device creation for SBL channels is
>>>> still going on due to slower PM state worker thread run, leading
>>>> to multiple attempts at opening the same channel.
>>>> 
>>>> Ensure that EE changes are handled only from appropriate places
>>>> and occur one after another and handle only PBL modes or RDDM EE
>>>> changes as critical events directly from the interrupt handler.
>>>> Simplify handling by waiting for SYS ERROR before handling RDDM.
>>>> This also makes sure that we use the correct execution environment
>>>> to notify the controller driver when the device resets to one of
>>>> the PBL execution environments.
>>>> 
>>>> Signed-off-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
>>> 
>>> <snip>
>>> 
>>>> @@ -452,27 +451,30 @@ irqreturn_t mhi_intvec_threaded_handler(int 
>>>> irq_number, void *priv)
>>>>       }
>>>>       write_unlock_irq(&mhi_cntrl->pm_lock);
>>>>   -     /* If device supports RDDM don't bother processing SYS error 
>>>> */
>>>> -    if (mhi_cntrl->rddm_image) {
>>>> -        /* host may be performing a device power down already */
>>>> -        if (!mhi_is_active(mhi_cntrl))
>>>> -            goto exit_intvec;
>>>> +    if (pm_state != MHI_PM_SYS_ERR_DETECT || ee == mhi_cntrl->ee)
>>>> +        goto exit_intvec;
>>>>   -        if (mhi_cntrl->ee == MHI_EE_RDDM && mhi_cntrl->ee != ee) 
>>>> {
>>>> +    switch (ee) {
>>>> +    case MHI_EE_RDDM:
>>>> +        /* proceed if power down is not already in progress */
>>>> +        if (mhi_cntrl->rddm_image && mhi_is_active(mhi_cntrl)) {
>>>>               mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_RDDM);
>>>> +            mhi_cntrl->ee = ee;
>>>>               wake_up_all(&mhi_cntrl->state_event);
>>>>           }
>>>> -        goto exit_intvec;
>>>> -    }
>>>> -
>>>> -    if (pm_state == MHI_PM_SYS_ERR_DETECT) {
>>>> +        break;
>>>> +    case MHI_EE_PBL:
>>>> +    case MHI_EE_EDL:
>>>> +    case MHI_EE_PTHRU:
>>>> +        mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
>>>> +        mhi_cntrl->ee = ee;
>>>>           wake_up_all(&mhi_cntrl->state_event);
>>>> -
>>>> -        /* For fatal errors, we let controller decide next step */
>>>> -        if (MHI_IN_PBL(ee))
>>>> -            mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
>>>> -        else
>>>> -            mhi_pm_sys_err_handler(mhi_cntrl);
>>>> +        mhi_pm_sys_err_handler(mhi_cntrl);
>>>> +        break;
>>>> +    default:
>>>> +        wake_up_all(&mhi_cntrl->state_event);
>>>> +        mhi_pm_sys_err_handler(mhi_cntrl);
>>>> +        break;
>>>>       }
>>> 
>>> Bhaumik, can you explain the above change?  Before this patch (which
>>> is now committed), if there was a fatal error, the controller was
>>> notified (MHI_CB_FATAL_ERROR) and it decided all action.  After this
>>> patch, the controller is notified, but also the core attempts to
>>> handle the syserr.
>>> 
>>> This is a change in behavior, and seems to make a mess of the
>>> controller, and possibly the core fighting each other.
>>> 
>>> Specifically, I'm rebasing the AIC100 driver onto 5.13, which has 
>>> this
>>> change, and I'm seeing a serious regression.  I'm thinking that for
>>> the PBL/EDL/PTHRU case, mhi_pm_sys_err_handler() should not be 
>>> called.
>>> 
>>> Thoughts?
>> 
>> I see. We use this heavily for entry to EDL use-cases.
>> 
>> We do require the mhi_pm_sys_err_handler() to be called in those cases
>> as any entry to PBL/EDL means there is need to clean-up MHI host and 
>> notify all
>> its clients.
>> 
>> We include PTHRU in here because its a SYS ERROR in PBL modes.
>> 
>> Premise being the controller should not be in that business of doing 
>> any of
>> the clean-up that is responsibility of the core driver. We're using 
>> this feature
>> set to ensure controller is only notified.
>> 
>> What does AIC100 do on fatal error that you run in to issues? I don't 
>> think
>> any of the other controllers do anything other than disabling runtime 
>> PM since
>> device is down. Maybe there's some room for improvement.
> 
> Our usecase is PBL as a result of a full device crash.  AIC100 doesn't
> exercise the EDL/PTHRU usecases.  (Just giving you some context, not
> trying to imply EDL is not valuable to others for example).
> 
> In that case (FATAL_ERROR), our controller schedules a work item, and
> then returnsas we assume FATAL_ERROR is notified in atomic context.
> We need to do non-mhi cleanup first.  Then we powerdown the MHI to
> cleanup the MHI core, and kick off all the clients (it's been a long
> while, but initially, we were seeing the syserr handling in the core
> not sufficiently kicking off the clients).  Then we power up MHI.  MHI
> will init in PBL, still in syserr, and handle it.
> 
> I haven't fully traced everything, but we were getting into some
> really bad states with the core triggering mhi_pm_sys_err_handler()
> per this patch.
> 
> It's important that we have control over the ordering of our cleanup,
> vs the MHI cleanup.  Sadly, non-atomic context (sleeping) is a
> requirement of our cleanup.
> 
> Seems like we have differing requirements here.  Hmm.  What about an
> API the controller can call, that does mhi_pm_sys_err_handler() (or
> equivalent) so that the controller can control when the MHI core
> cleanup is done, but doesn't need to be concerned with the details?
> Or, do you have a suggestion?

I see why you're seeing the issue. We had a design shift when we
introduced EDL handling and updated how the execution environments and
state changes get processed.

Previously, you would only receive the FATAL ERROR callback if a
SYS_ERROR entry in the PBL execution environment occurred. Now, you're
getting both callbacks, as the MHI core driver takes responsibility for
SYS_ERROR handling and the related clean-up.

The way we can take it forward is this: the FATAL ERROR callback should
just be an indication to the controller that the device saw a SYS_ERROR
and has entered PBL.

The SYS_ERROR callback is sleep-capable and will block for the
controller to perform any heavy-lifting clean-up, with the exception of
an mhi_power_down() call from within that callback. If that provision is
desired, it can be taken care of in a future patch as an async power
down request or something equivalent.
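
To make the split described above concrete, here is a rough userspace
sketch, not actual driver code. It assumes a two-value stand-in for
enum mhi_callback that reuses only the MHI_CB_SYS_ERROR and
MHI_CB_FATAL_ERROR names; struct demo_ctrl, demo_device_cleanup() and
demo_status_cb() are hypothetical names made up for illustration. A real
controller's status_cb takes a struct mhi_controller pointer instead of
this private struct.

```c
#include <stdbool.h>

/* Stand-in for the kernel enum: only the two reasons discussed here. */
enum mhi_callback {
	MHI_CB_SYS_ERROR,
	MHI_CB_FATAL_ERROR,
};

/* Hypothetical controller-private state, for illustration only. */
struct demo_ctrl {
	bool saw_fatal_error;	/* set by the FATAL_ERROR notification */
	int cleanups_done;	/* bumped by the SYS_ERROR clean-up path */
};

/* Hypothetical device-specific clean-up; a real one may sleep. */
void demo_device_cleanup(struct demo_ctrl *ctrl)
{
	ctrl->cleanups_done++;
}

/* Sketch of a status callback following the proposed split. */
void demo_status_cb(struct demo_ctrl *ctrl, enum mhi_callback cb)
{
	switch (cb) {
	case MHI_CB_FATAL_ERROR:
		/* Indication only: device saw SYS_ERROR and entered PBL. */
		ctrl->saw_fatal_error = true;
		break;
	case MHI_CB_SYS_ERROR:
		/*
		 * Sleep-capable; the core blocks until this returns.
		 * Heavy clean-up goes here, but no mhi_power_down().
		 */
		demo_device_cleanup(ctrl);
		break;
	default:
		break;
	}
}
```

With this shape, the controller's own clean-up runs to completion inside
the SYS_ERROR callback before the core continues with its clean-up, which
is the ordering guarantee being discussed.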

I will push a patch to improve the documentation for this.

Thanks,
Bhaumik
---
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project


end of thread, other threads:[~2021-08-26 18:04 UTC | newest]

Thread overview: 18+ messages
2021-02-24 23:23 [PATCH v6 0/4] Serialize execution environment changes for MHI Bhaumik Bhatt
2021-02-24 23:23 ` [PATCH v6 1/4] bus: mhi: core: Destroy SBL devices when moving to mission mode Bhaumik Bhatt
2021-02-26 22:06   ` Hemant Kumar
2021-03-04  8:41   ` Loic Poulain
2021-03-10 14:03   ` Manivannan Sadhasivam
2021-02-24 23:23 ` [PATCH v6 2/4] bus: mhi: core: Download AMSS image from appropriate function Bhaumik Bhatt
2021-03-04  8:42   ` Loic Poulain
2021-02-24 23:23 ` [PATCH v6 3/4] bus: mhi: core: Process execution environment changes serially Bhaumik Bhatt
2021-03-04  8:43   ` Loic Poulain
2021-03-10 14:04   ` Manivannan Sadhasivam
2021-08-23 18:43   ` Jeffrey Hugo
2021-08-23 19:19     ` Bhaumik Bhatt
2021-08-23 19:38       ` Jeffrey Hugo
2021-08-26 18:04         ` Bhaumik Bhatt
2021-02-24 23:23 ` [PATCH v6 4/4] bus: mhi: core: Update debug prints to include local device state Bhaumik Bhatt
2021-03-04  8:45   ` Loic Poulain
2021-03-10 14:07 ` [PATCH v6 0/4] Serialize execution environment changes for MHI Manivannan Sadhasivam
2021-05-26 19:03 ` patchwork-bot+linux-arm-msm
