mhi.lists.linux.dev archive mirror
* [PATCH 0/2] MHI host syserr fixes
@ 2023-01-24 21:57 Jeffrey Hugo
  2023-01-24 21:57 ` [PATCH 1/2] bus: mhi: host: Remove duplicate ee check for syserr Jeffrey Hugo
  2023-01-24 21:57 ` [PATCH 2/2] bus: mhi: host: Use mhi_tryset_pm_state() for setting fw error state Jeffrey Hugo
  0 siblings, 2 replies; 6+ messages in thread
From: Jeffrey Hugo @ 2023-01-24 21:57 UTC (permalink / raw)
  To: mani; +Cc: mhi, linux-arm-msm, linux-kernel, Jeffrey Hugo

Two small fixes for an issue observed in stress testing, where an MHI
device could appear to enter a bad state and be unable to recover unless
the module is removed and re-added, which should not be necessary.

Jeffrey Hugo (2):
  bus: mhi: host: Remove duplicate ee check for syserr
  bus: mhi: host: Use mhi_tryset_pm_state() for setting fw error state

 drivers/bus/mhi/host/boot.c | 16 ++++++++++++----
 drivers/bus/mhi/host/main.c |  2 +-
 2 files changed, 13 insertions(+), 5 deletions(-)

-- 
2.7.4



* [PATCH 1/2] bus: mhi: host: Remove duplicate ee check for syserr
  2023-01-24 21:57 [PATCH 0/2] MHI host syserr fixes Jeffrey Hugo
@ 2023-01-24 21:57 ` Jeffrey Hugo
  2023-04-03  5:37   ` Manivannan Sadhasivam
  2023-01-24 21:57 ` [PATCH 2/2] bus: mhi: host: Use mhi_tryset_pm_state() for setting fw error state Jeffrey Hugo
  1 sibling, 1 reply; 6+ messages in thread
From: Jeffrey Hugo @ 2023-01-24 21:57 UTC (permalink / raw)
  To: mani; +Cc: mhi, linux-arm-msm, linux-kernel, Jeffrey Hugo

If we detect a system error via intvec, we only process the syserr if the
current ee is different from the last observed ee.  The reason for this
check is to prevent BHIe from running multiple times, but with the single
queue handling syserr, that is not possible.

The check can cause an issue with device recovery.  If PBL loads a bad SBL
via BHI, but that SBL hangs before notifying the host of an ee change,
then issuing soc_reset to crash the device and retry (after supplying a
fixed SBL) will not recover the device as the host will observe a PBL->PBL
transition and not process the syserr.  The device will be stuck until
either the driver is reloaded, or the host is rebooted.  Instead, remove
the check so that we can attempt to recover the device.
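
The effect of the removed condition can be illustrated with a small,
self-contained sketch.  This is not driver code: the enums and the two
helper functions below are hypothetical, simplified stand-ins that only
model the branch condition changed by this patch.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-ins for the driver's ee and pm_state values. */
enum ee { EE_PBL, EE_SBL, EE_AMSS };
enum pm_state { PM_M0, PM_SYS_ERR_DETECT };

/*
 * Old condition: the handler bails out when the pm_state is not syserr
 * detect OR when the ee read from the device equals the last observed ee.
 */
static bool old_should_process(enum pm_state pm_state, enum ee ee,
			       enum ee last_ee)
{
	return !(pm_state != PM_SYS_ERR_DETECT || ee == last_ee);
}

/* New condition: only the pm_state decides whether the syserr is processed. */
static bool new_should_process(enum pm_state pm_state)
{
	return pm_state == PM_SYS_ERR_DETECT;
}
```

In the scenario described above, the hung SBL never reported an ee
change, so the last observed ee is still PBL and the device reports PBL
again after soc_reset.  In this model,
old_should_process(PM_SYS_ERR_DETECT, EE_PBL, EE_PBL) is false, while
new_should_process(PM_SYS_ERR_DETECT) is true.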

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/bus/mhi/host/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/bus/mhi/host/main.c b/drivers/bus/mhi/host/main.c
index df0fbfe..0c3a009 100644
--- a/drivers/bus/mhi/host/main.c
+++ b/drivers/bus/mhi/host/main.c
@@ -503,7 +503,7 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
 	}
 	write_unlock_irq(&mhi_cntrl->pm_lock);
 
-	if (pm_state != MHI_PM_SYS_ERR_DETECT || ee == mhi_cntrl->ee)
+	if (pm_state != MHI_PM_SYS_ERR_DETECT)
 		goto exit_intvec;
 
 	switch (ee) {
-- 
2.7.4



* [PATCH 2/2] bus: mhi: host: Use mhi_tryset_pm_state() for setting fw error state
  2023-01-24 21:57 [PATCH 0/2] MHI host syserr fixes Jeffrey Hugo
  2023-01-24 21:57 ` [PATCH 1/2] bus: mhi: host: Remove duplicate ee check for syserr Jeffrey Hugo
@ 2023-01-24 21:57 ` Jeffrey Hugo
  2023-04-03  5:42   ` Manivannan Sadhasivam
  1 sibling, 1 reply; 6+ messages in thread
From: Jeffrey Hugo @ 2023-01-24 21:57 UTC (permalink / raw)
  To: mani; +Cc: mhi, linux-arm-msm, linux-kernel, Jeffrey Hugo

If firmware loading fails, the controller's pm_state is updated to
MHI_PM_FW_DL_ERR unconditionally.  This can corrupt the pm_state as the
update is not done under the proper lock, and also does not validate
the state transition.  The firmware loading can fail due to a detected
syserr, but if MHI_PM_FW_DL_ERR is unconditionally set as the pm_state,
the handling of the syserr can break when it attempts to transition from
syserr detect to syserr process.

By grabbing the lock, we ensure we don't race with some other pm_state
update.  By using mhi_tryset_pm_state(), we check that the transition
to MHI_PM_FW_DL_ERR is valid via the state machine logic.  If it is not
valid, then some other transition is occurring like syserr processing, and
we assume that will resolve the firmware loading error.
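
A minimal sketch of the "validate the transition, then wake waiters only
on success" pattern follows.  Everything in it is an assumption made for
illustration: the reduced state set, the transition table, struct ctrl,
tryset_pm_state() and report_fw_error() are hypothetical and do not
reproduce the driver's real state machine.  Locking is omitted because
the sketch is single threaded; comments mark where pm_lock would be held.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced set of PM states, one bit per state. */
enum pm_state {
	PM_POR             = 1 << 0,
	PM_FW_DL_ERR       = 1 << 1,
	PM_SYS_ERR_DETECT  = 1 << 2,
	PM_SYS_ERR_PROCESS = 1 << 3,
};

struct transition {
	enum pm_state from;
	unsigned int to_mask;	/* bitmask of states reachable from 'from' */
};

/* Illustrative table only; the real driver table differs. */
static const struct transition allowed[] = {
	{ PM_POR,             PM_FW_DL_ERR | PM_SYS_ERR_DETECT },
	{ PM_FW_DL_ERR,       PM_SYS_ERR_DETECT },
	{ PM_SYS_ERR_DETECT,  PM_SYS_ERR_PROCESS },
	{ PM_SYS_ERR_PROCESS, PM_POR },
};

struct ctrl {
	enum pm_state pm_state;
};

/*
 * Try to move to 'next'.  Returns the state after the attempt, which is
 * the unchanged current state when the table does not allow the move.
 * The caller is expected to hold the lock that protects pm_state.
 */
static enum pm_state tryset_pm_state(struct ctrl *c, enum pm_state next)
{
	for (size_t i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
		if (allowed[i].from == c->pm_state &&
		    (allowed[i].to_mask & next)) {
			c->pm_state = next;
			break;
		}
	}
	return c->pm_state;
}

/* Returns true when waiters should be woken, false when the move was rejected. */
static bool report_fw_error(struct ctrl *c)
{
	enum pm_state new_state;

	/* the write lock on pm_state would be taken here */
	new_state = tryset_pm_state(c, PM_FW_DL_ERR);
	/* the write lock would be released here */

	return new_state == PM_FW_DL_ERR;
}
```

With this table, calling report_fw_error() while the state is
PM_SYS_ERR_DETECT leaves the state untouched and returns false, so the
syserr handling can still make its own transition.  Calling it from
PM_POR moves the state to PM_FW_DL_ERR and returns true.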

Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
---
 drivers/bus/mhi/host/boot.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/mhi/host/boot.c b/drivers/bus/mhi/host/boot.c
index 1c69fee..d2a19b07 100644
--- a/drivers/bus/mhi/host/boot.c
+++ b/drivers/bus/mhi/host/boot.c
@@ -391,6 +391,7 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
 {
 	const struct firmware *firmware = NULL;
 	struct device *dev = &mhi_cntrl->mhi_dev->dev;
+	enum mhi_pm_state new_state;
 	const char *fw_name;
 	void *buf;
 	dma_addr_t dma_addr;
@@ -508,14 +509,18 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
 	}
 
 error_fw_load:
-	mhi_cntrl->pm_state = MHI_PM_FW_DL_ERR;
-	wake_up_all(&mhi_cntrl->state_event);
+	write_lock_irq(&mhi_cntrl->pm_lock);
+	new_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_FW_DL_ERR);
+	write_unlock_irq(&mhi_cntrl->pm_lock);
+	if (new_state == MHI_PM_FW_DL_ERR)
+		wake_up_all(&mhi_cntrl->state_event);
 }
 
 int mhi_download_amss_image(struct mhi_controller *mhi_cntrl)
 {
 	struct image_info *image_info = mhi_cntrl->fbc_image;
 	struct device *dev = &mhi_cntrl->mhi_dev->dev;
+	enum mhi_pm_state new_state;
 	int ret;
 
 	if (!image_info)
@@ -526,8 +531,11 @@ int mhi_download_amss_image(struct mhi_controller *mhi_cntrl)
 			       &image_info->mhi_buf[image_info->entries - 1]);
 	if (ret) {
 		dev_err(dev, "MHI did not load AMSS, ret:%d\n", ret);
-		mhi_cntrl->pm_state = MHI_PM_FW_DL_ERR;
-		wake_up_all(&mhi_cntrl->state_event);
+		write_lock_irq(&mhi_cntrl->pm_lock);
+		new_state = mhi_tryset_pm_state(mhi_cntrl, MHI_PM_FW_DL_ERR);
+		write_unlock_irq(&mhi_cntrl->pm_lock);
+		if (new_state == MHI_PM_FW_DL_ERR)
+			wake_up_all(&mhi_cntrl->state_event);
 	}
 
 	return ret;
-- 
2.7.4



* Re: [PATCH 1/2] bus: mhi: host: Remove duplicate ee check for syserr
  2023-01-24 21:57 ` [PATCH 1/2] bus: mhi: host: Remove duplicate ee check for syserr Jeffrey Hugo
@ 2023-04-03  5:37   ` Manivannan Sadhasivam
  2023-04-03  5:45     ` Manivannan Sadhasivam
  0 siblings, 1 reply; 6+ messages in thread
From: Manivannan Sadhasivam @ 2023-04-03  5:37 UTC (permalink / raw)
  To: Jeffrey Hugo; +Cc: mhi, linux-arm-msm, linux-kernel

On Tue, Jan 24, 2023 at 02:57:23PM -0700, Jeffrey Hugo wrote:
> If we detect a system error via intvec, we only process the syserr if the
> current ee is different than the last observed ee.  The reason for this
> check is to prevent bhie from running multiple times, but with the single
> queue handling syserr, that is not possible.
> 
> The check can cause an issue with device recovery.  If PBL loads a bad SBL
> via BHI, but that SBL hangs before notifying the host of an ee change,
> then issuing soc_reset to crash the device and retry (after supplying a
> fixed SBL) will not recover the device as the host will observe a PBL->PBL
> transition and not process the syserr.  The device will be stuck until
> either the driver is reloaded, or the host is rebooted.  Instead, remove
> the check so that we can attempt to recover the device.
> 
> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>

Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: [PATCH 2/2] bus: mhi: host: Use mhi_tryset_pm_state() for setting fw error state
  2023-01-24 21:57 ` [PATCH 2/2] bus: mhi: host: Use mhi_tryset_pm_state() for setting fw error state Jeffrey Hugo
@ 2023-04-03  5:42   ` Manivannan Sadhasivam
  0 siblings, 0 replies; 6+ messages in thread
From: Manivannan Sadhasivam @ 2023-04-03  5:42 UTC (permalink / raw)
  To: Jeffrey Hugo; +Cc: mhi, linux-arm-msm, linux-kernel

On Tue, Jan 24, 2023 at 02:57:24PM -0700, Jeffrey Hugo wrote:
> If firmware loading fails, the controller's pm_state is updated to
> MHI_PM_FW_DL_ERR unconditionally.  This can corrupt the pm_state as the
> update is not done under the proper lock, and also does not validate
> the state transition.  The firmware loading can fail due to a detected
> syserr, but if MHI_PM_FW_DL_ERR is unconditionally set as the pm_state,
> the handling of the syserr can break when it attempts to transition from
> syserr detect, to syserr process.
> 
> By grabbing the lock, we ensure we don't race with some other pm_state
> update.  By using mhi_tryset_pm_state(), we check that the transition
> to MHI_PM_FW_DL_ERR is valid via the state machine logic.  If it is not
> valid, then some other transition is occurring like syserr processing, and
> we assume that will resolve the firmware loading error.
> 
> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>

This looks like a legitimate fix, so please add a Fixes tag and CC stable
for backporting (please add a hint on how far back this patch has to be
backported).

With that,

Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>

- Mani

-- 
மணிவண்ணன் சதாசிவம்


* Re: [PATCH 1/2] bus: mhi: host: Remove duplicate ee check for syserr
  2023-04-03  5:37   ` Manivannan Sadhasivam
@ 2023-04-03  5:45     ` Manivannan Sadhasivam
  0 siblings, 0 replies; 6+ messages in thread
From: Manivannan Sadhasivam @ 2023-04-03  5:45 UTC (permalink / raw)
  To: Jeffrey Hugo; +Cc: mhi, linux-arm-msm, linux-kernel

On Mon, Apr 03, 2023 at 11:07:35AM +0530, Manivannan Sadhasivam wrote:
> On Tue, Jan 24, 2023 at 02:57:23PM -0700, Jeffrey Hugo wrote:
> > If we detect a system error via intvec, we only process the syserr if the
> > current ee is different than the last observed ee.  The reason for this
> > check is to prevent bhie from running multiple times, but with the single
> > queue handling syserr, that is not possible.
> > 
> > The check can cause an issue with device recovery.  If PBL loads a bad SBL
> > via BHI, but that SBL hangs before notifying the host of an ee change,
> > then issuing soc_reset to crash the device and retry (after supplying a
> > fixed SBL) will not recover the device as the host will observe a PBL->PBL
> > transition and not process the syserr.  The device will be stuck until
> > either the driver is reloaded, or the host is rebooted.  Instead, remove
> > the check so that we can attempt to recover the device.
> > 
> > Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> 
> Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
> 

Forgot to add that this patch also needs a Fixes tag and backporting.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
