stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3] bus: mhi: Fix race while handling SYS_ERR at power up
@ 2021-11-18  5:57 Manivannan Sadhasivam
  2021-11-18  9:55 ` Aleksander Morgado
  0 siblings, 1 reply; 6+ messages in thread
From: Manivannan Sadhasivam @ 2021-11-18  5:57 UTC (permalink / raw)
  To: mhi
  Cc: aleksander, loic.poulain, thomas.perrot, hemantk, bbhatt,
	quic_jhugo, linux-arm-msm, Manivannan Sadhasivam, stable

Some devices tend to trigger SYS_ERR interrupt while the host handling
SYS_ERR state of the device during power up. This creates a race
condition and causes a failure in booting up the device.

The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR
handling in mhi_async_power_up(). Once the host detects that the device
is in SYS_ERR state, it issues MHI_RESET and waits for the device to
process the reset request. During this time, the device triggers SYS_ERR
interrupt to the host and host starts handling SYS_ERR execution.

So by the time the device has completed reset, host starts SYS_ERR
handling. This causes the race condition and the modem fails to boot.

Hence, register the IRQ handler only after handling the SYS_ERR check
to avoid getting spurious IRQs from the device.

Cc: stable@vger.kernel.org
Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up")
Reported-by: Aleksander Morgado <aleksander@aleksander.es>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
---

Changes in v3:

* Moved BHI_INTVEC setup after irq setup
* Used interval_us as the delay for the polling API

Changes in v2:

* Switched to "mhi_poll_reg_field" for detecting MHI reset in device.

 drivers/bus/mhi/core/pm.c | 38 ++++++++++++++------------------------
 1 file changed, 14 insertions(+), 24 deletions(-)

diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index fb99e3727155..ee0515a25e46 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -1038,7 +1038,7 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
 	enum mhi_ee_type current_ee;
 	enum dev_st_transition next_state;
 	struct device *dev = &mhi_cntrl->mhi_dev->dev;
-	u32 val;
+	u32 interval_us = 25000; /* poll register field every 25 milliseconds */
 	int ret;
 
 	dev_info(dev, "Requested to power ON\n");
@@ -1055,13 +1055,7 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
 	mutex_lock(&mhi_cntrl->pm_mutex);
 	mhi_cntrl->pm_state = MHI_PM_DISABLE;
 
-	ret = mhi_init_irq_setup(mhi_cntrl);
-	if (ret)
-		goto error_setup_irq;
-
-	/* Setup BHI INTVEC */
 	write_lock_irq(&mhi_cntrl->pm_lock);
-	mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
 	mhi_cntrl->pm_state = MHI_PM_POR;
 	mhi_cntrl->ee = MHI_EE_MAX;
 	current_ee = mhi_get_exec_env(mhi_cntrl);
@@ -1072,7 +1066,7 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
 		dev_err(dev, "%s is not a valid EE for power on\n",
 			TO_MHI_EXEC_STR(current_ee));
 		ret = -EIO;
-		goto error_async_power_up;
+		goto error_setup_irq;
 	}
 
 	state = mhi_get_mhi_state(mhi_cntrl);
@@ -1081,20 +1075,12 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
 
 	if (state == MHI_STATE_SYS_ERR) {
 		mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
-		ret = wait_event_timeout(mhi_cntrl->state_event,
-				MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state) ||
-					mhi_read_reg_field(mhi_cntrl,
-							   mhi_cntrl->regs,
-							   MHICTRL,
-							   MHICTRL_RESET_MASK,
-							   MHICTRL_RESET_SHIFT,
-							   &val) ||
-					!val,
-				msecs_to_jiffies(mhi_cntrl->timeout_ms));
-		if (!ret) {
-			ret = -EIO;
+		ret = mhi_poll_reg_field(mhi_cntrl, mhi_cntrl->regs, MHICTRL,
+				 MHICTRL_RESET_MASK, MHICTRL_RESET_SHIFT, 0,
+				 interval_us);
+		if (ret) {
 			dev_info(dev, "Failed to reset MHI due to syserr state\n");
-			goto error_async_power_up;
+			goto error_setup_irq;
 		}
 
 		/*
@@ -1104,6 +1090,13 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
 		mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
 	}
 
+	ret = mhi_init_irq_setup(mhi_cntrl);
+	if (ret)
+		goto error_setup_irq;
+
+	/* Setup BHI INTVEC */
+	mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
+
 	/* Transition to next state */
 	next_state = MHI_IN_PBL(current_ee) ?
 		DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;
@@ -1116,9 +1109,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
 
 	return 0;
 
-error_async_power_up:
-	mhi_deinit_free_irq(mhi_cntrl);
-
 error_setup_irq:
 	mhi_cntrl->pm_state = MHI_PM_DISABLE;
 	mutex_unlock(&mhi_cntrl->pm_mutex);
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] bus: mhi: Fix race while handling SYS_ERR at power up
  2021-11-18  5:57 [PATCH v3] bus: mhi: Fix race while handling SYS_ERR at power up Manivannan Sadhasivam
@ 2021-11-18  9:55 ` Aleksander Morgado
  2021-11-20 11:25   ` Manivannan Sadhasivam
  0 siblings, 1 reply; 6+ messages in thread
From: Aleksander Morgado @ 2021-11-18  9:55 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: mhi, Loic Poulain, Thomas Perrot, Hemant Kumar, Bhaumik Bhatt,
	quic_jhugo, linux-arm-msm, stable

Hey Mani,

On Thu, Nov 18, 2021 at 6:57 AM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> Some devices tend to trigger SYS_ERR interrupt while the host handling
> SYS_ERR state of the device during power up. This creates a race
> condition and causes a failure in booting up the device.
>
> The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR
> handling in mhi_async_power_up(). Once the host detects that the device
> is in SYS_ERR state, it issues MHI_RESET and waits for the device to
> process the reset request. During this time, the device triggers SYS_ERR
> interrupt to the host and host starts handling SYS_ERR execution.
>
> So by the time the device has completed reset, host starts SYS_ERR
> handling. This causes the race condition and the modem fails to boot.
>
> Hence, register the IRQ handler only after handling the SYS_ERR check
> to avoid getting spurious IRQs from the device.
>
> Cc: stable@vger.kernel.org
> Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up")
> Reported-by: Aleksander Morgado <aleksander@aleksander.es>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> ---
>
> Changes in v3:
>
> * Moved BHI_INTVEC setup after irq setup
> * Used interval_us as the delay for the polling API
>
> Changes in v2:
>
> * Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
>

I tried this v3 patch and I'm not sure if it's working properly in my
setup; not all boots are successfully bringing the modem up.

Once I installed it, I kept having this kind of logs on every boot:
[    7.030407] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem
0x600000000-0x600000fff 64bit]
[    7.038984] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
[    7.045814] mhi-pci-generic 0000:01:00.0: using shared MSI
[    7.052191] mhi mhi0: Requested to power ON
[    7.168042] mhi mhi0: Power on setup success
[    7.168141] mhi mhi0: Wait for device to enter SBL or Mission mode
[   15.687938] mhi-pci-generic 0000:01:00.0: failed to suspend device: -16

I didn't have debug logs enabled in that build, so I created a new one
with them enabled, and... these are the logs I'm getting now (not the
same ones as above, not sure why):

// Cold boots fail (tried several times)
[    7.033370] mhi-pci-generic 0000:01:00.0: MHI PCI device found: sierra-em919x
[    7.040558] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem
0x600000000-0x600000fff 64bit]
[    7.049105] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
[    7.055923] mhi-pci-generic 0000:01:00.0: using shared MSI
[    7.062130] mhi mhi0: Requested to power ON
[    7.066351] mhi mhi0: Attempting power on with EE: PASS THROUGH,
state: SYS ERROR
[   20.572010] mhi mhi0: Power on setup success
[   20.576412] mhi mhi0: Handling state transition: PBL
[   55.139820] mhi mhi0: Device failed to enter MHI Ready
[   55.144978] mhi mhi0: MHI did not enter READY state
[   55.149876] mhi mhi0: Handling state transition: DISABLE
[   55.155203] mhi mhi0: Processing disable transition with PM state:
Firmware Download Error
[   55.163482] mhi mhi0: Triggering MHI Reset in device
[   89.727820] mhi mhi0: Device failed to clear MHI Reset
[   89.732975] mhi mhi0: Waiting for all pending event ring processing
to complete
[   89.740316] mhi mhi0: Waiting for all pending threads to complete
[   89.746422] mhi mhi0: Reset all active channels and remove MHI devices
[   89.752963] mhi mhi0: Resetting EV CTXT and CMD CTXT
[   89.757940] mhi mhi0: Error moving from PM state: Firmware Download
Error to: DISABLE
[   89.765785] mhi mhi0: Exiting with PM state: Firmware Download
Error, MHI state: RESET
[   89.773793] mhi-pci-generic 0000:01:00.0: failed to power up MHI controller
[   89.781067] mhi-pci-generic: probe of 0000:01:00.0 failed with error -110

// Warm reboots afterwards seem to go ok (tried several times)
[    7.072762] mhi-pci-generic 0000:01:00.0: MHI PCI device found: sierra-em919x
[    7.079942] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem
0x600000000-0x600000fff 64bit]
[    7.088505] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
[    7.095331] mhi-pci-generic 0000:01:00.0: using shared MSI
[    7.101628] mhi mhi0: Requested to power ON
[    7.105859] mhi mhi0: Attempting power on with EE: PASS THROUGH,
state: SYS ERROR
[    7.224053] mhi mhi0: Power on setup success
[    7.228373] mhi mhi0: local ee: INVALID_EE state: RESET device ee:
MISSION MODE state: READY
[    7.236878] mhi mhi0: Handling state transition: PBL
[    7.241871] mhi mhi0: Device in READY State
[    7.246118] mhi mhi0: Initializing MHI registers
[    7.250775] mhi mhi0: Wait for device to enter SBL or Mission mode
[   15.418147] mhi mhi0: local ee: MISSION MODE state: READY device
ee: MISSION MODE state: READY
[   16.025027] mhi mhi0: State change event to state: M0
[   16.025041] mhi mhi0: local ee: MISSION MODE state: READY device
ee: MISSION MODE state: M0
[   16.038500] mhi mhi0: Received EE event: MISSION MODE
[   16.038505] mhi mhi0: local ee: MISSION MODE state: M0 device ee:
MISSION MODE state: M0
[   16.051671] mhi mhi0: Handling state transition: MISSION MODE
[   16.057434] mhi mhi0: Processing Mission Mode transition
[   16.063780] mhi_net mhi0_IP_HW0: 100: Updating channel state to: START
[   16.197073] mhi mhi0: local ee: MISSION MODE state: M0 device ee:
MISSION MODE state: M0
[   16.201196] mhi_net mhi0_IP_HW0: 100: Channel state change to START
successful
[   16.212565] mhi_net mhi0_IP_HW0: 101: Updating channel state to: START
[   16.362262] mhi mhi0: local ee: MISSION MODE state: M0 device ee:
MISSION MODE state: M0
[   16.362268] mhi_net mhi0_IP_HW0: 101: Channel state change to START
successful
[   18.860081] mhi mhi0: Allowing M3 transition
[   18.864380] mhi mhi0: Waiting for M3 completion
[   19.080218] mhi mhi0: State change event to state: M3
[   19.080228] mhi mhi0: local ee: MISSION MODE state: M0 device ee:
MISSION MODE state: M3
[   21.899049] mhi_wwan_ctrl mhi0_DUN: 32: Updating channel state to: START
[   21.924031] mhi mhi0: Entered with PM state: M3, MHI state: M3
[   21.952174] mhi mhi0: State change event to state: M0
[   21.952193] mhi mhi0: local ee: MISSION MODE state: M3 device ee:
MISSION MODE state: M0
[   21.967549] mhi mhi0: local ee: MISSION MODE state: M0 device ee:
MISSION MODE state: M0
[   21.967553] mhi_wwan_ctrl mhi0_DUN: 32: Channel state change to
START successful

I didn't try the v1 or v2 patches (sorry!), so not sure if the issues
come in this last iteration or in an earlier one. Do you want me to
try with v1 and v2 as well?

The patch that was working very reliably (100%) for me was the "bus:
mhi: Register IRQ handler after SYS_ERR check during power up" one,
which you attached here:
https://www.spinics.net/lists/linux-arm-msm/msg97646.html

-- 
Aleksander
https://aleksander.es

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] bus: mhi: Fix race while handling SYS_ERR at power up
  2021-11-18  9:55 ` Aleksander Morgado
@ 2021-11-20 11:25   ` Manivannan Sadhasivam
  2021-11-20 11:36     ` Manivannan Sadhasivam
  0 siblings, 1 reply; 6+ messages in thread
From: Manivannan Sadhasivam @ 2021-11-20 11:25 UTC (permalink / raw)
  To: Aleksander Morgado
  Cc: mhi, Loic Poulain, Thomas Perrot, Hemant Kumar, Bhaumik Bhatt,
	quic_jhugo, linux-arm-msm, stable

Hey,

On Thu, Nov 18, 2021 at 10:55:51AM +0100, Aleksander Morgado wrote:
> Hey Mani,
> 
> On Thu, Nov 18, 2021 at 6:57 AM Manivannan Sadhasivam
> <manivannan.sadhasivam@linaro.org> wrote:
> >
> > Some devices tend to trigger SYS_ERR interrupt while the host handling
> > SYS_ERR state of the device during power up. This creates a race
> > condition and causes a failure in booting up the device.
> >
> > The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR
> > handling in mhi_async_power_up(). Once the host detects that the device
> > is in SYS_ERR state, it issues MHI_RESET and waits for the device to
> > process the reset request. During this time, the device triggers SYS_ERR
> > interrupt to the host and host starts handling SYS_ERR execution.
> >
> > So by the time the device has completed reset, host starts SYS_ERR
> > handling. This causes the race condition and the modem fails to boot.
> >
> > Hence, register the IRQ handler only after handling the SYS_ERR check
> > to avoid getting spurious IRQs from the device.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up")
> > Reported-by: Aleksander Morgado <aleksander@aleksander.es>
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > ---
> >
> > Changes in v3:
> >
> > * Moved BHI_INTVEC setup after irq setup
> > * Used interval_us as the delay for the polling API
> >
> > Changes in v2:
> >
> > * Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
> >
> 
> I tried this v3 patch and I'm not sure if it's working properly in my
> setup; not all boots are successfully bringing the modem up.
> 

Ouch!

> Once I installed it, I kept having this kind of logs on every boot:
> [    7.030407] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem
> 0x600000000-0x600000fff 64bit]
> [    7.038984] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
> [    7.045814] mhi-pci-generic 0000:01:00.0: using shared MSI
> [    7.052191] mhi mhi0: Requested to power ON
> [    7.168042] mhi mhi0: Power on setup success
> [    7.168141] mhi mhi0: Wait for device to enter SBL or Mission mode
> [   15.687938] mhi-pci-generic 0000:01:00.0: failed to suspend device: -16

[...]

> I didn't try the v1 or v2 patches (sorry!), so not sure if the issues
> come in this last iteration or in an earlier one. Do you want me to
> try with v1 and v2 as well?
> 

Yes, please. Nothing changed other than moving the BHI_INTVEC programming.

Thanks,
Mani

> The patch that was working very reliably (100%) for me was the "bus:
> mhi: Register IRQ handler after SYS_ERR check during power up" one,
> which you attached here:
> https://www.spinics.net/lists/linux-arm-msm/msg97646.html
> 
> -- 
> Aleksander
> https://aleksander.es

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] bus: mhi: Fix race while handling SYS_ERR at power up
  2021-11-20 11:25   ` Manivannan Sadhasivam
@ 2021-11-20 11:36     ` Manivannan Sadhasivam
  2021-11-22 13:13       ` Aleksander Morgado
  0 siblings, 1 reply; 6+ messages in thread
From: Manivannan Sadhasivam @ 2021-11-20 11:36 UTC (permalink / raw)
  To: Aleksander Morgado
  Cc: mhi, Loic Poulain, Thomas Perrot, Hemant Kumar, Bhaumik Bhatt,
	quic_jhugo, linux-arm-msm, stable

On Sat, Nov 20, 2021 at 04:55:15PM +0530, Manivannan Sadhasivam wrote:
> Hey,
> 
> On Thu, Nov 18, 2021 at 10:55:51AM +0100, Aleksander Morgado wrote:
> > Hey Mani,
> > 
> > On Thu, Nov 18, 2021 at 6:57 AM Manivannan Sadhasivam
> > <manivannan.sadhasivam@linaro.org> wrote:
> > >
> > > Some devices tend to trigger SYS_ERR interrupt while the host handling
> > > SYS_ERR state of the device during power up. This creates a race
> > > condition and causes a failure in booting up the device.
> > >
> > > The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR
> > > handling in mhi_async_power_up(). Once the host detects that the device
> > > is in SYS_ERR state, it issues MHI_RESET and waits for the device to
> > > process the reset request. During this time, the device triggers SYS_ERR
> > > interrupt to the host and host starts handling SYS_ERR execution.
> > >
> > > So by the time the device has completed reset, host starts SYS_ERR
> > > handling. This causes the race condition and the modem fails to boot.
> > >
> > > Hence, register the IRQ handler only after handling the SYS_ERR check
> > > to avoid getting spurious IRQs from the device.
> > >
> > > Cc: stable@vger.kernel.org
> > > Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up")
> > > Reported-by: Aleksander Morgado <aleksander@aleksander.es>
> > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > ---
> > >
> > > Changes in v3:
> > >
> > > * Moved BHI_INTVEC setup after irq setup
> > > * Used interval_us as the delay for the polling API
> > >
> > > Changes in v2:
> > >
> > > * Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
> > >
> > 
> > I tried this v3 patch and I'm not sure if it's working properly in my
> > setup; not all boots are successfully bringing the modem up.
> > 
> 
> Ouch!
> 
> > Once I installed it, I kept having this kind of logs on every boot:
> > [    7.030407] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem
> > 0x600000000-0x600000fff 64bit]
> > [    7.038984] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
> > [    7.045814] mhi-pci-generic 0000:01:00.0: using shared MSI
> > [    7.052191] mhi mhi0: Requested to power ON
> > [    7.168042] mhi mhi0: Power on setup success
> > [    7.168141] mhi mhi0: Wait for device to enter SBL or Mission mode
> > [   15.687938] mhi-pci-generic 0000:01:00.0: failed to suspend device: -16
> 
> [...]
> 
> > I didn't try the v1 or v2 patches (sorry!), so not sure if the issues
> > come in this last iteration or in an earlier one. Do you want me to
> > try with v1 and v2 as well?
> > 
> 
> Yes, please. Nothing changed other than moving the BHI_INTVEC programming.
> 

Or if you want to do it quickly, please test the diff below:

diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index ee0515a25e46..21484a61bbed 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -1055,7 +1055,9 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
        mutex_lock(&mhi_cntrl->pm_mutex);
        mhi_cntrl->pm_state = MHI_PM_DISABLE;
 
+       /* Setup BHI INTVEC */
        write_lock_irq(&mhi_cntrl->pm_lock);
+       mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
        mhi_cntrl->pm_state = MHI_PM_POR;
        mhi_cntrl->ee = MHI_EE_MAX;
        current_ee = mhi_get_exec_env(mhi_cntrl);
@@ -1094,9 +1096,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
        if (ret)
                goto error_setup_irq;
 
-       /* Setup BHI INTVEC */
-       mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
-
        /* Transition to next state */
        next_state = MHI_IN_PBL(current_ee) ?
                DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;

Thanks,
Mani

> Thanks,
> Mani
> 
> > The patch that was working very reliably (100%) for me was the "bus:
> > mhi: Register IRQ handler after SYS_ERR check during power up" one,
> > which you attached here:
> > https://www.spinics.net/lists/linux-arm-msm/msg97646.html
> > 
> > -- 
> > Aleksander
> > https://aleksander.es

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] bus: mhi: Fix race while handling SYS_ERR at power up
  2021-11-20 11:36     ` Manivannan Sadhasivam
@ 2021-11-22 13:13       ` Aleksander Morgado
  2021-11-22 15:27         ` Manivannan Sadhasivam
  0 siblings, 1 reply; 6+ messages in thread
From: Aleksander Morgado @ 2021-11-22 13:13 UTC (permalink / raw)
  To: Manivannan Sadhasivam
  Cc: mhi, Loic Poulain, Thomas Perrot, Hemant Kumar, Bhaumik Bhatt,
	quic_jhugo, linux-arm-msm, stable

Hey Mani,

> > > >
> > > > Some devices tend to trigger SYS_ERR interrupt while the host handling
> > > > SYS_ERR state of the device during power up. This creates a race
> > > > condition and causes a failure in booting up the device.
> > > >
> > > > The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR
> > > > handling in mhi_async_power_up(). Once the host detects that the device
> > > > is in SYS_ERR state, it issues MHI_RESET and waits for the device to
> > > > process the reset request. During this time, the device triggers SYS_ERR
> > > > interrupt to the host and host starts handling SYS_ERR execution.
> > > >
> > > > So by the time the device has completed reset, host starts SYS_ERR
> > > > handling. This causes the race condition and the modem fails to boot.
> > > >
> > > > Hence, register the IRQ handler only after handling the SYS_ERR check
> > > > to avoid getting spurious IRQs from the device.
> > > >
> > > > Cc: stable@vger.kernel.org
> > > > Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up")
> > > > Reported-by: Aleksander Morgado <aleksander@aleksander.es>
> > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > ---
> > > >
> > > > Changes in v3:
> > > >
> > > > * Moved BHI_INTVEC setup after irq setup
> > > > * Used interval_us as the delay for the polling API
> > > >
> > > > Changes in v2:
> > > >
> > > > * Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
> > > >
> > >
> > > I tried this v3 patch and I'm not sure if it's working properly in my
> > > setup; not all boots are successfully bringing the modem up.
> > >
> >
> > Ouch!
> >
> > > Once I installed it, I kept having this kind of logs on every boot:
> > > [    7.030407] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem
> > > 0x600000000-0x600000fff 64bit]
> > > [    7.038984] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
> > > [    7.045814] mhi-pci-generic 0000:01:00.0: using shared MSI
> > > [    7.052191] mhi mhi0: Requested to power ON
> > > [    7.168042] mhi mhi0: Power on setup success
> > > [    7.168141] mhi mhi0: Wait for device to enter SBL or Mission mode
> > > [   15.687938] mhi-pci-generic 0000:01:00.0: failed to suspend device: -16
> >
> > [...]
> >
> > > I didn't try the v1 or v2 patches (sorry!), so not sure if the issues
> > > come in this last iteration or in an earlier one. Do you want me to
> > > try with v1 and v2 as well?
> > >
> >
> > Yes, please. Nothing changed other than moving the BHI_INTVEC programming.
> >
>
> Or if you want to do it quickly, please test the diff below:
>
> diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> index ee0515a25e46..21484a61bbed 100644
> --- a/drivers/bus/mhi/core/pm.c
> +++ b/drivers/bus/mhi/core/pm.c
> @@ -1055,7 +1055,9 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
>         mutex_lock(&mhi_cntrl->pm_mutex);
>         mhi_cntrl->pm_state = MHI_PM_DISABLE;
>
> +       /* Setup BHI INTVEC */
>         write_lock_irq(&mhi_cntrl->pm_lock);
> +       mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
>         mhi_cntrl->pm_state = MHI_PM_POR;
>         mhi_cntrl->ee = MHI_EE_MAX;
>         current_ee = mhi_get_exec_env(mhi_cntrl);
> @@ -1094,9 +1096,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
>         if (ret)
>                 goto error_setup_irq;
>
> -       /* Setup BHI INTVEC */
> -       mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
> -
>         /* Transition to next state */
>         next_state = MHI_IN_PBL(current_ee) ?
>                 DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;
>

I tested that additional diff on top of v3, and so far so good; I did
5 soft reboots and 5 hard boots and they were all successful.

-- 
Aleksander
https://aleksander.es

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v3] bus: mhi: Fix race while handling SYS_ERR at power up
  2021-11-22 13:13       ` Aleksander Morgado
@ 2021-11-22 15:27         ` Manivannan Sadhasivam
  0 siblings, 0 replies; 6+ messages in thread
From: Manivannan Sadhasivam @ 2021-11-22 15:27 UTC (permalink / raw)
  To: Aleksander Morgado
  Cc: mhi, Loic Poulain, Thomas Perrot, Hemant Kumar, Bhaumik Bhatt,
	quic_jhugo, linux-arm-msm, stable

hey,

On Mon, Nov 22, 2021 at 02:13:08PM +0100, Aleksander Morgado wrote:
> Hey Mani,
> 
> > > > >
> > > > > Some devices tend to trigger SYS_ERR interrupt while the host handling
> > > > > SYS_ERR state of the device during power up. This creates a race
> > > > > condition and causes a failure in booting up the device.
> > > > >
> > > > > The issue is seen on the Sierra Wireless EM9191 modem during SYS_ERR
> > > > > handling in mhi_async_power_up(). Once the host detects that the device
> > > > > is in SYS_ERR state, it issues MHI_RESET and waits for the device to
> > > > > process the reset request. During this time, the device triggers SYS_ERR
> > > > > interrupt to the host and host starts handling SYS_ERR execution.
> > > > >
> > > > > So by the time the device has completed reset, host starts SYS_ERR
> > > > > handling. This causes the race condition and the modem fails to boot.
> > > > >
> > > > > Hence, register the IRQ handler only after handling the SYS_ERR check
> > > > > to avoid getting spurious IRQs from the device.
> > > > >
> > > > > Cc: stable@vger.kernel.org
> > > > > Fixes: e18d4e9fa79b ("bus: mhi: core: Handle syserr during power_up")
> > > > > Reported-by: Aleksander Morgado <aleksander@aleksander.es>
> > > > > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> > > > > ---
> > > > >
> > > > > Changes in v3:
> > > > >
> > > > > * Moved BHI_INTVEC setup after irq setup
> > > > > * Used interval_us as the delay for the polling API
> > > > >
> > > > > Changes in v2:
> > > > >
> > > > > * Switched to "mhi_poll_reg_field" for detecting MHI reset in device.
> > > > >
> > > >
> > > > I tried this v3 patch and I'm not sure if it's working properly in my
> > > > setup; not all boots are successfully bringing the modem up.
> > > >
> > >
> > > Ouch!
> > >
> > > > Once I installed it, I kept having this kind of logs on every boot:
> > > > [    7.030407] mhi-pci-generic 0000:01:00.0: BAR 0: assigned [mem
> > > > 0x600000000-0x600000fff 64bit]
> > > > [    7.038984] mhi-pci-generic 0000:01:00.0: enabling device (0000 -> 0002)
> > > > [    7.045814] mhi-pci-generic 0000:01:00.0: using shared MSI
> > > > [    7.052191] mhi mhi0: Requested to power ON
> > > > [    7.168042] mhi mhi0: Power on setup success
> > > > [    7.168141] mhi mhi0: Wait for device to enter SBL or Mission mode
> > > > [   15.687938] mhi-pci-generic 0000:01:00.0: failed to suspend device: -16
> > >
> > > [...]
> > >
> > > > I didn't try the v1 or v2 patches (sorry!), so not sure if the issues
> > > > come in this last iteration or in an earlier one. Do you want me to
> > > > try with v1 and v2 as well?
> > > >
> > >
> > > Yes, please. Nothing changed other than moving the BHI_INTVEC programming.
> > >
> >
> > Or if you want to do it quickly, please test the diff below:
> >
> > diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
> > index ee0515a25e46..21484a61bbed 100644
> > --- a/drivers/bus/mhi/core/pm.c
> > +++ b/drivers/bus/mhi/core/pm.c
> > @@ -1055,7 +1055,9 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
> >         mutex_lock(&mhi_cntrl->pm_mutex);
> >         mhi_cntrl->pm_state = MHI_PM_DISABLE;
> >
> > +       /* Setup BHI INTVEC */
> >         write_lock_irq(&mhi_cntrl->pm_lock);
> > +       mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
> >         mhi_cntrl->pm_state = MHI_PM_POR;
> >         mhi_cntrl->ee = MHI_EE_MAX;
> >         current_ee = mhi_get_exec_env(mhi_cntrl);
> > @@ -1094,9 +1096,6 @@ int mhi_async_power_up(struct mhi_controller *mhi_cntrl)
> >         if (ret)
> >                 goto error_setup_irq;
> >
> > -       /* Setup BHI INTVEC */
> > -       mhi_write_reg(mhi_cntrl, mhi_cntrl->bhi, BHI_INTVEC, 0);
> > -
> >         /* Transition to next state */
> >         next_state = MHI_IN_PBL(current_ee) ?
> >                 DEV_ST_TRANSITION_PBL : DEV_ST_TRANSITION_READY;
> >
> 
> I tested that additional diff on top of v3, and so far so good; I did
> 5 soft reboots and 5 hard boots and they were all successful.
>

Great! Thanks for the testing. I'll post v4 with this change.
Please spare some time in testing that also :)

Thanks,
Mani
 
> -- 
> Aleksander
> https://aleksander.es

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-11-22 15:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-18  5:57 [PATCH v3] bus: mhi: Fix race while handling SYS_ERR at power up Manivannan Sadhasivam
2021-11-18  9:55 ` Aleksander Morgado
2021-11-20 11:25   ` Manivannan Sadhasivam
2021-11-20 11:36     ` Manivannan Sadhasivam
2021-11-22 13:13       ` Aleksander Morgado
2021-11-22 15:27         ` Manivannan Sadhasivam

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).