* [PATCH v3 0/4] platform/x86: intel_scu_ipc: Timeout fixes
@ 2023-09-11 19:39 Stephen Boyd
  2023-09-11 19:39 ` [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop() Stephen Boyd
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Stephen Boyd @ 2023-09-11 19:39 UTC (permalink / raw)
  To: Mika Westerberg, Hans de Goede, Mark Gross
  Cc: linux-kernel, patches, platform-driver-x86, Andy Shevchenko,
	Kuppuswamy Sathyanarayanan, Prashant Malani

I recently looked at some crash reports on ChromeOS devices that call
into this intel_scu_ipc driver. They were hitting timeouts, and it
certainly looks possible for those timeouts to be triggering because of
scheduling issues. Once things started going south, the timeouts kept
coming. Maybe that's because the other side got seriously confused? I
don't know.

I added some sleeps to these paths to trigger the timeout behavior to
make sure the code works. Simply sleeping for a long time in busy_loop()
hits the timeout, which could happen if the system is scheduling lots of
other things at the time.

I couldn't really test the fourth patch because forcing a timeout or
returning immediately wasn't fast enough to make the second transaction
run into the first one while it was still being processed.

Changes from v2 (https://lore.kernel.org/r/20230906180944.2197111-1-swboyd@chromium.org):
 * Use read_poll_timeout() helper in patch #1 (again)
 * New patch #3 to fix bug pointed out by Andy
 * Consolidate more code into busy check in patch #4

Changes from v1 (https://lore.kernel.org/r/20230831011405.3246849-1-swboyd@chromium.org):
 * Don't use read_poll_timeout() helper in patch 1, just add code
 * Rewrite patch 2 to be simpler
 * Make intel_scu_ipc_busy() return -EBUSY when busy
 * Downgrade dev_err() to dev_dbg() in intel_scu_ipc_busy()

Stephen Boyd (4):
  platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
  platform/x86: intel_scu_ipc: Check status upon timeout in
    ipc_wait_for_interrupt()
  platform/x86: intel_scu_ipc: Don't override scu in
    intel_scu_ipc_dev_simple_command()
  platform/x86: intel_scu_ipc: Fail IPC send if still busy

 drivers/platform/x86/intel_scu_ipc.c | 66 +++++++++++++++++-----------
 1 file changed, 40 insertions(+), 26 deletions(-)

Cc: Prashant Malani <pmalani@chromium.org>

base-commit: 2dde18cd1d8fac735875f2e4987f11817cc0bc2c
-- 
https://chromeos.dev



* [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
  2023-09-11 19:39 [PATCH v3 0/4] platform/x86: intel_scu_ipc: Timeout fixes Stephen Boyd
@ 2023-09-11 19:39 ` Stephen Boyd
  2023-09-11 21:17   ` Andy Shevchenko
                     ` (2 more replies)
  2023-09-11 19:39 ` [PATCH v3 2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt() Stephen Boyd
                   ` (2 subsequent siblings)
  3 siblings, 3 replies; 15+ messages in thread
From: Stephen Boyd @ 2023-09-11 19:39 UTC (permalink / raw)
  To: Mika Westerberg, Hans de Goede, Mark Gross
  Cc: linux-kernel, patches, platform-driver-x86, Andy Shevchenko,
	Kuppuswamy Sathyanarayanan, Prashant Malani

It's possible for the polling loop in busy_loop() to get scheduled away
for a long time.

  status = ipc_read_status(scu); // status = IPC_STATUS_BUSY
  <long time scheduled away>
  if (!(status & IPC_STATUS_BUSY))

If this happens, the status bit could change while the task is
scheduled away, and this function would never read the status again
after timing out. Instead, it returns -ETIMEDOUT even though the status
bit may have been cleared while the task was not running. Bit polling
code should always check the bit being polled one more time after the
timeout in case this happens.

Fix this by reading the status once more after the while loop breaks.
The read_poll_timeout() macro implements all of this, and it is
shorter, so use that macro here to consolidate code and fix this.
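
The final re-read described above can be sketched in plain userspace C.
This is only an illustration under stated assumptions, not the driver
code: poll_not_busy(), the callback types, the STATUS_* bit values, and
struct fake_dev are all made up for the example, and the loop spins
instead of sleeping between reads.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed bit layout for this sketch only; not taken from the driver. */
#define STATUS_BUSY 0x01
#define STATUS_ERR  0x02

/* Hypothetical stand-ins for ipc_read_status() and a monotonic clock. */
typedef uint8_t (*read_status_fn)(void *ctx);
typedef uint64_t (*now_us_fn)(void *ctx);

/* Poll until BUSY clears; after the deadline, read once more before failing. */
static int poll_not_busy(read_status_fn read_status, now_us_fn now_us,
			 void *ctx, uint64_t timeout_us)
{
	uint64_t deadline = now_us(ctx) + timeout_us;
	uint8_t status;

	for (;;) {
		status = read_status(ctx);
		if (!(status & STATUS_BUSY))
			break;
		if (now_us(ctx) > deadline) {
			/* We may have been scheduled away: re-check the bit. */
			status = read_status(ctx);
			break;
		}
	}

	if (status & STATUS_BUSY)
		return -ETIMEDOUT;
	return (status & STATUS_ERR) ? -EIO : 0;
}

/* Fake device: BUSY for the first busy_reads reads, clock jumps by step_us. */
struct fake_dev {
	uint64_t now_us;
	uint64_t step_us;
	unsigned int reads;
	unsigned int busy_reads;
};

static uint8_t fake_read_status(void *ctx)
{
	struct fake_dev *dev = ctx;

	return dev->reads++ < dev->busy_reads ? STATUS_BUSY : 0;
}

static uint64_t fake_now_us(void *ctx)
{
	struct fake_dev *dev = ctx;
	uint64_t t = dev->now_us;

	dev->now_us += dev->step_us;
	return t;
}
```

With a fake device whose first read is busy and whose clock then jumps
past the deadline, a loop without the extra read would report a timeout,
while this one sees the cleared bit and returns 0.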

There were some concerns with using read_poll_timeout() because it uses
timekeeping, and timekeeping isn't running early on or during the late
stages of system suspend or early stages of system resume, but an audit
of the code concluded that this code isn't called during those times so
it is safe to use the macro.

Cc: Prashant Malani <pmalani@chromium.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Fixes: e7b7ab3847c9 ("platform/x86: intel_scu_ipc: Sleeping is fine when polling")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
---
 drivers/platform/x86/intel_scu_ipc.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/platform/x86/intel_scu_ipc.c b/drivers/platform/x86/intel_scu_ipc.c
index 6851d10d6582..5a37becc65aa 100644
--- a/drivers/platform/x86/intel_scu_ipc.c
+++ b/drivers/platform/x86/intel_scu_ipc.c
@@ -19,6 +19,7 @@
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
+#include <linux/iopoll.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 
@@ -231,19 +232,15 @@ static inline u32 ipc_data_readl(struct intel_scu_ipc_dev *scu, u32 offset)
 /* Wait till scu status is busy */
 static inline int busy_loop(struct intel_scu_ipc_dev *scu)
 {
-	unsigned long end = jiffies + IPC_TIMEOUT;
+	u8 status;
+	int err;
 
-	do {
-		u32 status;
+	err = read_poll_timeout(ipc_read_status, status, !(status & IPC_STATUS_BUSY),
+				100, jiffies_to_usecs(IPC_TIMEOUT), false, scu);
+	if (err)
+		return err;
 
-		status = ipc_read_status(scu);
-		if (!(status & IPC_STATUS_BUSY))
-			return (status & IPC_STATUS_ERR) ? -EIO : 0;
-
-		usleep_range(50, 100);
-	} while (time_before(jiffies, end));
-
-	return -ETIMEDOUT;
+	return (status & IPC_STATUS_ERR) ? -EIO : 0;
 }
 
 /* Wait till ipc ioc interrupt is received or timeout in 10 HZ */
-- 
https://chromeos.dev



* [PATCH v3 2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt()
  2023-09-11 19:39 [PATCH v3 0/4] platform/x86: intel_scu_ipc: Timeout fixes Stephen Boyd
  2023-09-11 19:39 ` [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop() Stephen Boyd
@ 2023-09-11 19:39 ` Stephen Boyd
  2023-09-12  5:00   ` Mika Westerberg
  2023-09-11 19:39 ` [PATCH v3 3/4] platform/x86: intel_scu_ipc: Don't override scu in intel_scu_ipc_dev_simple_command() Stephen Boyd
  2023-09-11 19:39 ` [PATCH v3 4/4] platform/x86: intel_scu_ipc: Fail IPC send if still busy Stephen Boyd
  3 siblings, 1 reply; 15+ messages in thread
From: Stephen Boyd @ 2023-09-11 19:39 UTC (permalink / raw)
  To: Mika Westerberg, Hans de Goede, Mark Gross
  Cc: linux-kernel, patches, platform-driver-x86, Andy Shevchenko,
	Kuppuswamy Sathyanarayanan, Prashant Malani

It's possible for the completion in ipc_wait_for_interrupt() to time out
simply because the interrupt was delayed in being processed. A timeout
in itself is not an error. This driver should check the status register
upon a timeout to ensure that scheduling or interrupt processing delays
don't affect the outcome of the IPC return value.

 CPU0                                                   SCU
 ----                                                   ---
 ipc_wait_for_interrupt()
  wait_for_completion_timeout(&scu->cmd_complete)
  [TIMEOUT]                                             status[IPC_STATUS_BUSY]=0

Fix this problem by reading the status bit in all cases, regardless of
the timeout. If the completion times out, we'll assume the problem was
that the IPC_STATUS_BUSY bit was still set, but if the status bit is
cleared in the meantime we know that we hit some scheduling delay and we
should just check the error bit.
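
The change in outcome can be illustrated with a small userspace sketch
that contrasts the two decision rules. The function names and the
STATUS_* bit values are assumptions made for the example; 'completed'
stands for whether wait_for_completion_timeout() returned nonzero.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit layout for this sketch only; not taken from the driver. */
#define STATUS_BUSY 0x01
#define STATUS_ERR  0x02

/* Old rule: a completion timeout is always reported as -ETIMEDOUT. */
static int result_timeout_is_error(bool completed, uint8_t status)
{
	if (!completed)
		return -ETIMEDOUT;
	return (status & STATUS_ERR) ? -EIO : 0;
}

/* New rule: the status register decides, whether or not the wait timed out. */
static int result_status_decides(bool completed, uint8_t status)
{
	(void)completed;	/* a timeout by itself is not an error */

	if (status & STATUS_BUSY)
		return -ETIMEDOUT;
	return (status & STATUS_ERR) ? -EIO : 0;
}
```

The two rules differ only when the wait timed out but the busy bit has
already been cleared: the old rule reports a timeout, the new one
reports the real result of the IPC.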

Cc: Prashant Malani <pmalani@chromium.org>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Fixes: ed12f295bfd5 ("ipc: Added support for IPC interrupt mode")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
---
 drivers/platform/x86/intel_scu_ipc.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/platform/x86/intel_scu_ipc.c b/drivers/platform/x86/intel_scu_ipc.c
index 5a37becc65aa..8be1686e22e9 100644
--- a/drivers/platform/x86/intel_scu_ipc.c
+++ b/drivers/platform/x86/intel_scu_ipc.c
@@ -248,10 +248,12 @@ static inline int ipc_wait_for_interrupt(struct intel_scu_ipc_dev *scu)
 {
 	int status;
 
-	if (!wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT))
-		return -ETIMEDOUT;
+	wait_for_completion_timeout(&scu->cmd_complete, IPC_TIMEOUT);
 
 	status = ipc_read_status(scu);
+	if (status & IPC_STATUS_BUSY)
+		return -ETIMEDOUT;
+
 	if (status & IPC_STATUS_ERR)
 		return -EIO;
 
-- 
https://chromeos.dev



* [PATCH v3 3/4] platform/x86: intel_scu_ipc: Don't override scu in intel_scu_ipc_dev_simple_command()
  2023-09-11 19:39 [PATCH v3 0/4] platform/x86: intel_scu_ipc: Timeout fixes Stephen Boyd
  2023-09-11 19:39 ` [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop() Stephen Boyd
  2023-09-11 19:39 ` [PATCH v3 2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt() Stephen Boyd
@ 2023-09-11 19:39 ` Stephen Boyd
  2023-09-11 21:18   ` Andy Shevchenko
  2023-09-12  5:01   ` Mika Westerberg
  2023-09-11 19:39 ` [PATCH v3 4/4] platform/x86: intel_scu_ipc: Fail IPC send if still busy Stephen Boyd
  3 siblings, 2 replies; 15+ messages in thread
From: Stephen Boyd @ 2023-09-11 19:39 UTC (permalink / raw)
  To: Mika Westerberg, Hans de Goede, Mark Gross
  Cc: linux-kernel, patches, platform-driver-x86, Andy Shevchenko,
	Kuppuswamy Sathyanarayanan, Prashant Malani

Andy discovered this bug during patch review. The 'scu' argument to this
function shouldn't be overridden by the function itself. It doesn't make
any sense. Looking at the commit history, we see that commit
f57fa18583f5 ("platform/x86: intel_scu_ipc: Introduce new SCU IPC API")
removed the setting of the scu to ipcdev in other functions, but not
this one. That was an oversight. Remove this line so that we stop
overriding the scu instance that is used by this function.

Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Closes: https://lore.kernel.org/r/ZPjdZ3xNmBEBvNiS@smile.fi.intel.com
Cc: Prashant Malani <pmalani@chromium.org>
Fixes: f57fa18583f5 ("platform/x86: intel_scu_ipc: Introduce new SCU IPC API")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
---
 drivers/platform/x86/intel_scu_ipc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/platform/x86/intel_scu_ipc.c b/drivers/platform/x86/intel_scu_ipc.c
index 8be1686e22e9..6958265db29d 100644
--- a/drivers/platform/x86/intel_scu_ipc.c
+++ b/drivers/platform/x86/intel_scu_ipc.c
@@ -443,7 +443,6 @@ int intel_scu_ipc_dev_simple_command(struct intel_scu_ipc_dev *scu, int cmd,
 		mutex_unlock(&ipclock);
 		return -ENODEV;
 	}
-	scu = ipcdev;
 	cmdval = sub << 12 | cmd;
 	ipc_command(scu, cmdval);
 	err = intel_scu_ipc_check_status(scu);
-- 
https://chromeos.dev



* [PATCH v3 4/4] platform/x86: intel_scu_ipc: Fail IPC send if still busy
  2023-09-11 19:39 [PATCH v3 0/4] platform/x86: intel_scu_ipc: Timeout fixes Stephen Boyd
                   ` (2 preceding siblings ...)
  2023-09-11 19:39 ` [PATCH v3 3/4] platform/x86: intel_scu_ipc: Don't override scu in intel_scu_ipc_dev_simple_command() Stephen Boyd
@ 2023-09-11 19:39 ` Stephen Boyd
  2023-09-11 21:22   ` Andy Shevchenko
  2023-09-12  5:01   ` Mika Westerberg
  3 siblings, 2 replies; 15+ messages in thread
From: Stephen Boyd @ 2023-09-11 19:39 UTC (permalink / raw)
  To: Mika Westerberg, Hans de Goede, Mark Gross
  Cc: linux-kernel, patches, platform-driver-x86, Andy Shevchenko,
	Kuppuswamy Sathyanarayanan, Prashant Malani

It's possible for interrupts to get significantly delayed to the point
that callers of intel_scu_ipc_dev_command() and friends can call the
function once, hit a timeout, and call it again while the interrupt
still hasn't been processed. This driver will get seriously confused if
the interrupt is finally processed after the second IPC has been sent
with ipc_command(). It won't know which IPC has been completed. This
could be quite disastrous if calling code assumes something has happened
upon return from intel_scu_ipc_dev_simple_command() when it actually
hasn't.

Let's avoid this scenario by simply returning -EBUSY in this case.
Hopefully higher layers will know to back off or fail gracefully when
this happens. It's all highly unlikely anyway, but it's better to be
correct here as we have no way to know which IPC the status register is
telling us about if we send a second IPC while the previous IPC is still
processing.
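
How a higher layer might back off is not specified by this patch. As one
hypothetical sketch, a caller could retry a bounded number of times while
the send reports -EBUSY. send_with_retry(), send_fn, and struct fake_scu
are invented for illustration and do not exist in the driver; a real
caller would also sleep between attempts.

```c
#include <errno.h>

/* Hypothetical stand-in for an IPC send that may fail with -EBUSY. */
typedef int (*send_fn)(void *ctx);

/* Retry while the device reports -EBUSY, at most max_tries attempts. */
static int send_with_retry(send_fn send, void *ctx, int max_tries)
{
	int ret = -EBUSY;

	for (int i = 0; i < max_tries && ret == -EBUSY; i++) {
		/* A real caller would sleep here before every retry. */
		ret = send(ctx);
	}

	return ret;
}

/* Fake device that stays busy for the first busy_sends attempts. */
struct fake_scu {
	int busy_sends;
	int attempts;
};

static int fake_send(void *ctx)
{
	struct fake_scu *scu = ctx;

	scu->attempts++;
	return scu->busy_sends-- > 0 ? -EBUSY : 0;
}
```

Any error other than -EBUSY ends the loop immediately, so a timeout or
I/O error is passed straight back to the caller.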

Cc: Prashant Malani <pmalani@chromium.org>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Fixes: ed12f295bfd5 ("ipc: Added support for IPC interrupt mode")
Signed-off-by: Stephen Boyd <swboyd@chromium.org>
---
 drivers/platform/x86/intel_scu_ipc.c | 40 +++++++++++++++++++---------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/drivers/platform/x86/intel_scu_ipc.c b/drivers/platform/x86/intel_scu_ipc.c
index 6958265db29d..c5b15450598e 100644
--- a/drivers/platform/x86/intel_scu_ipc.c
+++ b/drivers/platform/x86/intel_scu_ipc.c
@@ -265,6 +265,24 @@ static int intel_scu_ipc_check_status(struct intel_scu_ipc_dev *scu)
 	return scu->irq > 0 ? ipc_wait_for_interrupt(scu) : busy_loop(scu);
 }
 
+static struct intel_scu_ipc_dev *intel_scu_ipc_get(struct intel_scu_ipc_dev *scu)
+{
+	u8 status;
+
+	if (!scu)
+		scu = ipcdev;
+	if (!scu)
+		return ERR_PTR(-ENODEV);
+
+	status = ipc_read_status(scu);
+	if (status & IPC_STATUS_BUSY) {
+		dev_dbg(&scu->dev, "device is busy\n");
+		return ERR_PTR(-EBUSY);
+	}
+
+	return scu;
+}
+
 /* Read/Write power control(PMIC in Langwell, MSIC in PenWell) registers */
 static int pwr_reg_rdwr(struct intel_scu_ipc_dev *scu, u16 *addr, u8 *data,
 			u32 count, u32 op, u32 id)
@@ -278,11 +296,10 @@ static int pwr_reg_rdwr(struct intel_scu_ipc_dev *scu, u16 *addr, u8 *data,
 	memset(cbuf, 0, sizeof(cbuf));
 
 	mutex_lock(&ipclock);
-	if (!scu)
-		scu = ipcdev;
-	if (!scu) {
+	scu = intel_scu_ipc_get(scu);
+	if (IS_ERR(scu)) {
 		mutex_unlock(&ipclock);
-		return -ENODEV;
+		return PTR_ERR(scu);
 	}
 
 	for (nc = 0; nc < count; nc++, offset += 2) {
@@ -437,12 +454,12 @@ int intel_scu_ipc_dev_simple_command(struct intel_scu_ipc_dev *scu, int cmd,
 	int err;
 
 	mutex_lock(&ipclock);
-	if (!scu)
-		scu = ipcdev;
-	if (!scu) {
+	scu = intel_scu_ipc_get(scu);
+	if (IS_ERR(scu)) {
 		mutex_unlock(&ipclock);
-		return -ENODEV;
+		return PTR_ERR(scu);
 	}
+
 	cmdval = sub << 12 | cmd;
 	ipc_command(scu, cmdval);
 	err = intel_scu_ipc_check_status(scu);
@@ -482,11 +499,10 @@ int intel_scu_ipc_dev_command_with_size(struct intel_scu_ipc_dev *scu, int cmd,
 		return -EINVAL;
 
 	mutex_lock(&ipclock);
-	if (!scu)
-		scu = ipcdev;
-	if (!scu) {
+	scu = intel_scu_ipc_get(scu);
+	if (IS_ERR(scu)) {
 		mutex_unlock(&ipclock);
-		return -ENODEV;
+		return PTR_ERR(scu);
 	}
 
 	memcpy(inbuf, in, inlen);
-- 
https://chromeos.dev



* Re: [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
  2023-09-11 19:39 ` [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop() Stephen Boyd
@ 2023-09-11 21:17   ` Andy Shevchenko
  2023-09-11 21:19     ` Andy Shevchenko
  2023-09-11 21:41     ` Stephen Boyd
  2023-09-12  5:02   ` Mika Westerberg
  2023-09-12  5:04   ` Kuppuswamy Sathyanarayanan
  2 siblings, 2 replies; 15+ messages in thread
From: Andy Shevchenko @ 2023-09-11 21:17 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Mika Westerberg, Hans de Goede, Mark Gross, linux-kernel,
	patches, platform-driver-x86, Kuppuswamy Sathyanarayanan,
	Prashant Malani

On Mon, Sep 11, 2023 at 12:39:33PM -0700, Stephen Boyd wrote:
> It's possible for the polling loop in busy_loop() to get scheduled away
> for a long time.
> 
>   status = ipc_read_status(scu); // status = IPC_STATUS_BUSY
>   <long time scheduled away>
>   if (!(status & IPC_STATUS_BUSY))
> 
> If this happens, then the status bit could change while the task is
> scheduled away and this function would never read the status again after
> timing out. Instead, the function will return -ETIMEDOUT when it's
> possible that scheduling didn't work out and the status bit was cleared.
> Bit polling code should always check the bit being polled one more time
> after the timeout in case this happens.
> 
> Fix this by reading the status once more after the while loop breaks.
> The read_poll_timeout() macro implements all of this, and it is
> shorter, so use that macro here to consolidate code and fix this.
> 
> There were some concerns with using read_poll_timeout() because it uses
> timekeeping, and timekeeping isn't running early on or during the late
> stages of system suspend or early stages of system resume, but an audit
> of the code concluded that this code isn't called during those times so
> it is safe to use the macro.

...

> +	err = read_poll_timeout(ipc_read_status, status, !(status & IPC_STATUS_BUSY),
> +				100, jiffies_to_usecs(IPC_TIMEOUT), false, scu);

Since sleep_before_read is "false", you can probably use readx_poll_timeout().

> +	if (err)
> +		return err;
>  
-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v3 3/4] platform/x86: intel_scu_ipc: Don't override scu in intel_scu_ipc_dev_simple_command()
  2023-09-11 19:39 ` [PATCH v3 3/4] platform/x86: intel_scu_ipc: Don't override scu in intel_scu_ipc_dev_simple_command() Stephen Boyd
@ 2023-09-11 21:18   ` Andy Shevchenko
  2023-09-12  5:01   ` Mika Westerberg
  1 sibling, 0 replies; 15+ messages in thread
From: Andy Shevchenko @ 2023-09-11 21:18 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Mika Westerberg, Hans de Goede, Mark Gross, linux-kernel,
	patches, platform-driver-x86, Kuppuswamy Sathyanarayanan,
	Prashant Malani

On Mon, Sep 11, 2023 at 12:39:35PM -0700, Stephen Boyd wrote:
> Andy discovered this bug during patch review. The 'scu' argument to this
> function shouldn't be overridden by the function itself. It doesn't make
> any sense. Looking at the commit history, we see that commit
> f57fa18583f5 ("platform/x86: intel_scu_ipc: Introduce new SCU IPC API")
> removed the setting of the scu to ipcdev in other functions, but not
> this one. That was an oversight. Remove this line so that we stop
> overriding the scu instance that is used by this function.

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
  2023-09-11 21:17   ` Andy Shevchenko
@ 2023-09-11 21:19     ` Andy Shevchenko
  2023-09-11 21:41     ` Stephen Boyd
  1 sibling, 0 replies; 15+ messages in thread
From: Andy Shevchenko @ 2023-09-11 21:19 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Mika Westerberg, Hans de Goede, Mark Gross, linux-kernel,
	patches, platform-driver-x86, Kuppuswamy Sathyanarayanan,
	Prashant Malani

On Tue, Sep 12, 2023 at 12:17:22AM +0300, Andy Shevchenko wrote:
> On Mon, Sep 11, 2023 at 12:39:33PM -0700, Stephen Boyd wrote:

...

> > +	err = read_poll_timeout(ipc_read_status, status, !(status & IPC_STATUS_BUSY),
> > +				100, jiffies_to_usecs(IPC_TIMEOUT), false, scu);
> 
> Since "false" you probably can utilize readx_poll_timeout().

...and because only a single parameter is taken.

> > +	if (err)
> > +		return err;

With that,
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v3 4/4] platform/x86: intel_scu_ipc: Fail IPC send if still busy
  2023-09-11 19:39 ` [PATCH v3 4/4] platform/x86: intel_scu_ipc: Fail IPC send if still busy Stephen Boyd
@ 2023-09-11 21:22   ` Andy Shevchenko
  2023-09-12  5:01   ` Mika Westerberg
  1 sibling, 0 replies; 15+ messages in thread
From: Andy Shevchenko @ 2023-09-11 21:22 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Mika Westerberg, Hans de Goede, Mark Gross, linux-kernel,
	patches, platform-driver-x86, Kuppuswamy Sathyanarayanan,
	Prashant Malani

On Mon, Sep 11, 2023 at 12:39:36PM -0700, Stephen Boyd wrote:
> It's possible for interrupts to get significantly delayed to the point
> that callers of intel_scu_ipc_dev_command() and friends can call the
> function once, hit a timeout, and call it again while the interrupt
> still hasn't been processed. This driver will get seriously confused if
> the interrupt is finally processed after the second IPC has been sent
> with ipc_command(). It won't know which IPC has been completed. This
> could be quite disastrous if calling code assumes something has happened
> upon return from intel_scu_ipc_dev_simple_command() when it actually
> hasn't.
> 
> Let's avoid this scenario by simply returning -EBUSY in this case.
> Hopefully higher layers will know to back off or fail gracefully when
> this happens. It's all highly unlikely anyway, but it's better to be
> correct here as we have no way to know which IPC the status register is
> telling us about if we send a second IPC while the previous IPC is still
> processing.

...

> +static struct intel_scu_ipc_dev *intel_scu_ipc_get(struct intel_scu_ipc_dev *scu)
> +{
> +	u8 status;

> +	if (!scu)
> +		scu = ipcdev;

I would write this as

	scu = scu ?: ipcdev;
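
For readers unfamiliar with the shorthand: 'a ?: b' is a GNU C extension
(accepted by GCC and Clang) that evaluates to 'a' when 'a' is nonzero and
to 'b' otherwise, evaluating 'a' only once. A minimal sketch, with
struct dev and pick_dev() as hypothetical names:

```c
#include <stddef.h>

/* Hypothetical device type, used only for this illustration. */
struct dev {
	int id;
};

/* Return dev if it is non-NULL, otherwise fall back to fallback. */
static struct dev *pick_dev(struct dev *dev, struct dev *fallback)
{
	/* GNU extension: conditional with the middle operand omitted. */
	return dev ?: fallback;
}
```

If both pointers are NULL the result is NULL, so the -ENODEV check that
follows is still needed.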

> +	if (!scu)
> +		return ERR_PTR(-ENODEV);
> +
> +	status = ipc_read_status(scu);
> +	if (status & IPC_STATUS_BUSY) {
> +		dev_dbg(&scu->dev, "device is busy\n");
> +		return ERR_PTR(-EBUSY);
> +	}
> +
> +	return scu;
> +}

Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

-- 
With Best Regards,
Andy Shevchenko




* Re: [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
  2023-09-11 21:17   ` Andy Shevchenko
  2023-09-11 21:19     ` Andy Shevchenko
@ 2023-09-11 21:41     ` Stephen Boyd
  1 sibling, 0 replies; 15+ messages in thread
From: Stephen Boyd @ 2023-09-11 21:41 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Mika Westerberg, Hans de Goede, Mark Gross, linux-kernel,
	patches, platform-driver-x86, Kuppuswamy Sathyanarayanan,
	Prashant Malani

Quoting Andy Shevchenko (2023-09-11 14:17:22)
> On Mon, Sep 11, 2023 at 12:39:33PM -0700, Stephen Boyd wrote:
> > It's possible for the polling loop in busy_loop() to get scheduled away
> > for a long time.
> >
> >   status = ipc_read_status(scu); // status = IPC_STATUS_BUSY
> >   <long time scheduled away>
> >   if (!(status & IPC_STATUS_BUSY))
> >
> > If this happens, then the status bit could change while the task is
> > scheduled away and this function would never read the status again after
> > timing out. Instead, the function will return -ETIMEDOUT when it's
> > possible that scheduling didn't work out and the status bit was cleared.
> > Bit polling code should always check the bit being polled one more time
> > after the timeout in case this happens.
> >
> > Fix this by reading the status once more after the while loop breaks.
> > The read_poll_timeout() macro implements all of this, and it is
> > shorter, so use that macro here to consolidate code and fix this.
> >
> > There were some concerns with using read_poll_timeout() because it uses
> > timekeeping, and timekeeping isn't running early on or during the late
> > stages of system suspend or early stages of system resume, but an audit
> > of the code concluded that this code isn't called during those times so
> > it is safe to use the macro.
>
> ...
>
> > +     err = read_poll_timeout(ipc_read_status, status, !(status & IPC_STATUS_BUSY),
> > +                             100, jiffies_to_usecs(IPC_TIMEOUT), false, scu);
>
> Since "false" you probably can utilize readx_poll_timeout().
>

You mean 'addr' will be 'scu'? Ok.


* Re: [PATCH v3 2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt()
  2023-09-11 19:39 ` [PATCH v3 2/4] platform/x86: intel_scu_ipc: Check status upon timeout in ipc_wait_for_interrupt() Stephen Boyd
@ 2023-09-12  5:00   ` Mika Westerberg
  0 siblings, 0 replies; 15+ messages in thread
From: Mika Westerberg @ 2023-09-12  5:00 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Hans de Goede, Mark Gross, linux-kernel, patches,
	platform-driver-x86, Andy Shevchenko, Kuppuswamy Sathyanarayanan,
	Prashant Malani

On Mon, Sep 11, 2023 at 12:39:34PM -0700, Stephen Boyd wrote:
> It's possible for the completion in ipc_wait_for_interrupt() to timeout,
> simply because the interrupt was delayed in being processed. A timeout
> in itself is not an error. This driver should check the status register
> upon a timeout to ensure that scheduling or interrupt processing delays
> don't affect the outcome of the IPC return value.
> 
>  CPU0                                                   SCU
>  ----                                                   ---
>  ipc_wait_for_interrupt()
>   wait_for_completion_timeout(&scu->cmd_complete)
>   [TIMEOUT]                                             status[IPC_STATUS_BUSY]=0
> 
> Fix this problem by reading the status bit in all cases, regardless of
> the timeout. If the completion times out, we'll assume the problem was
> that the IPC_STATUS_BUSY bit was still set, but if the status bit is
> cleared in the meantime we know that we hit some scheduling delay and we
> should just check the error bit.
> 
> Cc: Prashant Malani <pmalani@chromium.org>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>


* Re: [PATCH v3 3/4] platform/x86: intel_scu_ipc: Don't override scu in intel_scu_ipc_dev_simple_command()
  2023-09-11 19:39 ` [PATCH v3 3/4] platform/x86: intel_scu_ipc: Don't override scu in intel_scu_ipc_dev_simple_command() Stephen Boyd
  2023-09-11 21:18   ` Andy Shevchenko
@ 2023-09-12  5:01   ` Mika Westerberg
  1 sibling, 0 replies; 15+ messages in thread
From: Mika Westerberg @ 2023-09-12  5:01 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Hans de Goede, Mark Gross, linux-kernel, patches,
	platform-driver-x86, Andy Shevchenko, Kuppuswamy Sathyanarayanan,
	Prashant Malani

On Mon, Sep 11, 2023 at 12:39:35PM -0700, Stephen Boyd wrote:
> Andy discovered this bug during patch review. The 'scu' argument to this
> function shouldn't be overridden by the function itself. It doesn't make
> any sense. Looking at the commit history, we see that commit
> f57fa18583f5 ("platform/x86: intel_scu_ipc: Introduce new SCU IPC API")
> removed the setting of the scu to ipcdev in other functions, but not
> this one. That was an oversight. Remove this line so that we stop
> overriding the scu instance that is used by this function.
> 
> Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Closes: https://lore.kernel.org/r/ZPjdZ3xNmBEBvNiS@smile.fi.intel.com
> Cc: Prashant Malani <pmalani@chromium.org>

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>


* Re: [PATCH v3 4/4] platform/x86: intel_scu_ipc: Fail IPC send if still busy
  2023-09-11 19:39 ` [PATCH v3 4/4] platform/x86: intel_scu_ipc: Fail IPC send if still busy Stephen Boyd
  2023-09-11 21:22   ` Andy Shevchenko
@ 2023-09-12  5:01   ` Mika Westerberg
  1 sibling, 0 replies; 15+ messages in thread
From: Mika Westerberg @ 2023-09-12  5:01 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Hans de Goede, Mark Gross, linux-kernel, patches,
	platform-driver-x86, Andy Shevchenko, Kuppuswamy Sathyanarayanan,
	Prashant Malani

On Mon, Sep 11, 2023 at 12:39:36PM -0700, Stephen Boyd wrote:
> It's possible for interrupts to get significantly delayed to the point
> that callers of intel_scu_ipc_dev_command() and friends can call the
> function once, hit a timeout, and call it again while the interrupt
> still hasn't been processed. This driver will get seriously confused if
> the interrupt is finally processed after the second IPC has been sent
> with ipc_command(). It won't know which IPC has been completed. This
> could be quite disastrous if calling code assumes something has happened
> upon return from intel_scu_ipc_dev_simple_command() when it actually
> hasn't.
> 
> Let's avoid this scenario by simply returning -EBUSY in this case.
> Hopefully higher layers will know to back off or fail gracefully when
> this happens. It's all highly unlikely anyway, but it's better to be
> correct here as we have no way to know which IPC the status register is
> telling us about if we send a second IPC while the previous IPC is still
> processing.
> 
> Cc: Prashant Malani <pmalani@chromium.org>
> Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>


* Re: [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
  2023-09-11 19:39 ` [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop() Stephen Boyd
  2023-09-11 21:17   ` Andy Shevchenko
@ 2023-09-12  5:02   ` Mika Westerberg
  2023-09-12  5:04   ` Kuppuswamy Sathyanarayanan
  2 siblings, 0 replies; 15+ messages in thread
From: Mika Westerberg @ 2023-09-12  5:02 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Hans de Goede, Mark Gross, linux-kernel, patches,
	platform-driver-x86, Andy Shevchenko, Kuppuswamy Sathyanarayanan,
	Prashant Malani

On Mon, Sep 11, 2023 at 12:39:33PM -0700, Stephen Boyd wrote:
> It's possible for the polling loop in busy_loop() to get scheduled away
> for a long time.
> 
>   status = ipc_read_status(scu); // status = IPC_STATUS_BUSY
>   <long time scheduled away>
>   if (!(status & IPC_STATUS_BUSY))
> 
> If this happens, then the status bit could change while the task is
> scheduled away and this function would never read the status again after
> timing out. Instead, the function will return -ETIMEDOUT when it's
> possible that scheduling didn't work out and the status bit was cleared.
> Bit polling code should always check the bit being polled one more time
> after the timeout in case this happens.
> 
> Fix this by reading the status once more after the while loop breaks.
> The read_poll_timeout() macro implements all of this, and it is
> shorter, so use that macro here to consolidate code and fix this.
> 
> There were some concerns with using read_poll_timeout() because it uses
> timekeeping, and timekeeping isn't running early on or during the late
> stages of system suspend or early stages of system resume, but an audit
> of the code concluded that this code isn't called during those times so
> it is safe to use the macro.
> 
> Cc: Prashant Malani <pmalani@chromium.org>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com>


* Re: [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop()
  2023-09-11 19:39 ` [PATCH v3 1/4] platform/x86: intel_scu_ipc: Check status after timeout in busy_loop() Stephen Boyd
  2023-09-11 21:17   ` Andy Shevchenko
  2023-09-12  5:02   ` Mika Westerberg
@ 2023-09-12  5:04   ` Kuppuswamy Sathyanarayanan
  2 siblings, 0 replies; 15+ messages in thread
From: Kuppuswamy Sathyanarayanan @ 2023-09-12  5:04 UTC (permalink / raw)
  To: Stephen Boyd, Mika Westerberg, Hans de Goede, Mark Gross
  Cc: linux-kernel, patches, platform-driver-x86, Andy Shevchenko,
	Prashant Malani

Hi,

On 9/11/2023 12:39 PM, Stephen Boyd wrote:
> It's possible for the polling loop in busy_loop() to get scheduled away
> for a long time.
> 
>   status = ipc_read_status(scu); // status = IPC_STATUS_BUSY
>   <long time scheduled away>
>   if (!(status & IPC_STATUS_BUSY))
> 
> If this happens, then the status bit could change while the task is
> scheduled away and this function would never read the status again after
> timing out. Instead, the function will return -ETIMEDOUT when it's
> possible that scheduling didn't work out and the status bit was cleared.
> Bit polling code should always check the bit being polled one more time
> after the timeout in case this happens.
> 
> Fix this by reading the status once more after the while loop breaks.
> The read_poll_timeout() macro implements all of this, and it is
> shorter, so use that macro here to consolidate code and fix this.
> 
> There were some concerns with using read_poll_timeout() because it uses
> timekeeping, and timekeeping isn't running early on or during the late
> stages of system suspend or early stages of system resume, but an audit
> of the code concluded that this code isn't called during those times so
> it is safe to use the macro.
> 
> Cc: Prashant Malani <pmalani@chromium.org>
> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> Fixes: e7b7ab3847c9 ("platform/x86: intel_scu_ipc: Sleeping is fine when polling")
> Signed-off-by: Stephen Boyd <swboyd@chromium.org>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>


-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


