All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] remoteproc: some bug fixes
@ 2020-02-28 18:33 Alex Elder
  2020-02-28 18:33 ` [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery() Alex Elder
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Alex Elder @ 2020-02-28 18:33 UTC (permalink / raw)
  To: Ohad Ben-Cohen, Bjorn Andersson, Andy Gross
  Cc: linux-remoteproc, linux-arm-msm, linux-kernel

This series fixes some bugs in the remoteproc code.
  - The first patch moves some checks of the remoteproc state inside
    the protection of the mutex.
  - The second just eliminates an unsafe state check before calling
    a function, because the function will make the same check safely.
  - The third does proper recovery in the q6v5_probe() error path.
  - The last returns EINVAL if an unrecognized value is written
    into the "recovery" debugfs file.

This series is available (based on v5.6-rc3) in branch "remoteproc-v1"
in this git repository:
  https://git.linaro.org/people/alex.elder/linux.git

					-Alex

Alex Elder (4):
  remoteproc: re-check state in rproc_trigger_recovery()
  remoteproc: remoteproc debugfs file fixes
  remoteproc: qcom: fix q6v5 probe error paths
  remoteproc: return error for bad "recovery" debugfs input

 drivers/remoteproc/qcom_q6v5_mss.c      | 23 +++++++++++++++--------
 drivers/remoteproc/remoteproc_core.c    | 12 ++++++++----
 drivers/remoteproc/remoteproc_debugfs.c | 14 +++++++-------
 3 files changed, 30 insertions(+), 19 deletions(-)

-- 
2.20.1

^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery()
  2020-02-28 18:33 [PATCH 0/4] remoteproc: some bug fixes Alex Elder
@ 2020-02-28 18:33 ` Alex Elder
  2020-03-09 20:56   ` Mathieu Poirier
  2020-02-28 18:33 ` [PATCH 2/4] remoteproc: remoteproc debugfs file fixes Alex Elder
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Alex Elder @ 2020-02-28 18:33 UTC (permalink / raw)
  To: Ohad Ben-Cohen, Bjorn Andersson, Andy Gross
  Cc: linux-remoteproc, linux-arm-msm, linux-kernel

Two places call rproc_trigger_recovery():
  - rproc_crash_handler_work() sets rproc->state to CRASHED under
    protection of the mutex, then calls it if recovery is not
    disabled.  This function is called in workqueue context when
    scheduled in rproc_report_crash().
  - rproc_recovery_write() calls it in two spots, both of which
    the only call it if the rproc->state is CRASHED.

The mutex is taken right away in rproc_trigger_recovery().  However,
by the time the mutex is acquired, something else might have changed
rproc->state to something other than CRASHED.

The work that follows that is only appropriate for a remoteproc in
CRASHED state.  So check the state after acquiring the mutex, and
only proceed with the recovery work if the remoteproc is still in
CRASHED state.

Delay reporting that recovering has begun until after we hold the
mutex and we know the remote processor is in CRASHED state.

Signed-off-by: Alex Elder <elder@linaro.org>
---
 drivers/remoteproc/remoteproc_core.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 097f33e4f1f3..d327cb31d5c8 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -1653,12 +1653,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
 	struct device *dev = &rproc->dev;
 	int ret;
 
+	ret = mutex_lock_interruptible(&rproc->lock);
+	if (ret)
+		return ret;
+
+	/* State could have changed before we got the mutex */
+	if (rproc->state != RPROC_CRASHED)
+		goto unlock_mutex;
+
 	dev_err(dev, "recovering %s\n", rproc->name);
 
-	ret = mutex_lock_interruptible(&rproc->lock);
-	if (ret)
-		return ret;
-
 	ret = rproc_stop(rproc, true);
 	if (ret)
 		goto unlock_mutex;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] remoteproc: remoteproc debugfs file fixes
  2020-02-28 18:33 [PATCH 0/4] remoteproc: some bug fixes Alex Elder
  2020-02-28 18:33 ` [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery() Alex Elder
@ 2020-02-28 18:33 ` Alex Elder
  2020-02-28 18:33 ` [PATCH 3/4] remoteproc: qcom: fix q6v5 probe error paths Alex Elder
  2020-02-28 18:33 ` [PATCH 4/4] remoteproc: return error for bad "recovery" debugfs input Alex Elder
  3 siblings, 0 replies; 9+ messages in thread
From: Alex Elder @ 2020-02-28 18:33 UTC (permalink / raw)
  To: Ohad Ben-Cohen, Bjorn Andersson, Andy Gross
  Cc: linux-remoteproc, linux-arm-msm, linux-kernel

Don't bother checking the remoteproc state before calling
rproc_trigger_recovery() because that function will verify the
state, and the state can only be safely checked while holding the
mutex anyway.

Make the mode for "recovery" be writable.

Signed-off-by: Alex Elder <elder@linaro.org>
---
 drivers/remoteproc/remoteproc_debugfs.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_debugfs.c b/drivers/remoteproc/remoteproc_debugfs.c
index dd93cf04e17f..e995dc49c231 100644
--- a/drivers/remoteproc/remoteproc_debugfs.c
+++ b/drivers/remoteproc/remoteproc_debugfs.c
@@ -138,16 +138,14 @@ rproc_recovery_write(struct file *filp, const char __user *user_buf,
 		buf[count - 1] = '\0';
 
 	if (!strncmp(buf, "enabled", count)) {
+		/* change the flag and begin the recovery process if needed */
 		rproc->recovery_disabled = false;
-		/* if rproc has crashed, trigger recovery */
-		if (rproc->state == RPROC_CRASHED)
-			rproc_trigger_recovery(rproc);
+		rproc_trigger_recovery(rproc);
 	} else if (!strncmp(buf, "disabled", count)) {
 		rproc->recovery_disabled = true;
 	} else if (!strncmp(buf, "recover", count)) {
-		/* if rproc has crashed, trigger recovery */
-		if (rproc->state == RPROC_CRASHED)
-			rproc_trigger_recovery(rproc);
+		/* begin the recovery process without changing the flag */
+		rproc_trigger_recovery(rproc);
 	}
 
 	return count;
@@ -349,7 +347,7 @@ void rproc_create_debug_dir(struct rproc *rproc)
 
 	debugfs_create_file("name", 0400, rproc->dbg_dir,
 			    rproc, &rproc_name_ops);
-	debugfs_create_file("recovery", 0400, rproc->dbg_dir,
+	debugfs_create_file("recovery", 0600, rproc->dbg_dir,
 			    rproc, &rproc_recovery_ops);
 	debugfs_create_file("crash", 0200, rproc->dbg_dir,
 			    rproc, &rproc_crash_ops);
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] remoteproc: qcom: fix q6v5 probe error paths
  2020-02-28 18:33 [PATCH 0/4] remoteproc: some bug fixes Alex Elder
  2020-02-28 18:33 ` [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery() Alex Elder
  2020-02-28 18:33 ` [PATCH 2/4] remoteproc: remoteproc debugfs file fixes Alex Elder
@ 2020-02-28 18:33 ` Alex Elder
  2020-02-28 18:33 ` [PATCH 4/4] remoteproc: return error for bad "recovery" debugfs input Alex Elder
  3 siblings, 0 replies; 9+ messages in thread
From: Alex Elder @ 2020-02-28 18:33 UTC (permalink / raw)
  To: Ohad Ben-Cohen, Bjorn Andersson, Andy Gross
  Cc: linux-remoteproc, linux-arm-msm, linux-kernel

Remove the remoteproc subdevices if an error occurs after they have
been added.

Have cleanup activity done in q6v5_remove() be done in the reverse
order they are set up in q6v5_probe() (and the same order they're
done in the q6v5_probe() error path).  Use a local variable for
the remoteproc pointer, which is used repeatedly.

Signed-off-by: Alex Elder <elder@linaro.org>
---
 drivers/remoteproc/qcom_q6v5_mss.c | 23 +++++++++++++++--------
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/drivers/remoteproc/qcom_q6v5_mss.c b/drivers/remoteproc/qcom_q6v5_mss.c
index a1cc9cbe038f..97093f4f58e1 100644
--- a/drivers/remoteproc/qcom_q6v5_mss.c
+++ b/drivers/remoteproc/qcom_q6v5_mss.c
@@ -1667,15 +1667,21 @@ static int q6v5_probe(struct platform_device *pdev)
 	qproc->sysmon = qcom_add_sysmon_subdev(rproc, "modem", 0x12);
 	if (IS_ERR(qproc->sysmon)) {
 		ret = PTR_ERR(qproc->sysmon);
-		goto detach_proxy_pds;
+		goto remove_ssr_subdev;
 	}
 
 	ret = rproc_add(rproc);
 	if (ret)
-		goto detach_proxy_pds;
+		goto remove_sysmon_subdev;
 
 	return 0;
 
+remove_sysmon_subdev:
+	qcom_remove_sysmon_subdev(qproc->sysmon);
+remove_ssr_subdev:
+	qcom_remove_ssr_subdev(qproc->rproc, &qproc->ssr_subdev);
+	qcom_remove_smd_subdev(qproc->rproc, &qproc->smd_subdev);
+	qcom_remove_glink_subdev(qproc->rproc, &qproc->glink_subdev);
 detach_proxy_pds:
 	q6v5_pds_detach(qproc, qproc->proxy_pds, qproc->proxy_pd_count);
 detach_active_pds:
@@ -1689,18 +1695,19 @@ static int q6v5_probe(struct platform_device *pdev)
 static int q6v5_remove(struct platform_device *pdev)
 {
 	struct q6v5 *qproc = platform_get_drvdata(pdev);
+	struct rproc *rproc = qproc->rproc;
 
-	rproc_del(qproc->rproc);
+	rproc_del(rproc);
 
 	qcom_remove_sysmon_subdev(qproc->sysmon);
-	qcom_remove_glink_subdev(qproc->rproc, &qproc->glink_subdev);
-	qcom_remove_smd_subdev(qproc->rproc, &qproc->smd_subdev);
-	qcom_remove_ssr_subdev(qproc->rproc, &qproc->ssr_subdev);
+	qcom_remove_ssr_subdev(rproc, &qproc->ssr_subdev);
+	qcom_remove_smd_subdev(rproc, &qproc->smd_subdev);
+	qcom_remove_glink_subdev(rproc, &qproc->glink_subdev);
 
-	q6v5_pds_detach(qproc, qproc->active_pds, qproc->active_pd_count);
 	q6v5_pds_detach(qproc, qproc->proxy_pds, qproc->proxy_pd_count);
+	q6v5_pds_detach(qproc, qproc->active_pds, qproc->active_pd_count);
 
-	rproc_free(qproc->rproc);
+	rproc_free(rproc);
 
 	return 0;
 }
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] remoteproc: return error for bad "recovery" debugfs input
  2020-02-28 18:33 [PATCH 0/4] remoteproc: some bug fixes Alex Elder
                   ` (2 preceding siblings ...)
  2020-02-28 18:33 ` [PATCH 3/4] remoteproc: qcom: fix q6v5 probe error paths Alex Elder
@ 2020-02-28 18:33 ` Alex Elder
  3 siblings, 0 replies; 9+ messages in thread
From: Alex Elder @ 2020-02-28 18:33 UTC (permalink / raw)
  To: Ohad Ben-Cohen, Bjorn Andersson, Andy Gross
  Cc: linux-remoteproc, linux-arm-msm, linux-kernel

If the value written to the "recovery" debugfs file is not one of
the recognized commands return an error to indicate it's invalid.

Signed-off-by: Alex Elder <elder@linaro.org>
---
 drivers/remoteproc/remoteproc_debugfs.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/remoteproc/remoteproc_debugfs.c b/drivers/remoteproc/remoteproc_debugfs.c
index e995dc49c231..0d478bfefd9c 100644
--- a/drivers/remoteproc/remoteproc_debugfs.c
+++ b/drivers/remoteproc/remoteproc_debugfs.c
@@ -146,6 +146,8 @@ rproc_recovery_write(struct file *filp, const char __user *user_buf,
 	} else if (!strncmp(buf, "recover", count)) {
 		/* begin the recovery process without changing the flag */
 		rproc_trigger_recovery(rproc);
+	} else {
+		return -EINVAL;
 	}
 
 	return count;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery()
  2020-02-28 18:33 ` [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery() Alex Elder
@ 2020-03-09 20:56   ` Mathieu Poirier
  2020-03-11 23:44       ` Bjorn Andersson
  0 siblings, 1 reply; 9+ messages in thread
From: Mathieu Poirier @ 2020-03-09 20:56 UTC (permalink / raw)
  To: Alex Elder
  Cc: Ohad Ben-Cohen, Bjorn Andersson, Andy Gross, linux-remoteproc,
	linux-arm-msm, linux-kernel

On Fri, Feb 28, 2020 at 12:33:56PM -0600, Alex Elder wrote:
> Two places call rproc_trigger_recovery():
>   - rproc_crash_handler_work() sets rproc->state to CRASHED under
>     protection of the mutex, then calls it if recovery is not
>     disabled.  This function is called in workqueue context when
>     scheduled in rproc_report_crash().
>   - rproc_recovery_write() calls it in two spots, both of which
>     the only call it if the rproc->state is CRASHED.
> 
> The mutex is taken right away in rproc_trigger_recovery().  However,
> by the time the mutex is acquired, something else might have changed
> rproc->state to something other than CRASHED.

I'm interested in the "something might have changed" part.  The only thing I can
see is if rproc_trigger_recovery() has been called from debugfs between the time
the mutex is released but just before rproc_trigger_recovery() is called in
rproc_crash_handler_work().  In this case we would be done twice, something your
patch prevents.  Have you found other scenarios?

Thanks,
Mathieu

> 
> The work that follows that is only appropriate for a remoteproc in
> CRASHED state.  So check the state after acquiring the mutex, and
> only proceed with the recovery work if the remoteproc is still in
> CRASHED state.
> 
> Delay reporting that recovering has begun until after we hold the
> mutex and we know the remote processor is in CRASHED state.
> 
> Signed-off-by: Alex Elder <elder@linaro.org>
> ---
>  drivers/remoteproc/remoteproc_core.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> index 097f33e4f1f3..d327cb31d5c8 100644
> --- a/drivers/remoteproc/remoteproc_core.c
> +++ b/drivers/remoteproc/remoteproc_core.c
> @@ -1653,12 +1653,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
>  	struct device *dev = &rproc->dev;
>  	int ret;
>  
> +	ret = mutex_lock_interruptible(&rproc->lock);
> +	if (ret)
> +		return ret;
> +
> +	/* State could have changed before we got the mutex */
> +	if (rproc->state != RPROC_CRASHED)
> +		goto unlock_mutex;
> +
>  	dev_err(dev, "recovering %s\n", rproc->name);
>  
> -	ret = mutex_lock_interruptible(&rproc->lock);
> -	if (ret)
> -		return ret;
> -
>  	ret = rproc_stop(rproc, true);
>  	if (ret)
>  		goto unlock_mutex;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery()
  2020-03-09 20:56   ` Mathieu Poirier
@ 2020-03-11 23:44       ` Bjorn Andersson
  0 siblings, 0 replies; 9+ messages in thread
From: Bjorn Andersson @ 2020-03-11 23:44 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Alex Elder, Ohad Ben-Cohen, Andy Gross, linux-remoteproc,
	linux-arm-msm, linux-kernel

On Mon 09 Mar 13:56 PDT 2020, Mathieu Poirier wrote:

> On Fri, Feb 28, 2020 at 12:33:56PM -0600, Alex Elder wrote:
> > Two places call rproc_trigger_recovery():
> >   - rproc_crash_handler_work() sets rproc->state to CRASHED under
> >     protection of the mutex, then calls it if recovery is not
> >     disabled.  This function is called in workqueue context when
> >     scheduled in rproc_report_crash().
> >   - rproc_recovery_write() calls it in two spots, both of which
> >     the only call it if the rproc->state is CRASHED.
> > 
> > The mutex is taken right away in rproc_trigger_recovery().  However,
> > by the time the mutex is acquired, something else might have changed
> > rproc->state to something other than CRASHED.
> 
> I'm interested in the "something might have changed" part.  The only thing I can
> see is if rproc_trigger_recovery() has been called from debugfs between the time
> the mutex is released but just before rproc_trigger_recovery() is called in
> rproc_crash_handler_work().  In this case we would be done twice, something your
> patch prevents.  Have you found other scenarios?
> 

Alex is right, by checking rproc->state outside of the lock
rproc_recovery_write() allows for multiple contexts to enter
rproc_trigger_recovery() at once.

Further more, these multiple context will be held up at the
mutex_lock_interruptible() and as each one completes the recovery the
subsequent ones will stop the rproc, generate a coredump and then start
it again.


This patch would be to fix the latter problem and allows the next patch
to move the check in the debugfs interface in under the mutex. As such
I've picked up patch 1, 2 and 4.

Regards,
Bjorn

> Thanks,
> Mathieu
> 
> > 
> > The work that follows that is only appropriate for a remoteproc in
> > CRASHED state.  So check the state after acquiring the mutex, and
> > only proceed with the recovery work if the remoteproc is still in
> > CRASHED state.
> > 
> > Delay reporting that recovering has begun until after we hold the
> > mutex and we know the remote processor is in CRASHED state.
> > 
> > Signed-off-by: Alex Elder <elder@linaro.org>
> > ---
> >  drivers/remoteproc/remoteproc_core.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index 097f33e4f1f3..d327cb31d5c8 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1653,12 +1653,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
> >  	struct device *dev = &rproc->dev;
> >  	int ret;
> >  
> > +	ret = mutex_lock_interruptible(&rproc->lock);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* State could have changed before we got the mutex */
> > +	if (rproc->state != RPROC_CRASHED)
> > +		goto unlock_mutex;
> > +
> >  	dev_err(dev, "recovering %s\n", rproc->name);
> >  
> > -	ret = mutex_lock_interruptible(&rproc->lock);
> > -	if (ret)
> > -		return ret;
> > -
> >  	ret = rproc_stop(rproc, true);
> >  	if (ret)
> >  		goto unlock_mutex;
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery()
@ 2020-03-11 23:44       ` Bjorn Andersson
  0 siblings, 0 replies; 9+ messages in thread
From: Bjorn Andersson @ 2020-03-11 23:44 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Alex Elder, Ohad Ben-Cohen, Andy Gross, linux-remoteproc,
	linux-arm-msm, linux-kernel

On Mon 09 Mar 13:56 PDT 2020, Mathieu Poirier wrote:

> On Fri, Feb 28, 2020 at 12:33:56PM -0600, Alex Elder wrote:
> > Two places call rproc_trigger_recovery():
> >   - rproc_crash_handler_work() sets rproc->state to CRASHED under
> >     protection of the mutex, then calls it if recovery is not
> >     disabled.  This function is called in workqueue context when
> >     scheduled in rproc_report_crash().
> >   - rproc_recovery_write() calls it in two spots, both of which
> >     the only call it if the rproc->state is CRASHED.
> > 
> > The mutex is taken right away in rproc_trigger_recovery().  However,
> > by the time the mutex is acquired, something else might have changed
> > rproc->state to something other than CRASHED.
> 
> I'm interested in the "something might have changed" part.  The only thing I can
> see is if rproc_trigger_recovery() has been called from debugfs between the time
> the mutex is released but just before rproc_trigger_recovery() is called in
> rproc_crash_handler_work().  In this case we would be done twice, something your
> patch prevents.  Have you found other scenarios?
> 

Alex is right, by checking rproc->state outside of the lock
rproc_recovery_write() allows for multiple contexts to enter
rproc_trigger_recovery() at once.

Further more, these multiple context will be held up at the
mutex_lock_interruptible() and as each one completes the recovery the
subsequent ones will stop the rproc, generate a coredump and then start
it again.


This patch would be to fix the latter problem and allows the next patch
to move the check in the debugfs interface in under the mutex. As such
I've picked up patch 1, 2 and 4.

Regards,
Bjorn

> Thanks,
> Mathieu
> 
> > 
> > The work that follows that is only appropriate for a remoteproc in
> > CRASHED state.  So check the state after acquiring the mutex, and
> > only proceed with the recovery work if the remoteproc is still in
> > CRASHED state.
> > 
> > Delay reporting that recovering has begun until after we hold the
> > mutex and we know the remote processor is in CRASHED state.
> > 
> > Signed-off-by: Alex Elder <elder@linaro.org>
> > ---
> >  drivers/remoteproc/remoteproc_core.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
> > index 097f33e4f1f3..d327cb31d5c8 100644
> > --- a/drivers/remoteproc/remoteproc_core.c
> > +++ b/drivers/remoteproc/remoteproc_core.c
> > @@ -1653,12 +1653,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
> >  	struct device *dev = &rproc->dev;
> >  	int ret;
> >  
> > +	ret = mutex_lock_interruptible(&rproc->lock);
> > +	if (ret)
> > +		return ret;
> > +
> > +	/* State could have changed before we got the mutex */
> > +	if (rproc->state != RPROC_CRASHED)
> > +		goto unlock_mutex;
> > +
> >  	dev_err(dev, "recovering %s\n", rproc->name);
> >  
> > -	ret = mutex_lock_interruptible(&rproc->lock);
> > -	if (ret)
> > -		return ret;
> > -
> >  	ret = rproc_stop(rproc, true);
> >  	if (ret)
> >  		goto unlock_mutex;
> > -- 
> > 2.20.1
> > 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery()
  2020-03-11 23:44       ` Bjorn Andersson
  (?)
@ 2020-03-12  2:58       ` Alex Elder
  -1 siblings, 0 replies; 9+ messages in thread
From: Alex Elder @ 2020-03-12  2:58 UTC (permalink / raw)
  To: Bjorn Andersson, Mathieu Poirier
  Cc: Ohad Ben-Cohen, Andy Gross, linux-remoteproc, linux-arm-msm,
	linux-kernel

On 3/11/20 6:44 PM, Bjorn Andersson wrote:
> On Mon 09 Mar 13:56 PDT 2020, Mathieu Poirier wrote:
> 
>> On Fri, Feb 28, 2020 at 12:33:56PM -0600, Alex Elder wrote:
>>> Two places call rproc_trigger_recovery():
>>>   - rproc_crash_handler_work() sets rproc->state to CRASHED under
>>>     protection of the mutex, then calls it if recovery is not
>>>     disabled.  This function is called in workqueue context when
>>>     scheduled in rproc_report_crash().
>>>   - rproc_recovery_write() calls it in two spots, both of which
>>>     the only call it if the rproc->state is CRASHED.
>>>
>>> The mutex is taken right away in rproc_trigger_recovery().  However,
>>> by the time the mutex is acquired, something else might have changed
>>> rproc->state to something other than CRASHED.
>>
>> I'm interested in the "something might have changed" part.  The only thing I can
>> see is if rproc_trigger_recovery() has been called from debugfs between the time
>> the mutex is released but just before rproc_trigger_recovery() is called in
>> rproc_crash_handler_work().  In this case we would be done twice, something your
>> patch prevents.  Have you found other scenarios?

Sorry I didn't respond earlier, I was on vacation and was
actively trying to avoid getting sucked into work...

I don't expect my answer here will be very satisfying.

I implemented this a long time ago and don't remember all
the details. But regardless, if one case permits the crash
handler to be run twice for a single crash, that's one case
too many.

I started doing some analysis but have stopped for now
because Bjorn has already decided to accept it.  If you
want me to provide some more detail just say so and I'll
spend a little more time on it tomorrow.

					-Alex

> Alex is right, by checking rproc->state outside of the lock
> rproc_recovery_write() allows for multiple contexts to enter
> rproc_trigger_recovery() at once.
> 
> Further more, these multiple context will be held up at the
> mutex_lock_interruptible() and as each one completes the recovery the
> subsequent ones will stop the rproc, generate a coredump and then start
> it again.
> 
> 
> This patch would be to fix the latter problem and allows the next patch
> to move the check in the debugfs interface in under the mutex. As such
> I've picked up patch 1, 2 and 4.
> 
> Regards,
> Bjorn
> 
>> Thanks,
>> Mathieu
>>
>>>
>>> The work that follows that is only appropriate for a remoteproc in
>>> CRASHED state.  So check the state after acquiring the mutex, and
>>> only proceed with the recovery work if the remoteproc is still in
>>> CRASHED state.
>>>
>>> Delay reporting that recovering has begun until after we hold the
>>> mutex and we know the remote processor is in CRASHED state.
>>>
>>> Signed-off-by: Alex Elder <elder@linaro.org>
>>> ---
>>>  drivers/remoteproc/remoteproc_core.c | 12 ++++++++----
>>>  1 file changed, 8 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
>>> index 097f33e4f1f3..d327cb31d5c8 100644
>>> --- a/drivers/remoteproc/remoteproc_core.c
>>> +++ b/drivers/remoteproc/remoteproc_core.c
>>> @@ -1653,12 +1653,16 @@ int rproc_trigger_recovery(struct rproc *rproc)
>>>  	struct device *dev = &rproc->dev;
>>>  	int ret;
>>>  
>>> +	ret = mutex_lock_interruptible(&rproc->lock);
>>> +	if (ret)
>>> +		return ret;
>>> +
>>> +	/* State could have changed before we got the mutex */
>>> +	if (rproc->state != RPROC_CRASHED)
>>> +		goto unlock_mutex;
>>> +
>>>  	dev_err(dev, "recovering %s\n", rproc->name);
>>>  
>>> -	ret = mutex_lock_interruptible(&rproc->lock);
>>> -	if (ret)
>>> -		return ret;
>>> -
>>>  	ret = rproc_stop(rproc, true);
>>>  	if (ret)
>>>  		goto unlock_mutex;
>>> -- 
>>> 2.20.1
>>>

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-03-12  2:58 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-28 18:33 [PATCH 0/4] remoteproc: some bug fixes Alex Elder
2020-02-28 18:33 ` [PATCH 1/4] remoteproc: re-check state in rproc_trigger_recovery() Alex Elder
2020-03-09 20:56   ` Mathieu Poirier
2020-03-11 23:44     ` Bjorn Andersson
2020-03-11 23:44       ` Bjorn Andersson
2020-03-12  2:58       ` Alex Elder
2020-02-28 18:33 ` [PATCH 2/4] remoteproc: remoteproc debugfs file fixes Alex Elder
2020-02-28 18:33 ` [PATCH 3/4] remoteproc: qcom: fix q6v5 probe error paths Alex Elder
2020-02-28 18:33 ` [PATCH 4/4] remoteproc: return error for bad "recovery" debugfs input Alex Elder

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.