linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed
@ 2012-08-20 13:07 Sjur Brændeland
  2012-08-20 20:45 ` Guzman Lugo, Fernando
  0 siblings, 1 reply; 4+ messages in thread
From: Sjur Brændeland @ 2012-08-20 13:07 UTC (permalink / raw)
  To: Fernando Guzman Lugo, Ohad Ben-Cohen; +Cc: LKML

Hi Fernando,

>This patch is introducing rproc_trigger_recover function which is in
>charge of recovering the rproc. One way to recover the rproc after a crash
>is resetting all its virtio devices. Doing that, all rpmsg drivers are
>restored along with the rpmsg devices and that also causes the reset of
>the remoteproc making the rpmsg communication with the remoteproc
>functional again. So far, rproc_trigger_recover function is only resetting
>all virtio devices, if in the future other rproc features are introduced
>and need to be reset too, rproc_trigger_recover function should take care
>of that.

I think you drop the driver module's ref count during recovery, because
rproc_shutdown calls module_put(). Maybe you should move driver
ref count handling to rproc_add and rproc_type_release, instead of
rproc_boot() and rproc_shutdown()?

...
>+int rproc_trigger_recover(struct rproc *rproc)
>+{
>+	struct rproc_vdev *rvdev, *rvtmp;
>+
>+	dev_err(&rproc->dev, "recovering %s\n", rproc->name);
>+
>+	/* clean up remote vdev entries */
>+	list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node)
>+		rproc_remove_virtio_dev(rvdev);

...

Is this safe? Are you guaranteed that rproc->power
is counted down to zero at this point. Won't this fail if the
firmware loading starts before all clients has called
rproc_shutdown()?

I guess virtio drivers potentially could defer the call to del_vqs()
to a work-queue. In this case  you cannot be guaranteed that the
rproc is shut down at this point. You may also have e.g. platform
driver who has previously called rproc_boot() and calls
rproc_shutdown() after calling rproc_crash().

I think you should wait for the rproc->power count to go to zero
before initiating the firmware_loading. You could e.g.
move firmware_loading to rproc_shutdown(), or add a
completion.

BTW: I ran into a bug in unregister_virtio_device when testing
your feature with SLAB_DEBUG=y. The dev->index is accessed
after the device is freed. I'll submit a patch on this at some point.

After fixing that, your patch works for me!

Regards,
Sjur

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed
  2012-08-20 13:07 [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed Sjur Brændeland
@ 2012-08-20 20:45 ` Guzman Lugo, Fernando
  2012-08-22 11:57   ` Sjur BRENDELAND
  0 siblings, 1 reply; 4+ messages in thread
From: Guzman Lugo, Fernando @ 2012-08-20 20:45 UTC (permalink / raw)
  To: Sjur Brændeland; +Cc: Ohad Ben-Cohen, LKML

Hi Sjur,

On Mon, Aug 20, 2012 at 8:07 AM, Sjur Brændeland
<sjur.brandeland@stericsson.com> wrote:
>
> Hi Fernando,
>
> >This patch is introducing rproc_trigger_recover function which is in
> >charge of recovering the rproc. One way to recover the rproc after a
> > crash
> >is resetting all its virtio devices. Doing that, all rpmsg drivers are
> >restored along with the rpmsg devices and that also causes the reset of
> >the remoteproc making the rpmsg communication with the remoteproc
> >functional again. So far, rproc_trigger_recover function is only
> > resetting
> >all virtio devices, if in the future other rproc features are introduced
> >and need to be reset too, rproc_trigger_recover function should take care
> >of that.
>
> I think you drop the driver module's ref count during recovery, because
> rproc_shutdown calls module_put(). Maybe you should move driver
> ref count handling to rproc_add and rproc_type_release, instead of
> rproc_boot() and rproc_shutdown()?

How could you remove the remoteproc modules then?

omap remoteproc for example:

# modprobe omap_remoteproc

# lsmod
Module                  Size  Used by    Tainted: G
virtio_rpmsg_bus       12557  0
omap_remoteproc         6194  1
remoteproc             31112  1 omap_remoteproc
virtio_ring             9820  2 virtio_rpmsg_bus,remoteproc
virtio                  4865  2 virtio_rpmsg_bus,remoteproc

here if I want to remove the modules I can do:

modprobe -r virtio_rpmsg_bus

getting:

# lsmod
Module                  Size  Used by    Tainted: G
omap_remoteproc         6194  0
remoteproc             31112  1 omap_remoteproc
virtio_ring             9820  1 remoteproc
virtio

So, at this moment omap_remoteproc use count becomes 0 and then I can remove
omap_remoteproc and then remoteproc modules.

However if I do what you suggested then:
- omap_remoteproc depends on remoteproc module and it will increase the use
count of remoteproc module.
- as soon omap_remoteproc call rproc_add, remoteproc module will call
module_get on omap_remoteproc module so it will increase omap_remoteproc use
counter.

At this point we have a circular dependency, omap_remoteproc depends on
remoteproc and vice-versa. Making impossible to remove the remoteproc
modules. So, I don't think it is a good idea to move module_get/put calls.

what I could do is call module_get before calling rproc_remove_virtio_dev
for the virtio devices and call module_put after loading the firmware.
However, I am no sure why you want to address something like that. Don't
expect recovery to work if you call rmmod <platform rproc driver> at the
middle of the recovery, is like if in my example I would do

# modprobe omap_remoteproc

# rmmod virtio_rpmsg_bus
# rmmod omap_remoteproc

# modprobe virtio_rpmsg_bus

At this moment the remote proce will not boot because there is not more omap
remoteproc devices, that is expected and it won't generate any crash the
same way recovery won't generate any crash it will only do nothing after
removing the devices or it will fail gracefully because the rproc name does
not exists anymore.

However, if you still have concerns about that or a valid testcase where
that would have an unexpected behaviour we can think in a way to fix it.
Probably just calling module_get/put inside  rproc_trigger_recover is the
easiest way. Please let me know what you think.

>
>
> ...
> >+int rproc_trigger_recover(struct rproc *rproc)
> >+{
> >+      struct rproc_vdev *rvdev, *rvtmp;
> >+
> >+      dev_err(&rproc->dev, "recovering %s\n", rproc->name);
> >+
> >+      /* clean up remote vdev entries */
> >+      list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node)
> >+              rproc_remove_virtio_dev(rvdev);
>
> ...
>
> Is this safe? Are you guaranteed that rproc->power
> is counted down to zero at this point.

if the remoteproc is using rpmsg and there is no other module/user calling
rproc_boot it is guaranteed. Otherwise, it does not.

>
> Won't this fail if the
> firmware loading starts before all clients has called
> rproc_shutdown()?

Yes, the rproc refcount will never reach 0. Therefore, the remoteproc will
not be reset. At this moment we only support recovery of rprocs which are
only used by rpmsg module. If there are any other user which call directly
to rproc_boot then the recovery will not work for that case. AFAIK, there is
no such users right now, and we will need to implement some kind of rproc
crash notifications in order to address those cases. At the moment we don't
need that, but if you have some of those test cases we can discuss how to
address them. Please let me know what do you think.

>
>
> I guess virtio drivers potentially could defer the call to del_vqs()
> to a work-queue.

that would need to change the remoteproc_virtio.c which means change the
remoteproc module, that is not a thing that the platform remoteproc
implementation can change. So, if there are some changes in the remoteproc
core module to defer the find_vqs, the recovery should also change to
guarantee it will be functional.

>
> In this case  you cannot be guaranteed that the
> rproc is shut down at this point. You may also have e.g. platform
> driver who has previously called rproc_boot() and calls
> rproc_shutdown() after calling rproc_crash().

do you mean rproc_report_crash()? that function is not supposed to be called
by any platform driver which uses remoteproc, expect for the platform driver
which implements a specific remoteproc. So, that sequence should not be
valid at first place.

>
>
> I think you should wait for the rproc->power count to go to zero
> before initiating the firmware_loading. You could e.g.
> move firmware_loading to rproc_shutdown(), or add a
> completion.

Yes, that sounds good and it will work when we supports rproc_boot/shutdown
calls from anywhere and not only from rpmsg module(find_vqs). Also, I think
with that wait_for_completion inside rproc_trigger_recover will make more
clear how the recovery is performed (actually the need for ref counter to
become 0 in order to work).

I will try that and send a new patch to get more comments.

>
>
> BTW: I ran into a bug in unregister_virtio_device when testing
> your feature with SLAB_DEBUG=y. The dev->index is accessed
> after the device is freed. I'll submit a patch on this at some point.

good!

>
>
> After fixing that, your patch works for me!

Excellent news :).

Thanks for the comments and for try them out on your system.

Regards,
Fernando.

>
>
> Regards,
> Sjur

^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed
  2012-08-20 20:45 ` Guzman Lugo, Fernando
@ 2012-08-22 11:57   ` Sjur BRENDELAND
  0 siblings, 0 replies; 4+ messages in thread
From: Sjur BRENDELAND @ 2012-08-22 11:57 UTC (permalink / raw)
  To: Guzman Lugo, Fernando; +Cc: Ohad Ben-Cohen, LKML

Hi Fernando,

> > I think you drop the driver module's ref count during recovery, because
> > rproc_shutdown calls module_put(). Maybe you should move driver
> > ref count handling to rproc_add and rproc_type_release, instead of
> > rproc_boot() and rproc_shutdown()?
> 
> How could you remove the remoteproc modules then?
...
> At this point we have a circular dependency, omap_remoteproc depends on
> remoteproc and vice-versa. Making impossible to remove the remoteproc
> modules. So, I don't think it is a good idea to move module_get/put
> calls.

Yes, I see - you're right. What I proposed won't work.

> However, if you still have concerns about that or a valid testcase where
> that would have an unexpected behaviour we can think in a way to fix it.
> Probably just calling module_get/put inside  rproc_trigger_recover is
> the easiest way. Please let me know what you think.

My concern was unloading of pdev or driver modules durng async firmware
loading. However it turns out that my issue was that I was using
rproc_put() instead of rproc_del() when unloading my driver module.
So after fixing my driver your patch works fine even when unloading the
driver during firmware loading.

> > >+int rproc_trigger_recover(struct rproc *rproc)
> > >+{
> > >+      struct rproc_vdev *rvdev, *rvtmp;
> > >+
> > >+      dev_err(&rproc->dev, "recovering %s\n", rproc->name);
> > >+
> > >+      /* clean up remote vdev entries */
> > >+      list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node)
> > >+              rproc_remove_virtio_dev(rvdev);
> > ...
> > Is this safe? Are you guaranteed that rproc->power
> > is counted down to zero at this point.
...
> > I think you should wait for the rproc->power count to go to zero
> > before initiating the firmware_loading. You could e.g.
> > move firmware_loading to rproc_shutdown(), or add a
> > completion.
> 
> Yes, that sounds good and it will work when we supports
> rproc_boot/shutdown > calls from anywhere and not only from
> rpmsg module(find_vqs). Also, I think with that wait_for_completion
> inside rproc_trigger_recover will make more clear how the recovery
> is performed (actually the need for ref counter to
> become 0 in order to work).
> 
> I will try that and send a new patch to get more comments.


Ok, looking forward to your next patch then.

Regards,
Sjur

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed
  2012-08-08 23:07 [PATCH 0/3] remoteproc: introduce rproc recovery Fernando Guzman Lugo
@ 2012-08-08 23:07 ` Fernando Guzman Lugo
  0 siblings, 0 replies; 4+ messages in thread
From: Fernando Guzman Lugo @ 2012-08-08 23:07 UTC (permalink / raw)
  To: ohad, linux-omap, linux-arm-kernel, linux-kernel; +Cc: Fernando Guzman Lugo

This patch is introducing rproc_trigger_recover function which is in
charge of recovering the rproc. One way to recover the rproc after a crash
is resetting all its virtio devices. Doing that, all rpmsg drivers are
restored along with the rpmsg devices and that also causes the reset of
the remoteproc making the rpmsg communication with the remoteproc
functional again. So far, rproc_trigger_recover function is only resetting
all virtio devices, if in the future other rproc features are introduced
and need to be reset too, rproc_trigger_recover function should take care
of that.

Signed-off-by: Fernando Guzman Lugo <fernando.lugo@ti.com>
---
 drivers/remoteproc/remoteproc_core.c     |   28 +++++++++++++++++++++++++++-
 drivers/remoteproc/remoteproc_internal.h |    1 +
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index 3a6f1a1..c879069 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -882,6 +882,32 @@ out:
 }
 
 /**
+ * rproc_trigger_recover() - recover a remoteproc
+ * @rproc: the remote processor
+ *
+ * The recovery is done by reseting all the virtio devices, that way all the
+ * rpmsg drivers will be reseted along with the remote processor making the
+ * remoteproc functional again.
+ *
+ * This function can sleep, so that it cannot be called from atomic context.
+ */
+int rproc_trigger_recover(struct rproc *rproc)
+{
+	struct rproc_vdev *rvdev, *rvtmp;
+
+	dev_err(&rproc->dev, "recovering %s\n", rproc->name);
+
+	/* clean up remote vdev entries */
+	list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node)
+		rproc_remove_virtio_dev(rvdev);
+
+	/* run rproc_fw_config_virtio to create vdevs again */
+	return request_firmware_nowait(THIS_MODULE, FW_ACTION_HOTPLUG,
+			rproc->firmware, &rproc->dev, GFP_KERNEL,
+			rproc, rproc_fw_config_virtio);
+}
+
+/**
  * rproc_crash_handler_work() - handle a crash
  *
  * This function needs to handle everything related to a crash, like cpu
@@ -906,7 +932,7 @@ static void rproc_crash_handler_work(struct work_struct *work)
 		++rproc->crash_cnt, rproc->name);
 	mutex_unlock(&rproc->lock);
 
-	/* TODO: handle crash */
+	rproc_trigger_recover(rproc);
 }
 
 /**
diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h
index a690ebe..d9c0730 100644
--- a/drivers/remoteproc/remoteproc_internal.h
+++ b/drivers/remoteproc/remoteproc_internal.h
@@ -63,6 +63,7 @@ void rproc_free_vring(struct rproc_vring *rvring);
 int rproc_alloc_vring(struct rproc_vdev *rvdev, int i);
 
 void *rproc_da_to_va(struct rproc *rproc, u64 da, int len);
+int rproc_trigger_recover(struct rproc *rproc);
 
 static inline
 int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw)
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2012-08-22 11:57 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-08-20 13:07 [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed Sjur Brændeland
2012-08-20 20:45 ` Guzman Lugo, Fernando
2012-08-22 11:57   ` Sjur BRENDELAND
  -- strict thread matches above, loose matches on Subject: below --
2012-08-08 23:07 [PATCH 0/3] remoteproc: introduce rproc recovery Fernando Guzman Lugo
2012-08-08 23:07 ` [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed Fernando Guzman Lugo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).