* Re: [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed @ 2012-08-20 13:07 Sjur Brændeland 2012-08-20 20:45 ` Guzman Lugo, Fernando 0 siblings, 1 reply; 4+ messages in thread From: Sjur Brændeland @ 2012-08-20 13:07 UTC (permalink / raw) To: Fernando Guzman Lugo, Ohad Ben-Cohen; +Cc: LKML Hi Fernando, >This patch is introducing rproc_trigger_recover function which is in >charge of recovering the rproc. One way to recover the rproc after a crash >is resetting all its virtio devices. Doing that, all rpmsg drivers are >restored along with the rpmsg devices and that also causes the reset of >the remoteproc making the rpmsg communication with the remoteproc >functional again. So far, rproc_trigger_recover function is only resetting >all virtio devices, if in the future other rproc features are introduced >and need to be reset too, rproc_trigger_recover function should take care >of that. I think you drop the driver module's ref count during recovery, because rproc_shutdown calls module_put(). Maybe you should move driver ref count handling to rproc_add and rproc_type_release, instead of rproc_boot() and rproc_shutdown()? ... >+int rproc_trigger_recover(struct rproc *rproc) >+{ >+ struct rproc_vdev *rvdev, *rvtmp; >+ >+ dev_err(&rproc->dev, "recovering %s\n", rproc->name); >+ >+ /* clean up remote vdev entries */ >+ list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node) >+ rproc_remove_virtio_dev(rvdev); ... Is this safe? Are you guaranteed that rproc->power is counted down to zero at this point. Won't this fail if the firmware loading starts before all clients has called rproc_shutdown()? I guess virtio drivers potentially could defer the call to del_vqs() to a work-queue. In this case you cannot be guaranteed that the rproc is shut down at this point. You may also have e.g. platform driver who has previously called rproc_boot() and calls rproc_shutdown() after calling rproc_crash(). I think you should wait for the rproc->power count to go to zero before initiating the firmware_loading. You could e.g. move firmware_loading to rproc_shutdown(), or add a completion. BTW: I ran into a bug in unregister_virtio_device when testing your feature with SLAB_DEBUG=y. The dev->index is accessed after the device is freed. I'll submit a patch on this at some point. After fixing that, your patch works for me! Regards, Sjur ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed 2012-08-20 13:07 [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed Sjur Brændeland @ 2012-08-20 20:45 ` Guzman Lugo, Fernando 2012-08-22 11:57 ` Sjur BRENDELAND 0 siblings, 1 reply; 4+ messages in thread From: Guzman Lugo, Fernando @ 2012-08-20 20:45 UTC (permalink / raw) To: Sjur Brændeland; +Cc: Ohad Ben-Cohen, LKML Hi Sjur, On Mon, Aug 20, 2012 at 8:07 AM, Sjur Brændeland <sjur.brandeland@stericsson.com> wrote: > > Hi Fernando, > > >This patch is introducing rproc_trigger_recover function which is in > >charge of recovering the rproc. One way to recover the rproc after a > > crash > >is resetting all its virtio devices. Doing that, all rpmsg drivers are > >restored along with the rpmsg devices and that also causes the reset of > >the remoteproc making the rpmsg communication with the remoteproc > >functional again. So far, rproc_trigger_recover function is only > > resetting > >all virtio devices, if in the future other rproc features are introduced > >and need to be reset too, rproc_trigger_recover function should take care > >of that. > > I think you drop the driver module's ref count during recovery, because > rproc_shutdown calls module_put(). Maybe you should move driver > ref count handling to rproc_add and rproc_type_release, instead of > rproc_boot() and rproc_shutdown()? How could you remove the remoteproc modules then? omap remoteproc for example: # modprobe omap_remoteproc # lsmod Module Size Used by Tainted: G virtio_rpmsg_bus 12557 0 omap_remoteproc 6194 1 remoteproc 31112 1 omap_remoteproc virtio_ring 9820 2 virtio_rpmsg_bus,remoteproc virtio 4865 2 virtio_rpmsg_bus,remoteproc here if I want to remove the modules I can do: modprobe -r virtio_rpmsg_bus getting: # lsmod Module Size Used by Tainted: G omap_remoteproc 6194 0 remoteproc 31112 1 omap_remoteproc virtio_ring 9820 1 remoteproc virtio So, at this moment omap_remoteproc use count becomes 0 and then I can remove omap_remoteproc and then remoteproc modules. However if I do what you suggested then: - omap_remoteproc depends on remoteproc module and it will increase the use count of remoteproc module. - as soon omap_remoteproc call rproc_add, remoteproc module will call module_get on omap_remoteproc module so it will increase omap_remoteproc use counter. At this point we have a circular dependency, omap_remoteproc depends on remoteproc and vice-versa. Making impossible to remove the remoteproc modules. So, I don't think it is a good idea to move module_get/put calls. what I could do is call module_get before calling rproc_remove_virtio_dev for the virtio devices and call module_put after loading the firmware. However, I am no sure why you want to address something like that. Don't expect recovery to work if you call rmmod <platform rproc driver> at the middle of the recovery, is like if in my example I would do # modprobe omap_remoteproc # rmmod virtio_rpmsg_bus # rmmod omap_remoteproc # modprobe virtio_rpmsg_bus At this moment the remote proce will not boot because there is not more omap remoteproc devices, that is expected and it won't generate any crash the same way recovery won't generate any crash it will only do nothing after removing the devices or it will fail gracefully because the rproc name does not exists anymore. However, if you still have concerns about that or a valid testcase where that would have an unexpected behaviour we can think in a way to fix it. Probably just calling module_get/put inside rproc_trigger_recover is the easiest way. Please let me know what you think. > > > ... > >+int rproc_trigger_recover(struct rproc *rproc) > >+{ > >+ struct rproc_vdev *rvdev, *rvtmp; > >+ > >+ dev_err(&rproc->dev, "recovering %s\n", rproc->name); > >+ > >+ /* clean up remote vdev entries */ > >+ list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node) > >+ rproc_remove_virtio_dev(rvdev); > > ... > > Is this safe? Are you guaranteed that rproc->power > is counted down to zero at this point. if the remoteproc is using rpmsg and there is no other module/user calling rproc_boot it is guaranteed. Otherwise, it does not. > > Won't this fail if the > firmware loading starts before all clients has called > rproc_shutdown()? Yes, the rproc refcount will never reach 0. Therefore, the remoteproc will not be reset. At this moment we only support recovery of rprocs which are only used by rpmsg module. If there are any other user which call directly to rproc_boot then the recovery will not work for that case. AFAIK, there is no such users right now, and we will need to implement some kind of rproc crash notifications in order to address those cases. At the moment we don't need that, but if you have some of those test cases we can discuss how to address them. Please let me know what do you think. > > > I guess virtio drivers potentially could defer the call to del_vqs() > to a work-queue. that would need to change the remoteproc_virtio.c which means change the remoteproc module, that is not a thing that the platform remoteproc implementation can change. So, if there are some changes in the remoteproc core module to defer the find_vqs, the recovery should also change to guarantee it will be functional. > > In this case you cannot be guaranteed that the > rproc is shut down at this point. You may also have e.g. platform > driver who has previously called rproc_boot() and calls > rproc_shutdown() after calling rproc_crash(). do you mean rproc_report_crash()? that function is not supposed to be called by any platform driver which uses remoteproc, expect for the platform driver which implements a specific remoteproc. So, that sequence should not be valid at first place. > > > I think you should wait for the rproc->power count to go to zero > before initiating the firmware_loading. You could e.g. > move firmware_loading to rproc_shutdown(), or add a > completion. Yes, that sounds good and it will work when we supports rproc_boot/shutdown calls from anywhere and not only from rpmsg module(find_vqs). Also, I think with that wait_for_completion inside rproc_trigger_recover will make more clear how the recovery is performed (actually the need for ref counter to become 0 in order to work). I will try that and send a new patch to get more comments. > > > BTW: I ran into a bug in unregister_virtio_device when testing > your feature with SLAB_DEBUG=y. The dev->index is accessed > after the device is freed. I'll submit a patch on this at some point. good! > > > After fixing that, your patch works for me! Excellent news :). Thanks for the comments and for try them out on your system. Regards, Fernando. > > > Regards, > Sjur ^ permalink raw reply [flat|nested] 4+ messages in thread
* RE: [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed 2012-08-20 20:45 ` Guzman Lugo, Fernando @ 2012-08-22 11:57 ` Sjur BRENDELAND 0 siblings, 0 replies; 4+ messages in thread From: Sjur BRENDELAND @ 2012-08-22 11:57 UTC (permalink / raw) To: Guzman Lugo, Fernando; +Cc: Ohad Ben-Cohen, LKML Hi Fernando, > > I think you drop the driver module's ref count during recovery, because > > rproc_shutdown calls module_put(). Maybe you should move driver > > ref count handling to rproc_add and rproc_type_release, instead of > > rproc_boot() and rproc_shutdown()? > > How could you remove the remoteproc modules then? ... > At this point we have a circular dependency, omap_remoteproc depends on > remoteproc and vice-versa. Making impossible to remove the remoteproc > modules. So, I don't think it is a good idea to move module_get/put > calls. Yes, I see - you're right. What I proposed won't work. > However, if you still have concerns about that or a valid testcase where > that would have an unexpected behaviour we can think in a way to fix it. > Probably just calling module_get/put inside rproc_trigger_recover is > the easiest way. Please let me know what you think. My concern was unloading of pdev or driver modules durng async firmware loading. However it turns out that my issue was that I was using rproc_put() instead of rproc_del() when unloading my driver module. So after fixing my driver your patch works fine even when unloading the driver during firmware loading. > > >+int rproc_trigger_recover(struct rproc *rproc) > > >+{ > > >+ struct rproc_vdev *rvdev, *rvtmp; > > >+ > > >+ dev_err(&rproc->dev, "recovering %s\n", rproc->name); > > >+ > > >+ /* clean up remote vdev entries */ > > >+ list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node) > > >+ rproc_remove_virtio_dev(rvdev); > > ... > > Is this safe? Are you guaranteed that rproc->power > > is counted down to zero at this point. ... > > I think you should wait for the rproc->power count to go to zero > > before initiating the firmware_loading. You could e.g. > > move firmware_loading to rproc_shutdown(), or add a > > completion. > > Yes, that sounds good and it will work when we supports > rproc_boot/shutdown > calls from anywhere and not only from > rpmsg module(find_vqs). Also, I think with that wait_for_completion > inside rproc_trigger_recover will make more clear how the recovery > is performed (actually the need for ref counter to > become 0 in order to work). > > I will try that and send a new patch to get more comments. Ok, looking forward to your next patch then. Regards, Sjur ^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 0/3] remoteproc: introduce rproc recovery @ 2012-08-08 23:07 Fernando Guzman Lugo 2012-08-08 23:07 ` [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed Fernando Guzman Lugo 0 siblings, 1 reply; 4+ messages in thread From: Fernando Guzman Lugo @ 2012-08-08 23:07 UTC (permalink / raw) To: ohad, linux-omap, linux-arm-kernel, linux-kernel; +Cc: Fernando Guzman Lugo These set of patches make possible the remoteproc recover after a crash. This is a hard recovery, that means the remoteproc is reset and it will start from the beginning. When a crash happen all the virtio devices are destroyed. Therefore, both rpmsg drivers and devices are gracefully removed which also cause rproc users become 0 and the remoteproc is turned off. After the virtio devices are destroyed the crash handler function will read the virtio information from the firmware in order to recreate the virtio devices that will boot the remoteproc and everything will be functional again. Fernando Guzman Lugo (3): remoteproc: add rproc_report_crash function to notify rproc crashes remoteproc: recover a remoteproc when it has crashed remoteproc: create debugfs entry to disable/enable recovery dynamically Documentation/remoteproc.txt | 7 ++ drivers/remoteproc/remoteproc_core.c | 107 ++++++++++++++++++++++++++++-- drivers/remoteproc/remoteproc_debugfs.c | 83 +++++++++++++++++++++++ drivers/remoteproc/remoteproc_internal.h | 1 + include/linux/remoteproc.h | 20 ++++++ 5 files changed, 211 insertions(+), 7 deletions(-) ^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed 2012-08-08 23:07 [PATCH 0/3] remoteproc: introduce rproc recovery Fernando Guzman Lugo @ 2012-08-08 23:07 ` Fernando Guzman Lugo 0 siblings, 0 replies; 4+ messages in thread From: Fernando Guzman Lugo @ 2012-08-08 23:07 UTC (permalink / raw) To: ohad, linux-omap, linux-arm-kernel, linux-kernel; +Cc: Fernando Guzman Lugo This patch is introducing rproc_trigger_recover function which is in charge of recovering the rproc. One way to recover the rproc after a crash is resetting all its virtio devices. Doing that, all rpmsg drivers are restored along with the rpmsg devices and that also causes the reset of the remoteproc making the rpmsg communication with the remoteproc functional again. So far, rproc_trigger_recover function is only resetting all virtio devices, if in the future other rproc features are introduced and need to be reset too, rproc_trigger_recover function should take care of that. Signed-off-by: Fernando Guzman Lugo <fernando.lugo@ti.com> --- drivers/remoteproc/remoteproc_core.c | 28 +++++++++++++++++++++++++++- drivers/remoteproc/remoteproc_internal.h | 1 + 2 files changed, 28 insertions(+), 1 deletions(-) diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c index 3a6f1a1..c879069 100644 --- a/drivers/remoteproc/remoteproc_core.c +++ b/drivers/remoteproc/remoteproc_core.c @@ -882,6 +882,32 @@ out: } /** + * rproc_trigger_recover() - recover a remoteproc + * @rproc: the remote processor + * + * The recovery is done by reseting all the virtio devices, that way all the + * rpmsg drivers will be reseted along with the remote processor making the + * remoteproc functional again. + * + * This function can sleep, so that it cannot be called from atomic context. + */ +int rproc_trigger_recover(struct rproc *rproc) +{ + struct rproc_vdev *rvdev, *rvtmp; + + dev_err(&rproc->dev, "recovering %s\n", rproc->name); + + /* clean up remote vdev entries */ + list_for_each_entry_safe(rvdev, rvtmp, &rproc->rvdevs, node) + rproc_remove_virtio_dev(rvdev); + + /* run rproc_fw_config_virtio to create vdevs again */ + return request_firmware_nowait(THIS_MODULE, FW_ACTION_HOTPLUG, + rproc->firmware, &rproc->dev, GFP_KERNEL, + rproc, rproc_fw_config_virtio); +} + +/** * rproc_crash_handler_work() - handle a crash * * This function needs to handle everything related to a crash, like cpu @@ -906,7 +932,7 @@ static void rproc_crash_handler_work(struct work_struct *work) ++rproc->crash_cnt, rproc->name); mutex_unlock(&rproc->lock); - /* TODO: handle crash */ + rproc_trigger_recover(rproc); } /** diff --git a/drivers/remoteproc/remoteproc_internal.h b/drivers/remoteproc/remoteproc_internal.h index a690ebe..d9c0730 100644 --- a/drivers/remoteproc/remoteproc_internal.h +++ b/drivers/remoteproc/remoteproc_internal.h @@ -63,6 +63,7 @@ void rproc_free_vring(struct rproc_vring *rvring); int rproc_alloc_vring(struct rproc_vdev *rvdev, int i); void *rproc_da_to_va(struct rproc *rproc, u64 da, int len); +int rproc_trigger_recover(struct rproc *rproc); static inline int rproc_fw_sanity_check(struct rproc *rproc, const struct firmware *fw) -- 1.7.1 ^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2012-08-22 11:57 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-08-20 13:07 [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed Sjur Brændeland 2012-08-20 20:45 ` Guzman Lugo, Fernando 2012-08-22 11:57 ` Sjur BRENDELAND -- strict thread matches above, loose matches on Subject: below -- 2012-08-08 23:07 [PATCH 0/3] remoteproc: introduce rproc recovery Fernando Guzman Lugo 2012-08-08 23:07 ` [PATCH 2/3] remoteproc: recover a remoteproc when it has crashed Fernando Guzman Lugo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).