* Re: [BUG] usb: dwc3: Kernel NULL pointer dereference in dwc3_remove() [not found] ` <87sg1q1129.fsf@kernel.org> @ 2021-06-10 15:33 ` Jack Pham 2021-06-14 7:37 ` Sandeep Maheswaram 0 siblings, 1 reply; 3+ messages in thread From: Jack Pham @ 2021-06-10 15:33 UTC (permalink / raw) To: Felipe Balbi, Sandeep Maheswaram Cc: Alexandru Elisei, Greg Kroah-Hartman, p.zabel, linux-usb, Linux Kernel Mailing List, arm-mail-list, linux-arm-msm On Thu, Jun 10, 2021 at 01:11:42PM +0300, Felipe Balbi wrote: > Jack Pham <jackp@codeaurora.org> writes: > > On Fri, Jun 04, 2021 at 11:20:12AM +0300, Felipe Balbi wrote: > >> Jack Pham <jackp@codeaurora.org> writes: > >> >> >>>> Alexandru Elisei <alexandru.elisei@arm.com> writes: > >> >> >>>>> I've been able to bisect the panic and the offending commit is 568262bf5492 ("usb: > >> >> >>>>> dwc3: core: Add shutdown callback for dwc3"). I can provide more diagnostic > >> >> >>>>> information if needed and I can help test the fix. > >> >> >>>> if you simply revert that commit in HEAD, does the problem really go > >> >> >>>> away? > >> >> >>> Kernel built from commit 324c92e5e0ee, which is the kernel tip today, the panic is > >> >> >>> there. Reverting the offending commit, 568262bf5492, makes the panic disappear. > >> >> >> Want to send a revert so I can take it now? > >> >> > > >> >> > I can send a revert, but Felipe was asking Sandeep (the commit author) for a fix, > >> >> > so I'll leave it up to Felipe to decide how to proceed. > >> >> > >> >> I'm okay with a revert. Feel free to add my Acked-by: Felipe Balbi > >> >> <balbi@kernel.org> or it. > >> >> > >> >> Sandeep, please send a new version that doesn't encounter the same > >> >> issue. Make sure to test by reloading the driver in a tight loop for > >> >> several iterations. > >> > > >> > This would probably be tricky to test on other "glue" drivers as the > >> > problem appears to be specific only to dwc3_of_simple. It looks like > >> > both dwc3_of_simple and the dwc3 core now (due to 568262bf5492) each > >> > implement respective .shutdown callbacks. The latter is simply a wrapper > >> > around dwc3_remove(). And from the panic call stack above we see that > >> > dwc3_of_simple_shutdown() calls of_platform_depopulate() which will > >> > again call dwc3_remove() resulting in the double remove. > >> > > >> > So would an alternative approach be to protect against dwc3_remove() > >> > getting called multiple times? IMO it'd be a bit messy to have to add > >> > >> no, I don't think so. That sounds like a workaround. We should be able > >> to guarantee that ->remove() doesn't get called twice using the driver > >> model properly. > > > > Completely fair. So then having a .shutdown callback that directly calls > > dwc3_remove() is probably not the right thing to do as it completely > > bypasses the driver model so if and when the driver core does later > > release the device from the driver that's how we end up with the double > > remove. > > yeah, I would agree with that. > > >> > additional checks there to know if it had already been called. So maybe > >> > avoid it altogether--should dwc3_of_simple_shutdown() just skip calling > >> > of_platform_depopulate()? > >> > >> I don't know what the idiomatic is nowadays, but at least early on, we > >> had to call depopulate. > > > > So any suggestions on how to fix the original issue Sandeep was trying > > to fix with 568262bf5492? Maybe implement .shutdown in dwc3_qcom and have > > it follow what dwc3_of_simple does with of_platform_depopulate()? But > > then wouldn't other "glues" want/need to follow suit? > > I think we can implement shutdown in core, but we need to careful with > it. Instead of just blindly calling remove, let's extract the common > parts to another internal function that both remove and shutdown > call. debugfs removal should not be part of that generic method :-) Hi Sandeep, Upon re-reading your description in 568262bf5492 it sounds like the original intention of your patch is basically to quiesce the HW so that it doesn't continue to run after SMMU/IOMMU is disabled right? If that is the case, couldn't we simply call only dwc3_core_exit_mode() assuming there is no other requirement to do any other cleanup/teardown (PHYs, clocks, resets, runtime PM, et al)? This function should do the bare minimum of stopping the controller in whatever mode (host or peripheral) it is currently operating in. > Anything in that generic method should, probably, be idempotent. Yes we'll need to ensure that dwc3_core_exit_mode() can be called multiple times without additional side effects. At first glance this probably means setting dwc->xhci and dwc->gadget to NULL from dwc3_host_exit() and dwc3_gadget_exit(), respectively. Thanks, Jack -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [BUG] usb: dwc3: Kernel NULL pointer dereference in dwc3_remove() 2021-06-10 15:33 ` [BUG] usb: dwc3: Kernel NULL pointer dereference in dwc3_remove() Jack Pham @ 2021-06-14 7:37 ` Sandeep Maheswaram 2021-06-16 0:13 ` Jack Pham 0 siblings, 1 reply; 3+ messages in thread From: Sandeep Maheswaram @ 2021-06-14 7:37 UTC (permalink / raw) To: Jack Pham, Felipe Balbi Cc: Alexandru Elisei, Greg Kroah-Hartman, p.zabel, linux-usb, Linux Kernel Mailing List, arm-mail-list, linux-arm-msm On 6/10/2021 9:03 PM, Jack Pham wrote: > On Thu, Jun 10, 2021 at 01:11:42PM +0300, Felipe Balbi wrote: >> Jack Pham <jackp@codeaurora.org> writes: >>> On Fri, Jun 04, 2021 at 11:20:12AM +0300, Felipe Balbi wrote: >>>> Jack Pham <jackp@codeaurora.org> writes: >>>>>>>>>> Alexandru Elisei <alexandru.elisei@arm.com> writes: >>>>>>>>>>> I've been able to bisect the panic and the offending commit is 568262bf5492 ("usb: >>>>>>>>>>> dwc3: core: Add shutdown callback for dwc3"). I can provide more diagnostic >>>>>>>>>>> information if needed and I can help test the fix. >>>>>>>>>> if you simply revert that commit in HEAD, does the problem really go >>>>>>>>>> away? >>>>>>>>> Kernel built from commit 324c92e5e0ee, which is the kernel tip today, the panic is >>>>>>>>> there. Reverting the offending commit, 568262bf5492, makes the panic disappear. >>>>>>>> Want to send a revert so I can take it now? >>>>>>> I can send a revert, but Felipe was asking Sandeep (the commit author) for a fix, >>>>>>> so I'll leave it up to Felipe to decide how to proceed. >>>>>> I'm okay with a revert. Feel free to add my Acked-by: Felipe Balbi >>>>>> <balbi@kernel.org> or it. >>>>>> >>>>>> Sandeep, please send a new version that doesn't encounter the same >>>>>> issue. Make sure to test by reloading the driver in a tight loop for >>>>>> several iterations. >>>>> This would probably be tricky to test on other "glue" drivers as the >>>>> problem appears to be specific only to dwc3_of_simple. It looks like >>>>> both dwc3_of_simple and the dwc3 core now (due to 568262bf5492) each >>>>> implement respective .shutdown callbacks. The latter is simply a wrapper >>>>> around dwc3_remove(). And from the panic call stack above we see that >>>>> dwc3_of_simple_shutdown() calls of_platform_depopulate() which will >>>>> again call dwc3_remove() resulting in the double remove. >>>>> >>>>> So would an alternative approach be to protect against dwc3_remove() >>>>> getting called multiple times? IMO it'd be a bit messy to have to add >>>> no, I don't think so. That sounds like a workaround. We should be able >>>> to guarantee that ->remove() doesn't get called twice using the driver >>>> model properly. >>> Completely fair. So then having a .shutdown callback that directly calls >>> dwc3_remove() is probably not the right thing to do as it completely >>> bypasses the driver model so if and when the driver core does later >>> release the device from the driver that's how we end up with the double >>> remove. >> yeah, I would agree with that. >> >>>>> additional checks there to know if it had already been called. So maybe >>>>> avoid it altogether--should dwc3_of_simple_shutdown() just skip calling >>>>> of_platform_depopulate()? >>>> I don't know what the idiomatic is nowadays, but at least early on, we >>>> had to call depopulate. >>> So any suggestions on how to fix the original issue Sandeep was trying >>> to fix with 568262bf5492? Maybe implement .shutdown in dwc3_qcom and have >>> it follow what dwc3_of_simple does with of_platform_depopulate()? But >>> then wouldn't other "glues" want/need to follow suit? >> I think we can implement shutdown in core, but we need to careful with >> it. Instead of just blindly calling remove, let's extract the common >> parts to another internal function that both remove and shutdown >> call. debugfs removal should not be part of that generic method :-) > Hi Sandeep, > > Upon re-reading your description in 568262bf5492 it sounds like the > original intention of your patch is basically to quiesce the HW so that > it doesn't continue to run after SMMU/IOMMU is disabled right? > > If that is the case, couldn't we simply call only dwc3_core_exit_mode() > assuming there is no other requirement to do any other cleanup/teardown > (PHYs, clocks, resets, runtime PM, et al)? This function should do the > bare minimum of stopping the controller in whatever mode (host or > peripheral) it is currently operating in. Yes that was the intention. I will call only dwc3_core_exit_mode() and check. Is there any way we can do from dwc3 qcom glue driver to avoid problems for other glue drivers? >> Anything in that generic method should, probably, be idempotent. > Yes we'll need to ensure that dwc3_core_exit_mode() can be called > multiple times without additional side effects. At first glance this > probably means setting dwc->xhci and dwc->gadget to NULL from > dwc3_host_exit() and dwc3_gadget_exit(), respectively. Ok. Is there any way to test this ? > > Thanks, > Jack ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [BUG] usb: dwc3: Kernel NULL pointer dereference in dwc3_remove() 2021-06-14 7:37 ` Sandeep Maheswaram @ 2021-06-16 0:13 ` Jack Pham 0 siblings, 0 replies; 3+ messages in thread From: Jack Pham @ 2021-06-16 0:13 UTC (permalink / raw) To: Sandeep Maheswaram Cc: Felipe Balbi, Alexandru Elisei, Greg Kroah-Hartman, p.zabel, linux-usb, Linux Kernel Mailing List, arm-mail-list, linux-arm-msm Hi Sandeep, On Mon, Jun 14, 2021 at 01:07:11PM +0530, Sandeep Maheswaram wrote: > > On 6/10/2021 9:03 PM, Jack Pham wrote: > > On Thu, Jun 10, 2021 at 01:11:42PM +0300, Felipe Balbi wrote: > > > Jack Pham <jackp@codeaurora.org> writes: > > > > On Fri, Jun 04, 2021 at 11:20:12AM +0300, Felipe Balbi wrote: > > > > > Jack Pham <jackp@codeaurora.org> writes: > > > > > > > > > > > Alexandru Elisei <alexandru.elisei@arm.com> writes: > > > > > > > > > > > > I've been able to bisect the panic and the offending commit is 568262bf5492 ("usb: > > > > > > > > > > > > dwc3: core: Add shutdown callback for dwc3"). I can provide more diagnostic > > > > > > > > > > > > information if needed and I can help test the fix. > > > > > > > > > > > if you simply revert that commit in HEAD, does the problem really go > > > > > > > > > > > away? > > > > > > > > > > Kernel built from commit 324c92e5e0ee, which is the kernel tip today, the panic is > > > > > > > > > > there. Reverting the offending commit, 568262bf5492, makes the panic disappear. > > > > > > > > > Want to send a revert so I can take it now? > > > > > > > > I can send a revert, but Felipe was asking Sandeep (the commit author) for a fix, > > > > > > > > so I'll leave it up to Felipe to decide how to proceed. > > > > > > > I'm okay with a revert. Feel free to add my Acked-by: Felipe Balbi > > > > > > > <balbi@kernel.org> or it. > > > > > > > > > > > > > > Sandeep, please send a new version that doesn't encounter the same > > > > > > > issue. Make sure to test by reloading the driver in a tight loop for > > > > > > > several iterations. > > > > > > This would probably be tricky to test on other "glue" drivers as the > > > > > > problem appears to be specific only to dwc3_of_simple. It looks like > > > > > > both dwc3_of_simple and the dwc3 core now (due to 568262bf5492) each > > > > > > implement respective .shutdown callbacks. The latter is simply a wrapper > > > > > > around dwc3_remove(). And from the panic call stack above we see that > > > > > > dwc3_of_simple_shutdown() calls of_platform_depopulate() which will > > > > > > again call dwc3_remove() resulting in the double remove. > > > > > > > > > > > > So would an alternative approach be to protect against dwc3_remove() > > > > > > getting called multiple times? IMO it'd be a bit messy to have to add > > > > > no, I don't think so. That sounds like a workaround. We should be able > > > > > to guarantee that ->remove() doesn't get called twice using the driver > > > > > model properly. > > > > Completely fair. So then having a .shutdown callback that directly calls > > > > dwc3_remove() is probably not the right thing to do as it completely > > > > bypasses the driver model so if and when the driver core does later > > > > release the device from the driver that's how we end up with the double > > > > remove. > > > yeah, I would agree with that. > > > > > > > > > additional checks there to know if it had already been called. So maybe > > > > > > avoid it altogether--should dwc3_of_simple_shutdown() just skip calling > > > > > > of_platform_depopulate()? > > > > > I don't know what the idiomatic is nowadays, but at least early on, we > > > > > had to call depopulate. > > > > So any suggestions on how to fix the original issue Sandeep was trying > > > > to fix with 568262bf5492? Maybe implement .shutdown in dwc3_qcom and have > > > > it follow what dwc3_of_simple does with of_platform_depopulate()? But > > > > then wouldn't other "glues" want/need to follow suit? > > > I think we can implement shutdown in core, but we need to careful with > > > it. Instead of just blindly calling remove, let's extract the common > > > parts to another internal function that both remove and shutdown > > > call. debugfs removal should not be part of that generic method :-) > > Hi Sandeep, > > > > Upon re-reading your description in 568262bf5492 it sounds like the > > original intention of your patch is basically to quiesce the HW so that > > it doesn't continue to run after SMMU/IOMMU is disabled right? > > > > If that is the case, couldn't we simply call only dwc3_core_exit_mode() > > assuming there is no other requirement to do any other cleanup/teardown > > (PHYs, clocks, resets, runtime PM, et al)? This function should do the > > bare minimum of stopping the controller in whatever mode (host or > > peripheral) it is currently operating in. > > Yes that was the intention. I will call only dwc3_core_exit_mode() > and check. Is there any way we can do from dwc3 qcom glue driver to > avoid problems for other glue drivers? As I mentioned above maybe you could just implement a dwc3_qcom specific .shutdown callback which mimics what dwc3_of_simple() does by calling of_platform_depopulate(). This will allow the kernel driver core to invoke dwc3_remove() rather than calling it directly yourself. The downside is that if other glue drivers want to follow this they'd have to duplicate the same logic. But maybe this is a more cautious approach until we start seeing other drivers needing this generically within core.c. > > > Anything in that generic method should, probably, be idempotent. > > Yes we'll need to ensure that dwc3_core_exit_mode() can be called > > multiple times without additional side effects. At first glance this > > probably means setting dwc->xhci and dwc->gadget to NULL from > > dwc3_host_exit() and dwc3_gadget_exit(), respectively. > > Ok. Is there any way to test this ? You could implement both the dwc3_qcom_shutdown() as above as well as adding back dwc3_shutdown() which only does dwc3_core_exit_mode(). Make sure that even though dwc3_core_exit_mode() gets called twice nothing bad happens. Jack -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project ^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2021-06-16 0:14 UTC | newest] Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <c3c75895-313a-5be7-6421-b32bac741a88@arm.com> [not found] ` <87r1hjcvf6.fsf@kernel.org> [not found] ` <70be179c-d36b-de6f-6efc-2888055b1312@arm.com> [not found] ` <YLi/u9J5f+nQO4Cm@kroah.com> [not found] ` <8272121c-ac8a-1565-a047-e3a16dcf13b0@arm.com> [not found] ` <877djbc8xq.fsf@kernel.org> [not found] ` <20210603173632.GA25299@jackp-linux.qualcomm.com> [not found] ` <87mts6avnn.fsf@kernel.org> [not found] ` <20210607180023.GA23045@jackp-linux.qualcomm.com> [not found] ` <87sg1q1129.fsf@kernel.org> 2021-06-10 15:33 ` [BUG] usb: dwc3: Kernel NULL pointer dereference in dwc3_remove() Jack Pham 2021-06-14 7:37 ` Sandeep Maheswaram 2021-06-16 0:13 ` Jack Pham
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).