On Fri, Oct 06, 2023 at 10:00:11AM -0700, Avichal Rakesh wrote: > > >On 10/5/23 15:05, Michael Grzeschik wrote: >> Hi Avichal, >> >> On Thu, Oct 05, 2023 at 11:30:32AM -0700, Avichal Rakesh wrote: >>> On 10/5/23 03:14, Michael Grzeschik wrote: >>>> On Thu, Oct 05, 2023 at 11:23:27AM +0300, Laurent Pinchart wrote: >>>>> On Tue, Oct 03, 2023 at 01:09:06PM +0200, Michael Grzeschik wrote: >>>>>> On Sat, Sep 30, 2023 at 11:48:18AM -0700, Avichal Rakesh wrote: >>>>>> > We have been seeing two main stability issues that uvc gadget driver >>>>>> > runs into when stopping streams: >>>>>> >  1. Attempting to queue usb_requests to a disabled usb_ep >>>>>> >  2. use-after-free issue for inflight usb_requests >>>>>> > >>>>>> > The three patches below fix the two issues above. Patch 1/3 fixes the >>>>>> > first issue, and Patch 2/3 and 3/3 fix the second issue. >>>>>> > >>>>>> > Avichal Rakesh (3): >>>>>> >   usb: gadget: uvc: prevent use of disabled endpoint >>>>>> >   usb: gadget: uvc: Allocate uvc_requests one at a time >>>>>> >   usb: gadget: uvc: Fix use-after-free for inflight usb_requests >>>>>> > >>>>>> > drivers/usb/gadget/function/f_uvc.c     |  11 +- >>>>>> > drivers/usb/gadget/function/f_uvc.h     |   2 +- >>>>>> > drivers/usb/gadget/function/uvc.h       |   6 +- >>>>>> > drivers/usb/gadget/function/uvc_v4l2.c  |  21 ++- >>>>>> > drivers/usb/gadget/function/uvc_video.c | 189 +++++++++++++++++------- >>>>>> > 5 files changed, 164 insertions(+), 65 deletions(-) >>>>>> >>>>>> These patches are not applying on gregkh/usb-testing since >>>>>> Greg did take my patches first. I have already rebased them. >>>>> >>>>> I think they got merged too soon :-( We could fix things on top, but >>>>> there's very little time to do so for v6.7. >>>> >>>> Agreed. I was jumping from one workaround to another one, since this >>>> is not easy to fix in a proper way. And still after this long discussion >>>> with Avichal I don't think we are there yet. >>>> >>>> >>>> So far the first two patches from Avichal look legit. But the overall >>>> Use-After-Free fix is yet to be done properly. >>>> >>>> The "abondoned" method he suggested is really bad to follow and will >>>> add too much complexity and will be hard to debug. >>>> >>>> IMHO it should be possible to introduce two cleanup pathes. >>>> >>>> One path would be in the uvc_cleanup_requests that will cleanup the >>>> requests that are actually not used in the controller and are registered >>>> in the req_free list. >>>> >>>> The second path would be the complete functions that are being run >>>> from the controller and will ensure that the cleanup will really free >>>> the requests from the controller after they were consumed. >>>> >>>> What do you think? >>> >>> I am not sure I follow. Patch 3/3 does exactly what you say here. >> >> Yes, it was just to summ up what the latest state of the idea was, >> so Laurent does not read the whole thread in detail. Sorry for not >> being clear enough about that. > >Whoops! Sorry about the misunderstanding! > >> >>> There are two cleanup paths: >>>  1. uvcg_video_disable cleans up only the requests in req_free, and >>>  2. complete handler cleans up the in-flight requests. >>> >>> The "abandoned" flag is simply to let the completion handler know >>> which requests to clean up and which ones to re-queue back to >>> the gadget driver. >> >> What I don't get is, why in the case of shutdown there needs to >> be something re-queued back to the gadget driver. There should not >> need to be any sort of barrier flag for the requests. Just the >> complete handler running past a barrier where it knows that the >> whole device is stopped. So every call on complete should then clean >> that exact request it is touching currently. >> >> I don't know where the extra complexity comes from. > >A lot of this complexity comes from assuming a back to back >STREAMOFF -> STREAMON sequence is possible where the gadget driver >doesn't have the time to clean up all in-flight usb_requests. >However, looking through the usb gadget APIs again, and it >looks like usb_ep_disable enforces that all requests will >be sent back to the gadget driver before it returns. Great! >So you're right: >With Patch 1/3 in place, I think we can just guard on uvc->state >alone, because control requests are blocked until usb_ep_disable >is finished anyway. I'll upload v4 with the "is_abandoned" >flag removed and the checks simplified once I've verified the >fix locally. > >That should also remove any bookkeeping issues that may have >triggered the stack below. I am currious if we should handle -ECONNRESET and -ESHUTDOWN in more detail in the completion handler and make sure that the request will not be added into the req_free list from there. Will review your v4 then. >>> The other "complications" are around making sure we can trust >>> the values in an inherently racey situation. The reasoning >>> can admittedly be difficult to follow at a glance, which incidentally >>> is why I went with a simple to prove timed wait in the past >>> (https://lore.kernel.org/20230912041910.726442-3-arakesh@google.com). >>> >>> I am not suggesting we go back to a timed wait, but please do look >>> at the patch and let me know which parts don't make sense, or are >>> difficult to understand. We can add more documentation about our >>> assumptions there, or if you have a way to do this that you >>> think is simpler to reason about, then please let me know and I'll >>> be more than happy to use that! >> >> I really try to spin my head around the idea of the is_abondoned flag >> you are using. Unfortunatly for now I am out to debug the issues I see >> with your series. >> >> So I did try these patches you send. Yes the deadlock error is gone with >> v3. But the linked list is still running into cases where >> dwc3_gadget_giveback(complete) is touching requests that are already >> freed. >> >> [   61.408715] ------------[ cut here ]------------ >> [   61.413897] kernel BUG at lib/list_debug.c:56! >> ... >> [   61.590762] Call trace: >> [   61.596890]  __list_del_entry_valid+0xb8/0xe8 >> [   61.603408]  dwc3_gadget_giveback+0x3c/0x1b0 >> [   61.607594]  dwc3_remove_requests.part.0+0xcc/0x100 >> [   61.612948]  __dwc3_gadget_ep_disable+0xbc/0x1b8 >> [   61.621019]  dwc3_gadget_ep_disable+0x48/0x100 >> [   61.627925]  usb_ep_disable+0x3c/0x138 >> [   61.638230]  uvc_function_setup_continue+0x3c/0x60 >> [   61.645040]  uvc_v4l2_streamoff+0x5c/0x80 >> [   61.659812]  v4l_streamoff+0x40/0x60 >> [   61.668950]  __video_do_ioctl+0x344/0x420 >> [   61.679548]  video_usercopy+0x1d0/0x788 >> [   61.685677]  video_ioctl2+0x40/0x70 >> [   61.697439]  v4l2_ioctl+0x68/0xa0 >> [   61.709200]  __arm64_sys_ioctl+0x304/0xda0 >> [   61.720768]  invoke_syscall.constprop.0+0x70/0x130 >> > > -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |