On Wed, Apr 28, 2021 at 11:25:05PM +0000, Thinh Nguyen wrote: >Michael Grzeschik wrote: >> Hi Thinh, >> >> On Wed, Apr 28, 2021 at 01:45:11AM +0000, Thinh Nguyen wrote: >>> Michael Grzeschik wrote: >>>> On Tue, Apr 27, 2021 at 03:18:57AM +0000, Thinh Nguyen wrote: >>>>> Hi Michael, >>>>> >>>>> Thinh Nguyen wrote: >>>>>> Felipe Balbi wrote: >>>>>>> >>>>>>> HI, >>>>>>> >>>>>>> Michael Grzeschik writes: >>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> I think I see the issue that Michael reported. >>>>>> >>>>>> The problem is that we're using num_pending_sgs to track both >>>>>> pending SG >>>>>> entries and queued SG entries. num_pending_sgs doesn't get updated >>>>>> until >>>>>> TRB completion interrupt (ie XferInProgress). Before the driver queues >>>>>> more SG requests, it will check if there's any pending SG in the >>>>>> started >>>>>> request list before it prepares more. Since the num_pending_sgs >>>>>> doesn't >>>>>> get updated until the request is completed, the driver doesn't process >>>>>> more until the request is completed. >>>>>> >>>>>> I need to review more on Michael's patches next week, but I think what >>>>>> he suggested makes sense (in term of properly usage of queued sgs vs >>>>>> pending sgs). BTW, please correct me if I'm wrong, but we do modify >>>>>> num_queued_sgs. >>>>>> >>>>> >>>>> There's still some issue with your patch. I think this should cover it. >>>>> Let me know if it works for you. >>>> >>>> This works for me! Will you spin a proper patch from that? >>> >>> Sure. I can do that after 5.13-rc1 is released >> >> Great! >> >>>> >>>>> Note: this however probably needs more "Tested-by" and reviews >>>>> to make sure I'm not missing anything. I only ran some basic tests, >>>>> and will need to run more. >>>> >>>> You may already have mine: >>>> >>>> Tested-by: Michael Grzeschik >>>> >>>>> Let me know if this makes sense. >>>> >>>> From what I understand about the issue and the purpose of all >>>> variables this makes total sense to me. So thanks for taking over >>>> and make a proper solution. >>>> >>>> Thanks, >>>> Michael >>>> >>> >>> Thanks for the Tested-by. >>> >>> Btw, any reason for using SG with transfer less than PAGE_SIZE? I assume >>> your platform is 4KB, but you're splitting your 3KB transfer to smaller. >>> Was it like this before? Note that DWC3 has a limited number of internal >>> TRB cache. But what you're doing now is fine and doesn't impact >>> performance. >> >> It all comes from the limitation of the uvc_video gadget. Look into the >> patches I send to support sg in uvc_video driver. It just maps entries >> from an sg list comming from videobuf2 to an req->sg list. In >> uvc_video_alloc_requests the req->length will be set to req_size which >> is calculated with: >> >>  ep->maxpacket(1024) * maxburst(1-15) * mult(1-3) >> >> With maxburst = 1 and mult = 3 it currently reults in an req->length of: >> >>  1024 * 1 * 3 = 3072 > >I wasn't referring to this, I was just wondering why uvc uses SG. >Correct me if I'm wrong, but isn't uvc allocates and uses contiguous >buffer? Not necessarily. Depending on the source creating vb2 buffers with mmu enabled or directly you may get different patterns. The patches I send are able to also handle contiguous buffers. In that case vb2 would just hand over a scatter list with one big entry, which would be totally fine. In that case we would not have to scatter it for the gadget controller. >Depend on the setup, normal request may show better performance. Right. In my setup the data is coming from an mmu. >> This request will use one extra sg created from the uvc driver adding >> a header of 2 bytes per request and then has 3070 bytes left for the >> payload which will look into the vb2 sg list and increment every sg >> entrie with the the used offset until it reaches sg page boundaries >> and in that case will have to add one extra sg. >> >> So this results in the funny patterns you see. >> >> We can increase the req->length to 46080 with a maxburst of 15 for usb3 >> but I did not try that out yet. >> > >I wouldn't recommend that. Unless you know you're using both DWC3x >device and host, most other controllers we know cannot handle high isoc >throughput, especially at bInterval=1. Right. I think Felipe mentioned that as well. As the parameters are possible to tweak from userspace this is all fine. The user just will have to find the compromise between performance and stability. Thanks, Michael -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |