Hi, Sriharsha Allenki writes: >>> If the HWO bit is set for the TRB (or the first TRB if scatter-gather >>> is used) of a request, it implies that core is still processing it. >>> In that case do not reclaim that TRB and do not giveback the >>> request to the function driver, else it will result in a SMMU >>> translation fault when core tries to access the buffer >>> corresponding to this TRB. >> This is not entirely true. There are cases where driver *must* clear HWO >> bit manually and driver currently accounts for that. Care to explain > > Based on my understanding I am trying to list down the two cases where > driver must clear HWO bit manually and how the patch would not regress > these. > > Additionally, I want to add that this patch is checking for req->trb > (not the TRB pointed by the *trb_dequeue*) which will be pointing to > the first TRB in the case of SG and in the case of linear it point to > the TRB containing the data (not theextra_trb or the trb to handle > zero length packet). > > *Case-1*: > > We are in the middle of series of chained TRBs and we received a short > transfer along the way. Here is the code handling this case: > >         /* >          * If we're in the middle of series of chained TRBs and we >          * receive a short transfer along the way, DWC3 will skip >          * through all TRBs including the last TRB in the chain (the >          * where CHN bit is zero. DWC3 will also avoid clearing HWO >          * bit and SW has to do it manually. >          * >          * We're going to do that here to avoid problems of HW trying >          * to use bogus TRBs for transfers. >          */ >         if (chain && (trb->ctrl & DWC3_TRB_CTRL_HWO)) >                 trb->ctrl &= ~DWC3_TRB_CTRL_HWO; > > > This case occurs only after the first TRB of the chain is processed, > which we arechecking in the patch. Although, this piece of code has > been no-op after introducingthe function > "dwc3_gadget_ep_reclaim_trb_sg".This function checks for the HWO and > does notcall the "dwc3_gadget_ep_reclaim_completed_trb" if it is > set.Hence this condition mostly likely will never hit. You're missing one important detail: If we have e.g. 200 TRBs in a single SG-list and we receive a short packet on TRB 10, we will have 190 TRBs with HWO bit left set and your patch prevents the driver from clearing that bit. Yes, you are regressing a very special case. > *Case-2*: > The second case is wherewe append the actual data buffer TRB with an > extra_trb > for unaligned OUT transfer. Code handling this is: > >         /* >          * If we're dealing with unaligned size OUT transfer, we will > be left >          * with one TRB pending in the ring. We need to manually clear > HWO bit >          * from that TRB. >          */ > >         if (req->needs_extra_trb && !(trb->ctrl & DWC3_TRB_CTRL_CHN)) { >                 trb->ctrl &= ~DWC3_TRB_CTRL_HWO; >                 return 1; >         } > > This also requires that the actual data buffer TRB should be processed > and then we areexpected to reclaim this extra_trb. If the TRB > corresponding to the actual data (req->trb)is not processed we are not > expecting this extra_trb to be reclaimed. if you read dwc3_gadget_ep_reclaim_complete_trb() carefully, you will see that we *never* touch the next TRB from inside that function. We rely on the fact that this function will be called AGAIN to clear HWO bit on both of these special cases and you're causing a regression to both of them. > So, both these cases occurs and valid only if the first TRB/TRB > containing the dataof the request(req->trb) is processed. yes, and your patch makes no distinction of what type of TRB we're dealing with. In the case of unaligned transfers, we would giveback the first TRB with the unaligned length and never clear HWO from the chained TRB following it because your patch would prevent dwc3_gadget_ep_reclaim_trb*() from even being called. > The proposed change is looking for thecompletion of this TRB, soI > don't think this change will regress the above mentioned cases. it will >> what problem you actually found? Preferrably with tracepoint data >> showing the fault. > > Test case here involves f_fs driver in AIO mode and we see ~8 TRBs in > the queue with HWO set and UPDATE_XFER done. In the failure case I see > thatas part of processingthe interrupt generated by the core for the > completion of the first TRB, the driver isgoing ahead and giving we shouldn't get completion interrupt for the first TRB, only the last. Care to share tracepoint data? > backthe requests of all theother queued TRBs, whichinvolves removing > the SMMU mapping of the buffers associated with the requests. But > these are still active and when core processesthese TRBs and their > correspondingun-mapped buffers, I see a translationfaultraised by the > SMMU. > > I hope I have answered your queries, please let me know if I am still > missing something here. yes, tracepoint data showing the problem. Thank you -- balbi