* usb: dwc2: gadget: high-bandwidth (mc > 1) status?
@ 2021-11-24  7:39 Pavel Hofman
  2021-11-24 14:04 ` Minas Harutyunyan
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Hofman @ 2021-11-24  7:39 UTC (permalink / raw)
  To: Minas Harutyunyan, linux-usb

Hi Minas et al.,

Does dwc2 (specifically in the BCM2835/RPi) reliably support HS ISOC
with multiple transactions per microframe (mc > 1)? I found this
condition:
https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/usb/dwc2/gadget.c#L4041

	/* High bandwidth ISOC OUT in DDMA not supported */
	if (using_desc_dma(hsotg) && ep_type == USB_ENDPOINT_XFER_ISOC &&
	    !dir_in && mc > 1) {
		dev_err(hsotg->dev,
			"%s: ISOC OUT, DDMA: HB not supported!\n", __func__);
		return -EINVAL;
	}

But I do not know how critical Descriptor DMA is, or whether
disabling it would seriously affect gadget performance.

I know about the RX FIFO sizing requirement (and TX FIFO too, I guess);
the current default values can be increased for that particular use case
if needed.

I am trying to learn whether it makes sense to spend time on adding
high-bandwidth support to the UAC2 audio gadget, to allow using a larger
bInterval with mc=2,3 at high sample rates/channel counts (a sort of
"burst mode" similar to UAC3). With CPU-demanding DSP it would help
to avoid time-critical handling in every 125us microframe. Both OUT and
IN are important.


Thanks a lot for your expert advice.


Best regards,


Pavel.




* Re: usb: dwc2: gadget: high-bandwidth (mc > 1) status?
  2021-11-24  7:39 usb: dwc2: gadget: high-bandwidth (mc > 1) status? Pavel Hofman
@ 2021-11-24 14:04 ` Minas Harutyunyan
  2021-11-25  8:47   ` Pavel Hofman
  0 siblings, 1 reply; 7+ messages in thread
From: Minas Harutyunyan @ 2021-11-24 14:04 UTC (permalink / raw)
  To: Pavel Hofman, Minas Harutyunyan, linux-usb

Hi Pavel,

On 11/24/2021 11:39 AM, Pavel Hofman wrote:
> Hi Minas at all,
> 
> Please does dwc2 (specifically in BCM2835/RPi) support HS ISOC multiple 
> transactions mc > 1 reliably? I found this condition 
> https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/usb/dwc2/gadget.c*L4041__;Iw!!A4F2R9G_pg!MMNE6CYvWEFeWt8W9pImwNA-N4_04U8UsBWQmu9O9Bwq1HalCAupyb9kzGBAOOMlKmt6xefz$ 
> 
>      /* High bandwidth ISOC OUT in DDMA not supported */
>      if (using_desc_dma(hsotg) && ep_type == USB_ENDPOINT_XFER_ISOC &&
>          !dir_in && mc > 1) {
>          dev_err(hsotg->dev,
>              "%s: ISOC OUT, DDMA: HB not supported!\n", __func__);
>          return -EINVAL;
>      }
> 
> But I do not know how the Descriptor DMA is critical and whether 
> disabling it will affect gadget performance seriously.
> 
> I know about the RX FIFO sizing requirement (and TX FIFO too I guess), 
> the current default values can be increased for that particular use case 
> if needed.
> 
> I am trying to learn if it made sense to spend time on adding support 
> for high-bandwidth to the UAC2 audio gadget  to allow using larger 
> bInterval and mc=2,3 at high samplerates/channel counts (sort of "burst 
> mode" similar to UAC3). When doing some CPU-demanding DSP it would help 
> to avoid the time-critical handling every 125us microframe. Both OUT and 
> IN are important.
> 

According to the programming guide:

"Isochronous OUT Transfers
The application programming for isochronous out transfers is in the same 
manner as Bulk OUT transfer sequence, except that the application 
creates only 1 packet per descriptor for an isochronous OUT endpoint.
The controller handles isochronous OUT transfers internally in the same 
way it handles Bulk OUT transfers, and as depicted in Figure 10-28.
If the transfers are for a high-bandwidth endpoint (more than one MPS 
per μframe ), create as many descriptors as the number of packets in a 
μframe (number of descriptors = number of packets per μframe).
Maximum number of descriptors per μframe per endpoint is three."

Programming the descriptors to start HB ISOC OUT is not a problem.
The problem occurs on completion. If, for example, mc > 1, the driver
will allocate and program mc * (request count) descriptors. If the host
sends mc packets per frame, then completing a request on every mc-th
descriptor is not a big problem. But if the host sends fewer than mc
packets in a frame, it is not clear how to exclude the unused
descriptors from the descriptor chain, which the core has already
fetched: stop transfers (disable the EP) and restart them (refill the
descriptor chain) from the next frame? Or purge the unused descriptors
and shift the remaining ones "up" in the chain? You can try to
implement it.
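
The completion problem described above can be illustrated with a small
stand-alone model in C. This is only a sketch, not dwc2 driver code:
request_for_packet() and interval_for_packet() are hypothetical names,
and the model assumes the core simply consumes one descriptor per
received packet, strictly in order.

```c
#include <assert.h>

/*
 * Toy model, not dwc2 code. The driver queues mc descriptors per usb
 * request (one request per service interval), so descriptor index d
 * belongs to request d / mc. The core is assumed to consume
 * descriptors in order, one per received packet.
 */

/* Request whose buffer receives the n-th packet (0-based) overall. */
static unsigned int request_for_packet(unsigned int n, unsigned int mc)
{
	assert(mc >= 1 && mc <= 3);
	return n / mc;
}

/*
 * Service interval in which the n-th packet (0-based) was actually
 * sent, when the host sends `sent` packets in every interval.
 */
static unsigned int interval_for_packet(unsigned int n, unsigned int sent)
{
	assert(sent >= 1 && sent <= 3);
	return n / sent;
}
```

With mc = 3 and a host that sends only 2 packets per interval, packet
n = 2 is the first packet of interval 1, yet request_for_packet(2, 3)
still places it in the buffers of request 0. That mismatch is the
situation the driver would have to repair.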

Thanks,
Minas

> 
> Thanks a lot for your expert advice.
> 
> 
> Best regards,
> 
> 
> Pavel.
> 
> 



* Re: usb: dwc2: gadget: high-bandwidth (mc > 1) status?
  2021-11-24 14:04 ` Minas Harutyunyan
@ 2021-11-25  8:47   ` Pavel Hofman
  2021-11-26  6:35     ` Minas Harutyunyan
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Hofman @ 2021-11-25  8:47 UTC (permalink / raw)
  To: Minas Harutyunyan, linux-usb



On 24. 11. 21 at 15:04, Minas Harutyunyan wrote:
> Hi Pavel,
> 
> On 11/24/2021 11:39 AM, Pavel Hofman wrote:
>> Hi Minas at all,
>>
>> Please does dwc2 (specifically in BCM2835/RPi) support HS ISOC multiple
>> transactions mc > 1 reliably? I found this condition
>> https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/usb/dwc2/gadget.c*L4041__;Iw!!A4F2R9G_pg!MMNE6CYvWEFeWt8W9pImwNA-N4_04U8UsBWQmu9O9Bwq1HalCAupyb9kzGBAOOMlKmt6xefz$
>>
>>       /* High bandwidth ISOC OUT in DDMA not supported */
>>       if (using_desc_dma(hsotg) && ep_type == USB_ENDPOINT_XFER_ISOC &&
>>           !dir_in && mc > 1) {
>>           dev_err(hsotg->dev,
>>               "%s: ISOC OUT, DDMA: HB not supported!\n", __func__);
>>           return -EINVAL;
>>       }
>>
>> But I do not know how the Descriptor DMA is critical and whether
>> disabling it will affect gadget performance seriously.
>>
>> I know about the RX FIFO sizing requirement (and TX FIFO too I guess),
>> the current default values can be increased for that particular use case
>> if needed.
>>
>> I am trying to learn if it made sense to spend time on adding support
>> for high-bandwidth to the UAC2 audio gadget  to allow using larger
>> bInterval and mc=2,3 at high samplerates/channel counts (sort of "burst
>> mode" similar to UAC3). When doing some CPU-demanding DSP it would help
>> to avoid the time-critical handling every 125us microframe. Both OUT and
>> IN are important.
>>
> 
> According programming guide:
> 
> "Isochronous OUT Transfers
> The application programming for isochronous out transfers is in the same
> manner as Bulk OUT transfer sequence, except that the application
> creates only 1 packet per descriptor for an isochronous OUT endpoint.
> The controller handles isochronous OUT transfers internally in the same
> way it handles Bulk OUT transfers, and as depicted in Figure 10-28.
> If the transfers are for a high-bandwidth endpoint (more than one MPS
> per μframe ), create as many descriptors as the number of packets in a
> μframe (number of descriptors = number of packets per μframe).
> Maximum number of descriptors per μframe per endpoint is three."
> 
> To program descriptors to start HB ISOC OUT there are no any problem.
> Problem occurs on completions. If, for example mc > 1, driver will
> allocate and program mc * (request count) descriptors. If host send mc
> packets per frame then every mc descriptor perform request completion is
> not big problem. But if host will send less than mc packets in frame
> then not clear how to exclude unused descriptors from desc chain which
> already fetched by core - by stop transfers (disable EP) and re-start
> transfers (fill again desc chain) from next frame? Or purge unused descs
> and shifting descriptors "up" in a chain? You can try to implement.

Hi Minas, thanks for your hints. Unfortunately I am pretty new to dwc2;
could you please point me to the relevant parts of the dwc2 code?

I found a dwc2 description that contains your quote in
https://www.mouser.cn/datasheet/2/196/Infineon-xmc4500_rm_v1.6_2016-UM-v01_06-EN-598157.pdf
(not for BCM2835, but hopefully the principle is similar). IIUC,
"descriptor" means struct dwc2_dma_desc.

I found the function gadget.c:dwc2_gadget_fill_isoc_desc, which is
called in dwc2_gadget_start_isoc_ddma and dwc2_hsotg_ep_queue. Is the
check after the /* High bandwidth ISOC OUT in DDMA not supported */
comment in gadget.c:dwc2_hsotg_ep_enable() there because the dwc2 core
(the hardware) does not support HB in DDMA, or because the Linux dwc2
driver does not implement HB support in DDMA yet (which is what we are
talking about)?

I am asking because if the HW did not support HB in DDMA,
dwc2_gadget_start_isoc_ddma would be irrelevant to my analysis, right?
If the latter is the case, should the HB support implementation change
dwc2_gadget_start_isoc_ddma?

Could you please explain the issue with the unused descriptors a bit
more? This is how I (poorly) understand it. The driver prepares
descriptors for all mc transactions required by the transfer (and
reported to the host in wMaxPacketSize) so that the core (HW) can fill
them via DMA. However, if the host does not need the whole packet size,
it will send fewer packets per frame, and some of the dwc2_dma_desc
descriptors would not be filled with data, i.e. they stay unused. The
core (HW) somehow marks whether each descriptor was used, and the
unused descriptors (i.e. those containing old/bogus data) should
somehow be excluded from completion. But this sounds too simple, not
what you described in your post :-)
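
For reference, this is how mc is carried in wMaxPacketSize for a
high-speed isochronous endpoint, as defined by the USB 2.0 endpoint
descriptor: bits 10:0 hold the packet size and bits 12:11 the number of
additional transactions per microframe. A minimal sketch; the hb_*
helper names are made up for illustration (the kernel has its own
helpers for this in the USB headers).

```c
#include <assert.h>
#include <stdint.h>

/* Bits 10:0 of wMaxPacketSize: maximum packet size in bytes. */
static unsigned int hb_packet_size(uint16_t wMaxPacketSize)
{
	return wMaxPacketSize & 0x7ffu;
}

/*
 * Bits 12:11 of wMaxPacketSize: additional transactions per
 * microframe (0, 1 or 2; the value 3 is reserved), so the number of
 * transactions per microframe (mc) is that field plus one.
 */
static unsigned int hb_mult(uint16_t wMaxPacketSize)
{
	unsigned int extra = (wMaxPacketSize >> 11) & 0x3u;

	assert(extra != 3); /* reserved encoding */
	return extra + 1;
}

/* Maximum payload the endpoint may receive per service interval. */
static unsigned int hb_bytes_per_interval(uint16_t wMaxPacketSize)
{
	return hb_mult(wMaxPacketSize) * hb_packet_size(wMaxPacketSize);
}
```

For example, wMaxPacketSize = 0x1400 describes 1024-byte packets with
mc = 3, i.e. up to 3072 bytes per service interval.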

Also, when are completion interrupts raised for ISOC OUT? After every
packet (= descriptor), or after the whole USB frame (i.e. after all 3
packets in the case of mc=3)? If after every packet, HB mode with a
larger bInterval (less frequent frames with multiple packets) would not
save any interrupts/CPU load compared to more frequent frames with
single packets (no HB mode), and adding HB ISOC support would "only"
allow higher ISOC bandwidth, not reduce CPU load. Which is the case?

Thanks a lot,

Pavel.


* Re: usb: dwc2: gadget: high-bandwidth (mc > 1) status?
  2021-11-25  8:47   ` Pavel Hofman
@ 2021-11-26  6:35     ` Minas Harutyunyan
  2021-11-26  8:53       ` Pavel Hofman
  0 siblings, 1 reply; 7+ messages in thread
From: Minas Harutyunyan @ 2021-11-26  6:35 UTC (permalink / raw)
  To: Pavel Hofman, Minas Harutyunyan, linux-usb

Hi Pavel,

On 11/25/2021 12:47 PM, Pavel Hofman wrote:
> 
> 
> On 24. 11. 21 at 15:04, Minas Harutyunyan wrote:
>> Hi Pavel,
>>
>> On 11/24/2021 11:39 AM, Pavel Hofman wrote:
>>> Hi Minas at all,
>>>
>>> Please does dwc2 (specifically in BCM2835/RPi) support HS ISOC multiple
>>> transactions mc > 1 reliably? I found this condition
>>> https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/usb/dwc2/gadget.c*L4041__;Iw!!A4F2R9G_pg!MMNE6CYvWEFeWt8W9pImwNA-N4_04U8UsBWQmu9O9Bwq1HalCAupyb9kzGBAOOMlKmt6xefz$ 
>>>
>>>
>>>       /* High bandwidth ISOC OUT in DDMA not supported */
>>>       if (using_desc_dma(hsotg) && ep_type == USB_ENDPOINT_XFER_ISOC &&
>>>           !dir_in && mc > 1) {
>>>           dev_err(hsotg->dev,
>>>               "%s: ISOC OUT, DDMA: HB not supported!\n", __func__);
>>>           return -EINVAL;
>>>       }
>>>
>>> But I do not know how the Descriptor DMA is critical and whether
>>> disabling it will affect gadget performance seriously.
>>>
>>> I know about the RX FIFO sizing requirement (and TX FIFO too I guess),
>>> the current default values can be increased for that particular use case
>>> if needed.
>>>
>>> I am trying to learn if it made sense to spend time on adding support
>>> for high-bandwidth to the UAC2 audio gadget  to allow using larger
>>> bInterval and mc=2,3 at high samplerates/channel counts (sort of "burst
>>> mode" similar to UAC3). When doing some CPU-demanding DSP it would help
>>> to avoid the time-critical handling every 125us microframe. Both OUT and
>>> IN are important.
>>>
>>
>> According programming guide:
>>
>> "Isochronous OUT Transfers
>> The application programming for isochronous out transfers is in the same
>> manner as Bulk OUT transfer sequence, except that the application
>> creates only 1 packet per descriptor for an isochronous OUT endpoint.
>> The controller handles isochronous OUT transfers internally in the same
>> way it handles Bulk OUT transfers, and as depicted in Figure 10-28.
>> If the transfers are for a high-bandwidth endpoint (more than one MPS
>> per μframe ), create as many descriptors as the number of packets in a
>> μframe (number of descriptors = number of packets per μframe).
>> Maximum number of descriptors per μframe per endpoint is three."
>>
>> To program descriptors to start HB ISOC OUT there are no any problem.
>> Problem occurs on completions. If, for example mc > 1, driver will
>> allocate and program mc * (request count) descriptors. If host send mc
>> packets per frame then every mc descriptor perform request completion is
>> not big problem. But if host will send less than mc packets in frame
>> then not clear how to exclude unused descriptors from desc chain which
>> already fetched by core - by stop transfers (disable EP) and re-start
>> transfers (fill again desc chain) from next frame? Or purge unused descs
>> and shifting descriptors "up" in a chain? You can try to implement.
> 
> Hi Minas, thanks for your hints. Unfortunately I am pretty new to dwc2, 
> please can you point me to particular parts of the dwc2 code?
> 
> I found some dwc2 description which reads your quote in 
> https://urldefense.com/v3/__https://www.mouser.cn/datasheet/2/196/Infineon-xmc4500_rm_v1.6_2016-UM-v01_06-EN-598157.pdf__;!!A4F2R9G_pg!Jg2wfkRUfyO2jrnLXmO7zO5W0Esw-TTgETCTe5mqtpub1mAmDY7QnixT8HmYyTp0rb_ac7Ot$ 
> (not for BCM2835 but hopefully the principle is similar). IIUC by 
> descriptor the struct dwc2_dma_decs is meant.
> 
Yes, descriptors are declared in dwc2 as dwc2_dma_desc.

> I found a function gadget.c:dwc2_gadget_fill_isoc_desc which is called 
> in dwc2_gadget_start_isoc_ddma and dwc2_hsotg_ep_queue. Is the code 
> after the /* High bandwidth ISOC OUT in DDMA not supported */ comment in 
> gadget.c:dwc2_hsotg_ep_enable() because the dwc2 core (the hardware) 
> does not support HB in DDMA, or because the linux dwc2 driver does not 
> implement the HB support in DDMA yet (which is what we are talking about)?
The HW supports HB ISOC OUT in DDMA; the driver doesn't. In the
databook you mentioned, see chapter "16.11.3.2 Isochronous OUT".
> 
> I am asking because if the HW did not support DDMA, the method 
> dwc2_gadget_start_isoc_ddma would be out of game for my analysis, right? 
> If the latter is the case, should the HB support implementation change 
> dwc2_gadget_start_isoc_ddma?
> 
To support HB ISOC OUT, the dwc2_gadget_fill_isoc_desc() and
dwc2_gadget_complete_isoc_request_ddma() functions should be updated.

> Please can you explain a bit more the issue about the unused 
> descriptors? This is how I understand it (poorly). The driver prepares 
> descriptors for all mc required by the transfer (and reported by 
> wMaxPacketSize to the host) so that the core (HW) can fill it via DMA. 
> However, if the host does not need the whole packet size, it will send 
> fewer packets per frame, and some of the dwc2_dma_decs descriptors would 
> not be filled with data = unused. The core (HW) somehow marks the 
> descriptors whether they were used or not, and the unused descriptors 
> (i.e. containing old/bogus data) should not undergo completion somehow.
The core doesn't mark unused descriptors.
The driver can detect the last packet in a frame by checking DPID. If
DPID is DATA0, it is the last packet in the frame and the driver needs
to complete the corresponding usb request.
After completing a descriptor, the core will process the next
descriptor, which was prepared for the just-completed usb request, not
for the next request (at least from the "buffer address" point of
view).
If the host sends fewer than mc packets in a frame, the driver should
exclude the remaining descriptors of the completed usb request from the
descriptor list by "shifting up" the descriptors in the list. But I'm
not sure that the driver has enough time to do that before the core
fetches the next descriptor, which should already be updated by then
(at least its "buffer address" should point to the address for the next
usb request).
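
The "shifting up" idea can be sketched on a plain array. This is a toy
model, not driver code: struct model_desc, its fields and shift_up()
are hypothetical, and the sketch ignores the timing concern raised
above (the real core may fetch the next descriptor while the list is
being rewritten).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one DMA descriptor in the chain. */
struct model_desc {
	unsigned int buf; /* models the buffer address */
	unsigned int req; /* usb request this descriptor belongs to */
};

/*
 * Drop n_unused descriptors starting at first_unused by moving the
 * rest of the list up over them. Returns the new descriptor count.
 */
static size_t shift_up(struct model_desc *list, size_t count,
		       size_t first_unused, size_t n_unused)
{
	assert(first_unused + n_unused <= count);
	memmove(&list[first_unused], &list[first_unused + n_unused],
		(count - first_unused - n_unused) * sizeof(*list));
	return count - n_unused;
}
```

With mc = 3 and two queued requests (six descriptors), a frame in which
the host sent only two packets leaves descriptor 2 unused; after
shift_up(list, 6, 2, 1) the slot at index 2 holds the first descriptor
of the next request.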

> But this sounds too simple, not what you described in your post :-)
> 
> Also, please when are completion interrupt requests thrown at ISOC OUT? 
> After every packet=desc, or after the whole USB frame (i.e. after all 3 
> packets in case of mc=3)? If after every packet, the HB mode with larger 
> bInterval (less frequent frames with multiple packets) would not spare 
> any interrupts/CPU load compared to more frequent frames with single 
> packets (no HB mode) and adding the HB ISOC support would "only" allow 
> higher ISOC bandwidth, not CPU load reduction. What is the case, please?
A completion interrupt is asserted at the end of descriptor processing
if the IOC (Interrupt on completion) bit is set. For HB ISOC OUT this
bit should be set on all descriptors.
> 
> Thanks a lot,
> 
> Pavel.



* Re: usb: dwc2: gadget: high-bandwidth (mc > 1) status?
  2021-11-26  6:35     ` Minas Harutyunyan
@ 2021-11-26  8:53       ` Pavel Hofman
  2021-11-26  9:49         ` Minas Harutyunyan
  0 siblings, 1 reply; 7+ messages in thread
From: Pavel Hofman @ 2021-11-26  8:53 UTC (permalink / raw)
  To: Minas Harutyunyan, linux-usb

On 26. 11. 21 at 7:35, Minas Harutyunyan wrote:
> Hi Pavel,
> 
> On 11/25/2021 12:47 PM, Pavel Hofman wrote:
>>
>>
>> On 24. 11. 21 at 15:04, Minas Harutyunyan wrote:
>>> Hi Pavel,
>>>
>>> On 11/24/2021 11:39 AM, Pavel Hofman wrote:
>>>> Hi Minas at all,
>>>>
>>>> Please does dwc2 (specifically in BCM2835/RPi) support HS ISOC multiple
>>>> transactions mc > 1 reliably? I found this condition
>>>> https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/usb/dwc2/gadget.c*L4041__;Iw!!A4F2R9G_pg!MMNE6CYvWEFeWt8W9pImwNA-N4_04U8UsBWQmu9O9Bwq1HalCAupyb9kzGBAOOMlKmt6xefz$
>>>>
>>>>
>>>>        /* High bandwidth ISOC OUT in DDMA not supported */
>>>>        if (using_desc_dma(hsotg) && ep_type == USB_ENDPOINT_XFER_ISOC &&
>>>>            !dir_in && mc > 1) {
>>>>            dev_err(hsotg->dev,
>>>>                "%s: ISOC OUT, DDMA: HB not supported!\n", __func__);
>>>>            return -EINVAL;
>>>>        }
>>>>
>>>> But I do not know how the Descriptor DMA is critical and whether
>>>> disabling it will affect gadget performance seriously.
>>>>
>>>> I know about the RX FIFO sizing requirement (and TX FIFO too I guess),
>>>> the current default values can be increased for that particular use case
>>>> if needed.
>>>>
>>>> I am trying to learn if it made sense to spend time on adding support
>>>> for high-bandwidth to the UAC2 audio gadget  to allow using larger
>>>> bInterval and mc=2,3 at high samplerates/channel counts (sort of "burst
>>>> mode" similar to UAC3). When doing some CPU-demanding DSP it would help
>>>> to avoid the time-critical handling every 125us microframe. Both OUT and
>>>> IN are important.
>>>>
>>>
>>> According programming guide:
>>>
>>> "Isochronous OUT Transfers
>>> The application programming for isochronous out transfers is in the same
>>> manner as Bulk OUT transfer sequence, except that the application
>>> creates only 1 packet per descriptor for an isochronous OUT endpoint.
>>> The controller handles isochronous OUT transfers internally in the same
>>> way it handles Bulk OUT transfers, and as depicted in Figure 10-28.
>>> If the transfers are for a high-bandwidth endpoint (more than one MPS
>>> per μframe ), create as many descriptors as the number of packets in a
>>> μframe (number of descriptors = number of packets per μframe).
>>> Maximum number of descriptors per μframe per endpoint is three."
>>>
>>> To program descriptors to start HB ISOC OUT there are no any problem.
>>> Problem occurs on completions. If, for example mc > 1, driver will
>>> allocate and program mc * (request count) descriptors. If host send mc
>>> packets per frame then every mc descriptor perform request completion is
>>> not big problem. But if host will send less than mc packets in frame
>>> then not clear how to exclude unused descriptors from desc chain which
>>> already fetched by core - by stop transfers (disable EP) and re-start
>>> transfers (fill again desc chain) from next frame? Or purge unused descs
>>> and shifting descriptors "up" in a chain? You can try to implement.
>>
>> Hi Minas, thanks for your hints. Unfortunately I am pretty new to dwc2,
>> please can you point me to particular parts of the dwc2 code?
>>
>> I found some dwc2 description which reads your quote in
>> https://urldefense.com/v3/__https://www.mouser.cn/datasheet/2/196/Infineon-xmc4500_rm_v1.6_2016-UM-v01_06-EN-598157.pdf__;!!A4F2R9G_pg!Jg2wfkRUfyO2jrnLXmO7zO5W0Esw-TTgETCTe5mqtpub1mAmDY7QnixT8HmYyTp0rb_ac7Ot$
>> (not for BCM2835 but hopefully the principle is similar). IIUC by
>> descriptor the struct dwc2_dma_decs is meant.
>>
> Yes, descriptors declared in dwc2 as dwc2_dma_desc.
> 
>> I found a function gadget.c:dwc2_gadget_fill_isoc_desc which is called
>> in dwc2_gadget_start_isoc_ddma and dwc2_hsotg_ep_queue. Is the code
>> after the /* High bandwidth ISOC OUT in DDMA not supported */ comment in
>> gadget.c:dwc2_hsotg_ep_enable() because the dwc2 core (the hardware)
>> does not support HB in DDMA, or because the linux dwc2 driver does not
>> implement the HB support in DDMA yet (which is what we are talking about)?
> HW supports HB ISOC OUT in DDMA, driver doesn't support. In mentioned by
> you databook, see chapter "16.11.3.2 Isochronous OUT".
>>
>> I am asking because if the HW did not support DDMA, the method
>> dwc2_gadget_start_isoc_ddma would be out of game for my analysis, right?
>> If the latter is the case, should the HB support implementation change
>> dwc2_gadget_start_isoc_ddma?
>>
> To support HB ISOC OUT should be updated dwc2_gadget_fill_isoc_desc()
> and dwc2_gadget_complete_isoc_request_ddma() functions.
> 
>> Please can you explain a bit more the issue about the unused
>> descriptors? This is how I understand it (poorly). The driver prepares
>> descriptors for all mc required by the transfer (and reported by
>> wMaxPacketSize to the host) so that the core (HW) can fill it via DMA.
>> However, if the host does not need the whole packet size, it will send
>> fewer packets per frame, and some of the dwc2_dma_decs descriptors would
>> not be filled with data = unused. The core (HW) somehow marks the
>> descriptors whether they were used or not, and the unused descriptors
>> (i.e. containing old/bogus data) should not undergo completion somehow.
> Core doesn't mark unused descriptors.
> Driver can detect that it is last packet in frame by checking DPID. If
> DPID is DATA0 then it's last packet in frame and need to complete
> appropriate usb request.
> After completion of descriptor, core will process next descriptor which
> is prepared for just completed usb request but not for next request (at
> least from "buffer addresses" point of view).
> In case if packet count sent by host in frame less than mc, driver
> should exclude remaining descs for completed usb request from descriptor
> list by "shifting up" descs in descriptor list. But I'm not sure that
> driver have enough time to do that before core fetch next descriptor,
> which should be already updated (at least "buffer address" should be
> point to address for next usb request).
> 
>> But this sounds too simple, not what you described in your post :-)
>>
>> Also, please when are completion interrupt requests thrown at ISOC OUT?
>> After every packet=desc, or after the whole USB frame (i.e. after all 3
>> packets in case of mc=3)? If after every packet, the HB mode with larger
>> bInterval (less frequent frames with multiple packets) would not spare
>> any interrupts/CPU load compared to more frequent frames with single
>> packets (no HB mode) and adding the HB ISOC support would "only" allow
>> higher ISOC bandwidth, not CPU load reduction. What is the case, please?
> Completion interrupt asserted on the end of descriptor processing, if
> IOC (Interrupt on completion) bit is set. For HB ISOC OUT this bit
> should be set on all descriptors.
>>

Minas, thanks for your expert answer. Just a quick question regarding
your previous paragraph: does it mean that ISOC OUT with mc=2 at
bInterval=2 yields 8k completion IRQs per second, just like mc=1 at
bInterval=1? If so, implementing HB support would not save any real
CPU workload.
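
The arithmetic behind that 8k figure, as a small sketch. It assumes a
high-speed isochronous endpoint (8000 microframes per second, service
interval of 2^(bInterval-1) microframes) and a host that sends all mc
packets in every interval; the function names are made up for
illustration.

```c
#include <assert.h>

#define HS_UFRAMES_PER_SEC 8000u

/*
 * Service intervals per second for a high-speed isochronous endpoint.
 * The interval is 2^(bInterval - 1) microframes; the integer division
 * truncates once the interval no longer divides 8000 evenly.
 */
static unsigned int service_intervals_per_sec(unsigned int bInterval)
{
	assert(bInterval >= 1 && bInterval <= 16);
	return HS_UFRAMES_PER_SEC >> (bInterval - 1);
}

/*
 * Completion interrupts per second, with IOC set either on every
 * descriptor (one IRQ per packet) or only on the last descriptor of
 * each interval (one IRQ per interval).
 */
static unsigned int completion_irqs_per_sec(unsigned int bInterval,
					    unsigned int mc,
					    int ioc_on_all)
{
	unsigned int intervals = service_intervals_per_sec(bInterval);

	assert(mc >= 1 && mc <= 3);
	return ioc_on_all ? intervals * mc : intervals;
}
```

With IOC on every descriptor, mc=2 at bInterval=2 gives 4000 * 2 = 8000
IRQs per second, the same as mc=1 at bInterval=1; only an IOC on the
last descriptor alone would halve that to 4000.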

Is there any chance to complete all descriptors filled in one frame
with one IRQ, by setting the IOC bit only on the last descriptor? IIUC
that would cause issues when the host does not send data for all
descriptors "prepared" by the gadget (as discussed above), but IMO that
could be handled somehow (the host would likely not change the number
of transactions within one continuous stream, and the gadget could
"estimate" how many transactions the host would use for the particular
altsetting). Just trying to find out whether there is any way to reduce
the IRQs :-)

Thanks a lot! Best regards,

Pavel.


* Re: usb: dwc2: gadget: high-bandwidth (mc > 1) status?
  2021-11-26  8:53       ` Pavel Hofman
@ 2021-11-26  9:49         ` Minas Harutyunyan
  2021-11-26 11:01           ` Pavel Hofman
  0 siblings, 1 reply; 7+ messages in thread
From: Minas Harutyunyan @ 2021-11-26  9:49 UTC (permalink / raw)
  To: Pavel Hofman, Minas Harutyunyan, linux-usb

On 11/26/2021 12:53 PM, Pavel Hofman wrote:
> On 26. 11. 21 at 7:35, Minas Harutyunyan wrote:
>> Hi Pavel,
>>
>> On 11/25/2021 12:47 PM, Pavel Hofman wrote:
>>>
>>>
>>> On 24. 11. 21 at 15:04, Minas Harutyunyan wrote:
>>>> Hi Pavel,
>>>>
>>>> On 11/24/2021 11:39 AM, Pavel Hofman wrote:
>>>>> Hi Minas at all,
>>>>>
>>>>> Please does dwc2 (specifically in BCM2835/RPi) support HS ISOC 
>>>>> multiple
>>>>> transactions mc > 1 reliably? I found this condition
>>>>> https://urldefense.com/v3/__https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/usb/dwc2/gadget.c*L4041__;Iw!!A4F2R9G_pg!MMNE6CYvWEFeWt8W9pImwNA-N4_04U8UsBWQmu9O9Bwq1HalCAupyb9kzGBAOOMlKmt6xefz$ 
>>>>>
>>>>>
>>>>>
>>>>>        /* High bandwidth ISOC OUT in DDMA not supported */
>>>>>        if (using_desc_dma(hsotg) && ep_type == 
>>>>> USB_ENDPOINT_XFER_ISOC &&
>>>>>            !dir_in && mc > 1) {
>>>>>            dev_err(hsotg->dev,
>>>>>                "%s: ISOC OUT, DDMA: HB not supported!\n", __func__);
>>>>>            return -EINVAL;
>>>>>        }
>>>>>
>>>>> But I do not know how the Descriptor DMA is critical and whether
>>>>> disabling it will affect gadget performance seriously.
>>>>>
>>>>> I know about the RX FIFO sizing requirement (and TX FIFO too I guess),
>>>>> the current default values can be increased for that particular use 
>>>>> case
>>>>> if needed.
>>>>>
>>>>> I am trying to learn if it made sense to spend time on adding support
>>>>> for high-bandwidth to the UAC2 audio gadget  to allow using larger
>>>>> bInterval and mc=2,3 at high samplerates/channel counts (sort of 
>>>>> "burst
>>>>> mode" similar to UAC3). When doing some CPU-demanding DSP it would 
>>>>> help
>>>>> to avoid the time-critical handling every 125us microframe. Both 
>>>>> OUT and
>>>>> IN are important.
>>>>>
>>>>
>>>> According programming guide:
>>>>
>>>> "Isochronous OUT Transfers
>>>> The application programming for isochronous out transfers is in the 
>>>> same
>>>> manner as Bulk OUT transfer sequence, except that the application
>>>> creates only 1 packet per descriptor for an isochronous OUT endpoint.
>>>> The controller handles isochronous OUT transfers internally in the same
>>>> way it handles Bulk OUT transfers, and as depicted in Figure 10-28.
>>>> If the transfers are for a high-bandwidth endpoint (more than one MPS
>>>> per μframe ), create as many descriptors as the number of packets in a
>>>> μframe (number of descriptors = number of packets per μframe).
>>>> Maximum number of descriptors per μframe per endpoint is three."
>>>>
>>>> To program descriptors to start HB ISOC OUT there are no any problem.
>>>> Problem occurs on completions. If, for example mc > 1, driver will
>>>> allocate and program mc * (request count) descriptors. If host send mc
>>>> packets per frame then every mc descriptor perform request 
>>>> completion is
>>>> not big problem. But if host will send less than mc packets in frame
>>>> then not clear how to exclude unused descriptors from desc chain which
>>>> already fetched by core - by stop transfers (disable EP) and re-start
>>>> transfers (fill again desc chain) from next frame? Or purge unused 
>>>> descs
>>>> and shifting descriptors "up" in a chain? You can try to implement.
>>>
>>> Hi Minas, thanks for your hints. Unfortunately I am pretty new to dwc2,
>>> please can you point me to particular parts of the dwc2 code?
>>>
>>> I found some dwc2 description which reads your quote in
>>> https://urldefense.com/v3/__https://www.mouser.cn/datasheet/2/196/Infineon-xmc4500_rm_v1.6_2016-UM-v01_06-EN-598157.pdf__;!!A4F2R9G_pg!Jg2wfkRUfyO2jrnLXmO7zO5W0Esw-TTgETCTe5mqtpub1mAmDY7QnixT8HmYyTp0rb_ac7Ot$ 
>>>
>>> (not for BCM2835 but hopefully the principle is similar). IIUC by
>>> descriptor the struct dwc2_dma_decs is meant.
>>>
>> Yes, descriptors declared in dwc2 as dwc2_dma_desc.
>>
>>> I found a function gadget.c:dwc2_gadget_fill_isoc_desc which is called
>>> in dwc2_gadget_start_isoc_ddma and dwc2_hsotg_ep_queue. Is the code
>>> after the /* High bandwidth ISOC OUT in DDMA not supported */ comment in
>>> gadget.c:dwc2_hsotg_ep_enable() because the dwc2 core (the hardware)
>>> does not support HB in DDMA, or because the linux dwc2 driver does not
>>> implement the HB support in DDMA yet (which is what we are talking 
>>> about)?
>> HW supports HB ISOC OUT in DDMA, driver doesn't support. In mentioned by
>> you databook, see chapter "16.11.3.2 Isochronous OUT".
>>>
>>> I am asking because if the HW did not support DDMA, the method
>>> dwc2_gadget_start_isoc_ddma would be out of game for my analysis, right?
>>> If the latter is the case, should the HB support implementation change
>>> dwc2_gadget_start_isoc_ddma?
>>>
>> To support HB ISOC OUT should be updated dwc2_gadget_fill_isoc_desc()
>> and dwc2_gadget_complete_isoc_request_ddma() functions.
>>
>>> Please can you explain a bit more the issue about the unused
>>> descriptors? This is how I understand it (poorly). The driver prepares
>>> descriptors for all mc required by the transfer (and reported by
>>> wMaxPacketSize to the host) so that the core (HW) can fill it via DMA.
>>> However, if the host does not need the whole packet size, it will send
>>> fewer packets per frame, and some of the dwc2_dma_decs descriptors would
>>> not be filled with data = unused. The core (HW) somehow marks the
>>> descriptors whether they were used or not, and the unused descriptors
>>> (i.e. containing old/bogus data) should not undergo completion somehow.
>> Core doesn't mark unused descriptors.
>> Driver can detect that it is last packet in frame by checking DPID. If
>> DPID is DATA0 then it's last packet in frame and need to complete
>> appropriate usb request.
>> After completion of descriptor, core will process next descriptor which
>> is prepared for just completed usb request but not for next request (at
>> least from "buffer addresses" point of view).
>> In case if packet count sent by host in frame less than mc, driver
>> should exclude remaining descs for completed usb request from descriptor
>> list by "shifting up" descs in descriptor list. But I'm not sure that
>> driver have enough time to do that before core fetch next descriptor,
>> which should be already updated (at least "buffer address" should be
>> point to address for next usb request).
>>
>>> But this sounds too simple, not what you described in your post :-)
>>>
>>> Also, please when are completion interrupt requests thrown at ISOC OUT?
>>> After every packet=desc, or after the whole USB frame (i.e. after all 3
>>> packets in case of mc=3)? If after every packet, the HB mode with larger
>>> bInterval (less frequent frames with multiple packets) would not spare
>>> any interrupts/CPU load compared to more frequent frames with single
>>> packets (no HB mode) and adding the HB ISOC support would "only" allow
>>> higher ISOC bandwidth, not CPU load reduction. What is the case, please?
>> Completion interrupt asserted on the end of descriptor processing, if
>> IOC (Interrupt on completion) bit is set. For HB ISOC OUT this bit
>> should be set on all descriptors.
>>>
> 
> Minas, thanks for your expert answer. Just a quick question regarding 
> your previous paragraph - does it mean that ISOC OUT with mc=2 at 
> bInterval=2 yields 8k completion IRQs, just like with mc=1 at 
> bInterval=1? If so, no real CPU workload would be spared by implementing 
> the HB support.
Yes, if the IOC bit is set for all descriptors. In my opinion, for HB 
ISOC OUT it should be set in all descriptors.
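
The arithmetic behind that answer can be sketched as follows. This is 
only an illustration, assuming the host sends the same number of packets 
in every service interval; isoc_out_irqs_per_sec() is a hypothetical 
helper and does not exist in dwc2.

```c
#include <stdbool.h>

/* High-speed USB has 8000 microframes per second (125 us each). */
#define HS_UFRAMES_PER_SEC 8000u

/*
 * Hypothetical helper (not a dwc2 function): completion IRQs per second
 * for a HS ISOC OUT endpoint in descriptor DMA mode.
 *
 * binterval:          descriptor value, 1..16; the service interval is
 *                     2^(binterval - 1) microframes
 * pkts_per_interval:  packets the host actually sends per interval
 * ioc_on_every_desc:  true if IOC is set on all descriptors, false if
 *                     only one descriptor per interval raises an IRQ
 *
 * Integer division is used, so the result is rounded down for long
 * intervals that do not divide 8000 evenly.
 */
static unsigned int isoc_out_irqs_per_sec(unsigned int binterval,
					  unsigned int pkts_per_interval,
					  bool ioc_on_every_desc)
{
	unsigned int uframes_per_interval = 1u << (binterval - 1);
	unsigned int intervals_per_sec =
		HS_UFRAMES_PER_SEC / uframes_per_interval;

	return intervals_per_sec *
	       (ioc_on_every_desc ? pkts_per_interval : 1u);
}
```

With IOC on every descriptor, mc=2 at bInterval=2 gives 4000 intervals 
of 2 packets, i.e. the same 8000 IRQs per second as mc=1 at bInterval=1.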

> 
> Is there any chance to complete all descriptors filled in one frame with 
> one IRQ, by setting the IOC bit only to the last descriptor? 
Because the host can send fewer packets than mc, in this case we could 
miss a frame/USB request completion. I mean, data from the next frame 
would be DMA-ed to the buffer dedicated to the previous frame (the 
unused descriptor).
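
A toy model of that misalignment, as a sketch only. It assumes the core 
simply walks the chain, consuming one descriptor per received packet, 
and that the driver never fixes up the chain. Both function names are 
made up for illustration and are not dwc2 code.

```c
/*
 * Toy model: the chain holds mc descriptors per usb request, and the
 * core consumes exactly one descriptor per received packet.
 * "sent" is the number of packets the host sends in each frame.
 */

/* Index of the request whose buffer receives packet k of a frame. */
static unsigned int landing_request(unsigned int mc, unsigned int sent,
				    unsigned int frame, unsigned int k)
{
	/* Descriptors already consumed by earlier packets. */
	unsigned int desc = frame * sent + k;

	/* mc consecutive descriptors belong to one request. */
	return desc / mc;
}

/*
 * Frame in which the first IRQ fires if IOC is set only on the last
 * descriptor (index mc - 1) of the first request.
 */
static unsigned int first_irq_frame(unsigned int mc, unsigned int sent)
{
	return (mc - 1) / sent;
}
```

For mc=3 and a host sending 2 packets per frame, the first packet of 
frame 1 lands in the third buffer of request 0, and the first IRQ is 
raised in frame 1 instead of frame 0.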

> IIUC that 
> would cause issues when the host does not send data for all descriptors 
> "prepared" by the gadget (as discussed above) but IMO that could be 
> handled somehow (host would likely not change the number of transactions 
> within one continuous stream, gadget could "estimate" how many 
> transactions would be used by the host for the particular altsetting). 
> Just trying to find if any way to reduce the IRQs is possible :-)
> 
I don't know a safe way to reduce the IRQs :-(.

Thanks,
Minas

> Thanks a lot! Best regards,
> 
> Pavel.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: usb: dwc2: gadget: high-bandwidth (mc > 1) status?
  2021-11-26  9:49         ` Minas Harutyunyan
@ 2021-11-26 11:01           ` Pavel Hofman
  0 siblings, 0 replies; 7+ messages in thread
From: Pavel Hofman @ 2021-11-26 11:01 UTC (permalink / raw)
  To: Minas Harutyunyan, linux-usb


Dne 26. 11. 21 v 10:49 Minas Harutyunyan napsal(a):
> On 11/26/2021 12:53 PM, Pavel Hofman wrote:
>> Dne 26. 11. 21 v 7:35 Minas Harutyunyan napsal(a):
>>> Hi Pavel,
>>>
>>> On 11/25/2021 12:47 PM, Pavel Hofman wrote:
>>>>
>>>>
>>>> Dne 24. 11. 21 v 15:04 Minas Harutyunyan napsal(a):
>>>>> Hi Pavel,
>>>>>
>>>>> On 11/24/2021 11:39 AM, Pavel Hofman wrote:
>>>>>> Hi Minas at all,
>>>>>>
>>>>>> Please does dwc2 (specifically in BCM2835/RPi) support HS ISOC
>>>>>> multiple
>>>>>> transactions mc > 1 reliably? I found this condition
>>>>>> https://elixir.bootlin.com/linux/v5.16-rc2/source/drivers/usb/dwc2/gadget.c#L4041
>>>>>>
>>>>>>
>>>>>>
>>>>>>         /* High bandwidth ISOC OUT in DDMA not supported */
>>>>>>         if (using_desc_dma(hsotg) && ep_type ==
>>>>>> USB_ENDPOINT_XFER_ISOC &&
>>>>>>             !dir_in && mc > 1) {
>>>>>>             dev_err(hsotg->dev,
>>>>>>                 "%s: ISOC OUT, DDMA: HB not supported!\n", __func__);
>>>>>>             return -EINVAL;
>>>>>>         }
>>>>>>
>>>>>> But I do not know how the Descriptor DMA is critical and whether
>>>>>> disabling it will affect gadget performance seriously.
>>>>>>
>>>>>> I know about the RX FIFO sizing requirement (and TX FIFO too I guess),
>>>>>> the current default values can be increased for that particular use
>>>>>> case
>>>>>> if needed.
>>>>>>
>>>>>> I am trying to learn if it made sense to spend time on adding support
>>>>>> for high-bandwidth to the UAC2 audio gadget  to allow using larger
>>>>>> bInterval and mc=2,3 at high samplerates/channel counts (sort of
>>>>>> "burst
>>>>>> mode" similar to UAC3). When doing some CPU-demanding DSP it would
>>>>>> help
>>>>>> to avoid the time-critical handling every 125us microframe. Both
>>>>>> OUT and
>>>>>> IN are important.
>>>>>>
>>>>>
>>>>> According programming guide:
>>>>>
>>>>> "Isochronous OUT Transfers
>>>>> The application programming for isochronous out transfers is in the
>>>>> same
>>>>> manner as Bulk OUT transfer sequence, except that the application
>>>>> creates only 1 packet per descriptor for an isochronous OUT endpoint.
>>>>> The controller handles isochronous OUT transfers internally in the same
>>>>> way it handles Bulk OUT transfers, and as depicted in Figure 10-28.
>>>>> If the transfers are for a high-bandwidth endpoint (more than one MPS
>>>>> per μframe ), create as many descriptors as the number of packets in a
>>>>> μframe (number of descriptors = number of packets per μframe).
>>>>> Maximum number of descriptors per μframe per endpoint is three."
>>>>>
>>>>> To program descriptors to start HB ISOC OUT there are no any problem.
>>>>> Problem occurs on completions. If, for example mc > 1, driver will
>>>>> allocate and program mc * (request count) descriptors. If host send mc
>>>>> packets per frame then every mc descriptor perform request
>>>>> completion is
>>>>> not big problem. But if host will send less than mc packets in frame
>>>>> then not clear how to exclude unused descriptors from desc chain which
>>>>> already fetched by core - by stop transfers (disable EP) and re-start
>>>>> transfers (fill again desc chain) from next frame? Or purge unused
>>>>> descs
>>>>> and shifting descriptors "up" in a chain? You can try to implement.
>>>>
>>>> Hi Minas, thanks for your hints. Unfortunately I am pretty new to dwc2,
>>>> please can you point me to particular parts of the dwc2 code?
>>>>
>>>> I found some dwc2 description which reads your quote in
>>>> https://www.mouser.cn/datasheet/2/196/Infineon-xmc4500_rm_v1.6_2016-UM-v01_06-EN-598157.pdf
>>>>
>>>> (not for BCM2835 but hopefully the principle is similar). IIUC by
>>>> descriptor the struct dwc2_dma_decs is meant.
>>>>
>>> Yes, descriptors declared in dwc2 as dwc2_dma_desc.
>>>
>>>> I found a function gadget.c:dwc2_gadget_fill_isoc_desc which is called
>>>> in dwc2_gadget_start_isoc_ddma and dwc2_hsotg_ep_queue. Is the code
>>>> after the /* High bandwidth ISOC OUT in DDMA not supported */ comment in
>>>> gadget.c:dwc2_hsotg_ep_enable() because the dwc2 core (the hardware)
>>>> does not support HB in DDMA, or because the linux dwc2 driver does not
>>>> implement the HB support in DDMA yet (which is what we are talking
>>>> about)?
>>> HW supports HB ISOC OUT in DDMA, driver doesn't support. In mentioned by
>>> you databook, see chapter "16.11.3.2 Isochronous OUT".
>>>>
>>>> I am asking because if the HW did not support DDMA, the method
>>>> dwc2_gadget_start_isoc_ddma would be out of game for my analysis, right?
>>>> If the latter is the case, should the HB support implementation change
>>>> dwc2_gadget_start_isoc_ddma?
>>>>
>>> To support HB ISOC OUT should be updated dwc2_gadget_fill_isoc_desc()
>>> and dwc2_gadget_complete_isoc_request_ddma() functions.
>>>
>>>> Please can you explain a bit more the issue about the unused
>>>> descriptors? This is how I understand it (poorly). The driver prepares
>>>> descriptors for all mc required by the transfer (and reported by
>>>> wMaxPacketSize to the host) so that the core (HW) can fill it via DMA.
>>>> However, if the host does not need the whole packet size, it will send
>>>> fewer packets per frame, and some of the dwc2_dma_decs descriptors would
>>>> not be filled with data = unused. The core (HW) somehow marks the
>>>> descriptors whether they were used or not, and the unused descriptors
>>>> (i.e. containing old/bogus data) should not undergo completion somehow.
>>> Core doesn't mark unused descriptors.
>>> Driver can detect that it is last packet in frame by checking DPID. If
>>> DPID is DATA0 then it's last packet in frame and need to complete
>>> appropriate usb request.
>>> After completion of descriptor, core will process next descriptor which
>>> is prepared for just completed usb request but not for next request (at
>>> least from "buffer addresses" point of view).
>>> In case if packet count sent by host in frame less than mc, driver
>>> should exclude remaining descs for completed usb request from descriptor
>>> list by "shifting up" descs in descriptor list. But I'm not sure that
>>> driver have enough time to do that before core fetch next descriptor,
>>> which should be already updated (at least "buffer address" should be
>>> point to address for next usb request).
>>>
>>>> But this sounds too simple, not what you described in your post :-)
>>>>
>>>> Also, please when are completion interrupt requests thrown at ISOC OUT?
>>>> After every packet=desc, or after the whole USB frame (i.e. after all 3
>>>> packets in case of mc=3)? If after every packet, the HB mode with larger
>>>> bInterval (less frequent frames with multiple packets) would not spare
>>>> any interrupts/CPU load compared to more frequent frames with single
>>>> packets (no HB mode) and adding the HB ISOC support would "only" allow
>>>> higher ISOC bandwidth, not CPU load reduction. What is the case, please?
>>> Completion interrupt asserted on the end of descriptor processing, if
>>> IOC (Interrupt on completion) bit is set. For HB ISOC OUT this bit
>>> should be set on all descriptors.
>>>>
>>
>> Minas, thanks for your expert answer. Just a quick question regarding
>> your previous paragraph - does it mean that ISOC OUT with mc=2 at
>> bInterval=2 yields 8k completion IRQs, just like with mc=1 at
>> bInterval=1? If so, no real CPU workload would be spared by implementing
>> the HB support.
> Yes, if IOC bit set for all descriptors. For HB ISOC OUT per me should
> be set in all descriptors.
> 
>>
>> Is there any chance to complete all descriptors filled in one frame with
>> one IRQ, by setting the IOC bit only to the last descriptor?
> Because of host can send less packets than mc then in this case we can
> miss frame/usb request completion. I mean, data from next frame will be
> DMA-ed to buffer dedicated for previous frame (unused descriptor).
> 


This is a very non-expert idea: The ISOC stream starts upon switching 
the altsetting from 0. It's very unlikely that the host would change the 
number of packets sent per frame within one altsetting (maybe?). The 
first desc set would have the IOC bit set for all descriptors 
corresponding to the wMaxPacketSize mc value. Upon completing all 
descriptors from the first frame in the "stream" (i.e. after the 
altsetting was changed) and detecting which descriptors were not 
completed/used, the scheduling method would learn how many packets are 
actually used by the host in that specific stream run. For the next 
frame it would schedule only the appropriate number of descriptors, and 
set the IOC bit only for the last desc. Upon "resetting" the stream by 
switching back to altsetting 0 the "logic" would reset, running the 
packet count detection at the start of the next "stream" again.

Could something like this theoretically work (likely in co-operation 
with a specific gadget function)?
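
To make the idea concrete, here is a minimal sketch of the bookkeeping 
only. All names (hb_learner, hb_reset, ...) are hypothetical, nothing 
like this exists in dwc2, and it deliberately ignores the open question 
of what happens if the host does change its packet count mid-stream.

```c
#include <stdbool.h>

/* Hypothetical per-endpoint state for the "learn, then trim" idea. */
enum hb_phase { HB_DETECT, HB_STEADY };

struct hb_learner {
	enum hb_phase phase;
	unsigned int mc;       /* from wMaxPacketSize, 1..3 */
	unsigned int learned;  /* packets per frame seen in first frame */
};

/* Called when the host selects a non-zero altsetting. */
static void hb_reset(struct hb_learner *l, unsigned int mc)
{
	l->phase = HB_DETECT;
	l->mc = mc;
	l->learned = mc;
}

/* Called once the first frame is done; "used" = descriptors filled. */
static void hb_first_frame_done(struct hb_learner *l, unsigned int used)
{
	if (used >= 1 && used <= l->mc) {
		l->learned = used;
		l->phase = HB_STEADY;
	}
}

/* Number of descriptors to schedule for the next frame. */
static unsigned int hb_descs_per_frame(const struct hb_learner *l)
{
	return l->phase == HB_DETECT ? l->mc : l->learned;
}

/* IOC on every descriptor while detecting, then only on the last. */
static bool hb_desc_needs_ioc(const struct hb_learner *l, unsigned int idx)
{
	return l->phase == HB_DETECT || idx + 1 == l->learned;
}
```

With mc=3 and a host that sends 2 packets in the first frame, later 
frames would get 2 descriptors each, with IOC only on the second one.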

Thanks a lot for your patience :-)

Best regards,

Pavel.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-11-26 11:03 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-11-24  7:39 usb: dwc2: gadget: high-bandwidth (mc > 1) status? Pavel Hofman
2021-11-24 14:04 ` Minas Harutyunyan
2021-11-25  8:47   ` Pavel Hofman
2021-11-26  6:35     ` Minas Harutyunyan
2021-11-26  8:53       ` Pavel Hofman
2021-11-26  9:49         ` Minas Harutyunyan
2021-11-26 11:01           ` Pavel Hofman
