* snd-usb-audio Buffer Sizes and Round Trip Latency
@ 2018-10-17 12:58 Jonathan Liu
  2018-10-22 14:06 ` Pierre-Louis Bossart
  0 siblings, 1 reply; 12+ messages in thread
From: Jonathan Liu @ 2018-10-17 12:58 UTC
  To: Alan Stern, Clemens Ladisch, Takashi Iwai; +Cc: ALSA development

Hi,

I want to start a discussion regarding round trip latency for class
compliant USB audio interfaces on Linux. In particular, I am noticing
with my USB 2.0 RME Babyface Pro audio interface that the round trip
latency is considerably higher on Linux than on macOS High Sierra and
Windows 10.

I measured the round trip latency using a loopback audio cable and the
ReaInsert plugin included with the Reaper DAW (www.reaper.fm), which is
available for Windows/macOS/Linux and calculates the additional delay.

Here are the results for 48000 Hz, 24-bit on my RME Babyface Pro:
===
block_size/periods block_size*periods + additional_delay ~ round_trip_latency
round_trip_latency = (block_size*periods + additional_delay) / 48000 * 1000

Linux 4.17.14, Class Compliant Mode (snd-usb-audio, ALSA backend):
16/2 32 + 80 ~ 2.333 ms
16/3 48 + 109 ~ 3.271 ms
32/2 64 + 129 ~ 4.021 ms
32/3 96 + 166 ~ 5.458 ms
64/2 128 + 205 ~ 6.938 ms
64/3 192 + 242 ~ 9.042 ms
128/2 256 + 352 ~ 12.667 ms
128/3 384 + 496 ~ 18.333 ms
256/2 512 + 650 ~ 24.208 ms
256/3 768 + 650 ~ 29.542 ms
512/2 1024 + 634 ~ 34.542 ms
512/3 1536 + 634 ~ 45.208 ms
1024/2 2048 + 650 ~ 56.208 ms
1024/3 3072 + 650 ~ 77.542 ms
2048/2 4096 + 633 ~ 98.521 ms
2048/3 6144 + 633 ~ 141.188 ms

macOS High Sierra, Class Compliant Mode (Apple Driver):
16/2 32 + 205 ~ 4.938 ms
32/2 64 + 205 ~ 5.604 ms
64/2 128 + 205 ~ 6.938 ms
128/2 256 + 205 ~ 9.604 ms
256/2 512 + 205 ~ 14.938 ms
512/2 1024 + 205 ~ 25.604 ms
1024/2 2048 + 205 ~ 46.938 ms
2048/2 4096 + 205 ~ 89.604 ms

macOS High Sierra, PC Mode (RME Driver v3.08):
16/2 32 + 59 ~ 1.896 ms
32/2 64 + 59 ~ 2.563 ms
64/2 128 + 59 ~ 3.896 ms
128/2 256 + 59 ~ 6.563 ms
256/2 512 + 59 ~ 11.896 ms
512/2 1024 + 59 ~ 22.563 ms
1024/2 2048 + 59 ~ 43.896 ms
2048/2 4096 + 59 ~ 86.563 ms

Windows 10, PC Mode (RME Driver 1.099):
48/2 96 + 63 ~ 3.313 ms
64/2 128 + 63 ~ 3.979 ms
96/2 192 + 63 ~ 5.313 ms
128/2 256 + 63 ~ 6.646 ms
256/2 512 + 63 ~ 11.979 ms
512/2 1024 + 63 ~ 22.646 ms
1024/2 2048 + 63 ~ 43.979 ms
2048/2 4096 + 63 ~ 86.646 ms
===

Some things in particular I noticed on Linux:
- additional_delay varies a bit if I close and open the audio device again
- additional_delay seems to increase as the block_size increases. I
can make the additional_delay stay about the same rather than
increasing by setting MAX_PACKS and MAX_PACKS_HS to 1 in
sound/usb/card.h. In Linux versions before 3.13 there was a nrpacks
parameter for snd-usb-audio to control this but it was removed with
commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v3.13&id=976b6c064a957445eb0573b270f2d0282630e9b9
- unlike on macOS and Windows, additional_delay does not stay constant
as block_size is increased

I made a patch to snd-usb-audio to expose the snd-usb-audio constants
as runtime adjustable module parameters
(/sys/module/snd_usb_audio/parameters/) for testing (takes effect when
the device is disconnected+reconnected and logs the parameter values
to dmesg):
https://aur.archlinux.org/cgit/aur.git/plain/parameters.patch?h=snd-usb-audio-lowlatency-dkms

The patch is used in my Arch Linux AUR package for convenience (using
DKMS to avoid having to recompile the entire kernel):
https://aur.archlinux.org/packages/snd-usb-audio-lowlatency-dkms/

Can snd-usb-audio be improved so the additional_delay is always the
same when closing/opening/reconfiguring the audio device and does not
increase as the block_size increases?

I noticed using USB audio on Linux at lower latencies (block_size <=
128) is more prone to audio dropouts under load compared to macOS and
Windows, even with CPU power management disabled (writing 0 to
/dev/cpu_dma_latency). What can be done about this?
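
For reference, "writing 0 to /dev/cpu_dma_latency" can be sketched as
below. As far as I understand the kernel's PM QoS interface, the value is
written as a binary 32-bit integer (microseconds) and the request only
holds while the file descriptor stays open. The function name and the
path parameter are illustrative assumptions, and opening the real device
normally requires root.

```python
import os
import struct


def request_cpu_dma_latency(max_latency_us=0, path="/dev/cpu_dma_latency"):
    """Open the PM QoS device and request a maximum wake-up latency.

    Returns the open file descriptor. The request is dropped as soon as
    the descriptor is closed, so the caller must keep it open for as
    long as the low-latency setting is needed.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        # The device accepts a native 32-bit signed integer.
        os.write(fd, struct.pack("i", max_latency_us))
    except OSError:
        os.close(fd)
        raise
    return fd
```

A caller would keep the returned descriptor open for the whole audio
session and pass it to os.close() afterwards. Writing the value with a
shell redirect and letting the shell exit drops the request right away.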

Thanks.

Regards,
Jonathan


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2018-10-17 12:58 snd-usb-audio Buffer Sizes and Round Trip Latency Jonathan Liu
@ 2018-10-22 14:06 ` Pierre-Louis Bossart
  2018-10-22 15:40   ` Alan Stern
  0 siblings, 1 reply; 12+ messages in thread
From: Pierre-Louis Bossart @ 2018-10-22 14:06 UTC
  To: Jonathan Liu, Alan Stern, Clemens Ladisch, Takashi Iwai; +Cc: ALSA development


On 10/17/18 7:58 AM, Jonathan Liu wrote:
> Hi,
>
> I want to start a discussion regarding round trip latency for class
> compliant USB audio interfaces on Linux. In particular, I am noticing
> with my USB 2.0 RME Babyface Pro audio interface that the round trip
> latency is considerably higher on Linux than on macOS High Sierra and
> Windows 10.
>
> I tested the round trip latency using a loopback audio cable and the
> ReaInsert plugin included with Reaper DAW (www.reaper.fm) that can be
> downloaded for Windows/macOS/Linux to calculate the additional delay.
>
> Here are the results for 48000 Hz, 24-bit on my RME Babyface Pro:
> ===
> block_size/periods block_size*periods + additional_delay ~ round_trip_latency
> round_trip_latency = (block_size*periods + additional_delay) / 48000 * 1000
>
> Linux 4.17.14, Class Compliant Mode (snd-usb-audio, ALSA backend):
> 16/2 32 + 80 ~ 2.333 ms
> 16/3 48 + 109 ~ 3.271 ms
> 32/2 64 + 129 ~ 4.021 ms
> 32/3 96 + 166 ~ 5.458 ms
> 64/2 128 + 205 ~ 6.938 ms
> 64/3 192 + 242 ~ 9.042 ms
> 128/2 256 + 352 ~ 12.667 ms
> 128/3 384 + 496 ~ 18.333 ms
> 256/2 512 + 650 ~ 24.208 ms
> 256/3 768 + 650 ~ 29.542 ms
> 512/2 1024 + 634 ~ 34.542 ms
> 512/3 1536 + 634 ~ 45.208 ms
> 1024/2 2048 + 650 ~ 56.208 ms
> 1024/3 3072 + 650 ~ 77.542 ms
> 2048/2 4096 + 633 ~ 98.521 ms
> 2048/3 6144 + 633 ~ 141.188 ms
>
> macOS High Sierra, Class Compliant Mode (Apple Driver):
> 16/2 32 + 205 ~ 4.938 ms
> 32/2 64 + 205 ~ 5.604 ms
> 64/2 128 + 205 ~ 6.938 ms
> 128/2 256 + 205 ~ 9.604 ms
> 256/2 512 + 205 ~ 14.938 ms
> 512/2 1024 + 205 ~ 25.604 ms
> 1024/2 2048 + 205 ~ 46.938 ms
> 2048/2 4096 + 205 ~ 89.604 ms

I couldn't figure out how to analyze your data. I'm not sure what the
extra delays mean, nor how you conclude that Linux is worse than macOS or
Windows 10 for small buffers.

At any rate, I looked into this some time back but had to put the work 
on the back burner due to other priorities. What I do remember is that 
there is a built-in latency due to the fact that on playback the driver 
submits a number of zero-filled URBs and will only add valid audio data 
when the first URB is retired, which means you get a constant startup 
latency that you will never be able to make up.
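
The startup latency described above can be estimated with a rough model.
This is a simplification, not the driver's actual logic: it assumes every
silent URB queued at start has to play out before the first real audio
reaches the device. The function name and the example values (2 URBs of
8 packets, 125 us per packet) are hypothetical.

```python
def startup_delay_samples(silent_urbs, packets_per_urb, packet_interval_us,
                          sample_rate_hz=48000):
    """Estimate the playback startup delay, in samples.

    Assumes (simplified model) that all zero-filled URBs queued at start
    are played out before the first URB carrying real audio.
    """
    silent_time_us = silent_urbs * packets_per_urb * packet_interval_us
    return silent_time_us * sample_rate_hz / 1_000_000


# Hypothetical example: 2 silent URBs of 8 packets each, one packet
# per 125 us microframe on a high-speed bus.
print(startup_delay_samples(silent_urbs=2, packets_per_urb=8,
                            packet_interval_us=125))
```

Under this model the delay scales with both the number of silent URBs
and the number of packets per URB, so it would be constant for a given
stream configuration.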

I also vaguely remember that at some point the buffer/period sizes don't 
matter, each period will be broken up in a series of URBs and hence you 
will have more wake-ups than what is configured by the period size. In 
short I would look into the way the data is spread on multiple URBs and 
check how latency is impacted by the software design.

The last thing I have in mind is that for latency analysis and 
comparisons, using simple devices makes sense. Latency can be affected by 
extra processing that might be enabled in the USB device depending on 
user configurations or parameters. Ideally to focus on the ALSA/xHCI 
interaction/latency we'd want to look at really dumb devices with just 
an input and output terminal and no processing.

-Pierre

>
> macOS High Sierra, PC Mode (RME Driver v3.08):
> 16/2 32 + 59 ~ 1.896 ms
> 32/2 64 + 59 ~ 2.563 ms
> 64/2 128 + 59 ~ 3.896 ms
> 128/2 256 + 59 ~ 6.563 ms
> 256/2 512 + 59 ~ 11.896 ms
> 512/2 1024 + 59 ~ 22.563 ms
> 1024/2 2048 + 59 ~ 43.896 ms
> 2048/2 4096 + 59 ~ 86.563 ms
>
> Windows 10, PC Mode (RME Driver 1.099):
> 48/2 96 + 63 ~ 3.313 ms
> 64/2 128 + 63 ~ 3.979 ms
> 96/2 192 + 63 ~ 5.313 ms
> 128/2 256 + 63 ~ 6.646 ms
> 256/2 512 + 63 ~ 11.979 ms
> 512/2 1024 + 63 ~ 22.646 ms
> 1024/2 2048 + 63 ~ 43.979 ms
> 2048/2 4096 + 63 ~ 86.646 ms
> ===
>
> Some things in particular I noticed on Linux:
> - additional_delay varies a bit if I close and open the audio device again
> - additional_delay seems to increase as the block_size increases. I
> can make the additional_delay stay about the same rather than
> increasing by setting MAX_PACKS and MAX_PACKS_HS to 1 in
> sound/usb/card.h. In Linux versions before 3.13 there was a nrpacks
> parameter for snd-usb-audio to control this but it was removed with
> commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v3.13&id=976b6c064a957445eb0573b270f2d0282630e9b9
> - additional_delay is not constant as block_size is increased like on
> macOS and Windows
>
> I made a patch to snd-usb-audio to expose the snd-usb-audio constants
> as runtime adjustable module parameters
> (/sys/module/snd_usb_audio/parameters/) for testing (takes effect when
> the device is disconnected+reconnected and logs the parameter values
> to dmesg):
> https://aur.archlinux.org/cgit/aur.git/plain/parameters.patch?h=snd-usb-audio-lowlatency-dkms
>
> The patch is used in my Arch Linux AUR package for convenience (using
> DKMS to avoid having to recompile entire kernel):
> https://aur.archlinux.org/packages/snd-usb-audio-lowlatency-dkms/
>
> Can snd-usb-audio be improved so the additional_delay is always the
> same when closing/opening/reconfiguring the audio device and does not
> increase as the block_size increases?
>
> I noticed using USB audio on Linux at lower latencies (block_size <=
> 128) is more prone to audio dropouts under load compared to macOS and
> Windows, even with CPU power management disabled (writing 0 to
> /dev/cpu_dma_latency). What can be done about this?
>
> Thanks.
>
> Regards,
> Jonathan


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2018-10-22 14:06 ` Pierre-Louis Bossart
@ 2018-10-22 15:40   ` Alan Stern
  2018-10-23 11:59     ` Jonathan Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Alan Stern @ 2018-10-22 15:40 UTC
  To: Pierre-Louis Bossart
  Cc: Takashi Iwai, Jonathan Liu, Clemens Ladisch, ALSA development

On Mon, 22 Oct 2018, Pierre-Louis Bossart wrote:

> On 10/17/18 7:58 AM, Jonathan Liu wrote:
> > Hi,
> >
> > I want to start a discussion regarding round trip latency for class
> > compliant USB audio interfaces on Linux. In particular, I am noticing
> > with my USB 2.0 RME Babyface Pro audio interface that the round trip
> > latency is considerably higher on Linux than on macOS High Sierra and
> > Windows 10.
> >
> > I tested the round trip latency using a loopback audio cable and the
> > ReaInsert plugin included with Reaper DAW (www.reaper.fm) that can be
> > downloaded for Windows/macOS/Linux to calculate the additional delay.
> >
> > Here are the results for 48000 Hz, 24-bit on my RME Babyface Pro:
> > ===
> > block_size/periods block_size*periods + additional_delay ~ round_trip_latency
> > round_trip_latency = (block_size*periods + additional_delay) / 48000 * 1000

I'm with Pierre-Louis on this; I can't make heads or tails out of these 
formulas.

To begin with, I'm accustomed to talking about frames, periods, and 
buffers.  What are "block"s?  Are they the same as buffers?

What do these formulas mean?  Is the first supposed to be a definition 
of round_trip_latency?  If it isn't, then how do you define or measure 
round_trip_latency?

What is additional_delay?  How is it measured or calculated?

> > Linux 4.17.14, Class Compliant Mode (snd-usb-audio, ALSA backend):
> > 16/2 32 + 80 ~ 2.333 ms

What are these numbers?  Are these lines supposed to be in the format
expressed by the first formula above?  If they are, how come
"block_size/periods" shows up as a pair of numbers "16/2" but
"block_size*periods" shows up as a single number "32"?

> > 16/3 48 + 109 ~ 3.271 ms
> > 32/2 64 + 129 ~ 4.021 ms
> > 32/3 96 + 166 ~ 5.458 ms
> > 64/2 128 + 205 ~ 6.938 ms
> > 64/3 192 + 242 ~ 9.042 ms
> > 128/2 256 + 352 ~ 12.667 ms
> > 128/3 384 + 496 ~ 18.333 ms
> > 256/2 512 + 650 ~ 24.208 ms
> > 256/3 768 + 650 ~ 29.542 ms
> > 512/2 1024 + 634 ~ 34.542 ms
> > 512/3 1536 + 634 ~ 45.208 ms
> > 1024/2 2048 + 650 ~ 56.208 ms
> > 1024/3 3072 + 650 ~ 77.542 ms
> > 2048/2 4096 + 633 ~ 98.521 ms
> > 2048/3 6144 + 633 ~ 141.188 ms
> >
> > macOS High Sierra, Class Compliant Mode (Apple Driver):
> > 16/2 32 + 205 ~ 4.938 ms
> > 32/2 64 + 205 ~ 5.604 ms
> > 64/2 128 + 205 ~ 6.938 ms
> > 128/2 256 + 205 ~ 9.604 ms
> > 256/2 512 + 205 ~ 14.938 ms
> > 512/2 1024 + 205 ~ 25.604 ms
> > 1024/2 2048 + 205 ~ 46.938 ms
> > 2048/2 4096 + 205 ~ 89.604 ms

What are the USB parameters for these tests?  How many bytes/frame?  
What is the endpoint's maxpacket size?  What is the speed of the USB 
bus?

> I couldn't figure out how to analyze your data, not sure what the extra 
> delays mean nor how you conclude that Linux is worse than MacOS or 
> Windows10 for small buffers?
> 
> At any rate, I looked into this some time back but had to put the work 
> on the back burner due to other priorities. What I do remember is that 
> there is a built-in latency due to the fact that on playback the driver 
> submits a number of zero-filled URBs and will only add valid audio data 
> when the first URB is retired, which means you get a constant startup 
> latency you will never be able to catch up.

In theory the number of zero-filled URBs could be reduced, maybe even 
eliminated.

> I also vaguely remember that at some point the buffer/period sizes don't 
> matter, each period will be broken up in a series of URBs and hence you 
> will have more wake-ups than what is configured by the period size. In 
> short I would look into the way the data is spread on multiple URBs and 
> check how latency is impacted by the software design.

Agreed.

> The last thing I have in mind is that for latency analysis and 
> comparisons, using simple devices makes sense. Latency can be affected by 
> extra processing that might be enabled in the USB device depending on 
> user configurations or parameters. Ideally to focus on the ALSA/xHCI 
> interaction/latency we'd want to look at really dumb devices with just 
> an input and output terminal and no processing.
> 
> -Pierre
> 
> >
> > macOS High Sierra, PC Mode (RME Driver v3.08):
> > 16/2 32 + 59 ~ 1.896 ms
> > 32/2 64 + 59 ~ 2.563 ms
> > 64/2 128 + 59 ~ 3.896 ms
> > 128/2 256 + 59 ~ 6.563 ms
> > 256/2 512 + 59 ~ 11.896 ms
> > 512/2 1024 + 59 ~ 22.563 ms
> > 1024/2 2048 + 59 ~ 43.896 ms
> > 2048/2 4096 + 59 ~ 86.563 ms
> >
> > Windows 10, PC Mode (RME Driver 1.099):
> > 48/2 96 + 63 ~ 3.313 ms
> > 64/2 128 + 63 ~ 3.979 ms
> > 96/2 192 + 63 ~ 5.313 ms
> > 128/2 256 + 63 ~ 6.646 ms
> > 256/2 512 + 63 ~ 11.979 ms
> > 512/2 1024 + 63 ~ 22.646 ms
> > 1024/2 2048 + 63 ~ 43.979 ms
> > 2048/2 4096 + 63 ~ 86.646 ms
> > ===
> >
> > Some things in particular I noticed on Linux:
> > - additional_delay varies a bit if I close and open the audio device again
> > - additional_delay seems to increase as the block_size increases. I
> > can make the additional_delay stay about the same rather than
> > increasing by setting MAX_PACKS and MAX_PACKS_HS to 1 in
> > sound/usb/card.h. In Linux versions before 3.13 there was a nrpacks
> > parameter for snd-usb-audio to control this but it was removed with
> > commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v3.13&id=976b6c064a957445eb0573b270f2d0282630e9b9
> > - additional_delay is not constant as block_size is increased like on
> > macOS and Windows

Perhaps this additional_delay is caused by the zero-filled URBs 
mentioned earlier.

We can't say anything about the effect of setting MAX_PACKS to 1 
without knowing how the driver is currently fitting packets into frames 
and URBs.  In any case, you should be able to reduce the number of 
packets in each URB simply by reducing the period size, since the 
driver strives to keep each URB not much larger than a period (as I 
recall -- it's been a long time since I worked on this (2013)).
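
The relationship between period size and packets per URB can be pictured
with a toy model. This is not the driver's actual sizing code, only a
sketch of the idea that a URB is kept to roughly one period and capped
by a MAX_PACKS-style limit. The function name and the cap of 48 used in
the example are illustrative assumptions.

```python
import math


def packets_per_urb(period_frames, frames_per_packet, max_packs):
    """Toy model: packets needed to cover one period, capped at max_packs."""
    needed = math.ceil(period_frames / frames_per_packet)
    return max(1, min(max_packs, needed))


# At 48 kHz with one packet per 125 us microframe there are 6 frames
# per packet. The cap of 48 is a placeholder, not a value read from the
# driver source.
for period in (16, 64, 256, 2048):
    print(period, packets_per_urb(period, frames_per_packet=6, max_packs=48))
```

In this model a cap of 1 pins every URB to a single packet regardless of
period size. Smaller periods also reduce the packet count without
touching the cap.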

> > I made a patch to snd-usb-audio to expose the snd-usb-audio constants
> > as runtime adjustable module parameters
> > (/sys/module/snd_usb_audio/parameters/) for testing (takes effect when
> > the device is disconnected+reconnected and logs the parameter values
> > to dmesg):
> > https://aur.archlinux.org/cgit/aur.git/plain/parameters.patch?h=snd-usb-audio-lowlatency-dkms
> >
> > The patch is used in my Arch Linux AUR package for convenience (using
> > DKMS to avoid having to recompile entire kernel):
> > https://aur.archlinux.org/packages/snd-usb-audio-lowlatency-dkms/
> >
> > Can snd-usb-audio be improved so the additional_delay is always the
> > same when closing/opening/reconfiguring the audio device and does not
> > increase as the block_size increases?
> >
> > I noticed using USB audio on Linux at lower latencies (block_size <=
> > 128) is more prone to audio dropouts under load compared to macOS and
> > Windows, even with CPU power management disabled (writing 0 to
> > /dev/cpu_dma_latency). What can be done about this?

You can reduce the CPU load.  :-)

Seriously, how can you compare loads between different operating 
systems?

Also, note that the Linux scheduler has a number of adjustable
parameters, which I am not familiar with.

Alan Stern


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2018-10-22 15:40   ` Alan Stern
@ 2018-10-23 11:59     ` Jonathan Liu
  2018-10-23 14:08       ` Pierre-Louis Bossart
  2018-10-23 15:10       ` Alan Stern
  0 siblings, 2 replies; 12+ messages in thread
From: Jonathan Liu @ 2018-10-23 11:59 UTC
  To: Alan Stern
  Cc: Takashi Iwai, ALSA development, Clemens Ladisch, pierre-louis.bossart

Hi,

On Tue, 23 Oct 2018 at 02:40, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Mon, 22 Oct 2018, Pierre-Louis Bossart wrote:
>
> > On 10/17/18 7:58 AM, Jonathan Liu wrote:
> > > Hi,
> > >
> > > I want to start a discussion regarding round trip latency for class
> > > compliant USB audio interfaces on Linux. In particular, I am noticing
> > > with my USB 2.0 RME Babyface Pro audio interface that the round trip
> > > latency is considerably higher on Linux than on macOS High Sierra and
> > > Windows 10.
> > >
> > > I tested the round trip latency using a loopback audio cable and the
> > > ReaInsert plugin included with Reaper DAW (www.reaper.fm) that can be
> > > downloaded for Windows/macOS/Linux to calculate the additional delay.
> > >
> > > Here are the results for 48000 Hz, 24-bit on my RME Babyface Pro:
> > > ===
> > > block_size/periods block_size*periods + additional_delay ~ round_trip_latency
> > > round_trip_latency = (block_size*periods + additional_delay) / 48000 * 1000
>
> I'm with Pierre-Louis on this; I can't make heads or tails out of these
> formulas.
>
> To begin with, I'm accustomed to talking about frames, periods, and
> buffers.  What are "block"s?  Are they the same as buffers?
>
> What do these formulas mean?  Is the first supposed to be a definition
> of round_trip_latency?  If it isn't, then how do you define or measure
> round_trip_latency?
>
> What is additional_delay?  How is it measured or calculated?

See below.

>
> > > Linux 4.17.14, Class Compliant Mode (snd-usb-audio, ALSA backend):
> > > 16/2 32 + 80 ~ 2.333 ms
>
> What are these numbers?  Are these lines supposed to be in the format
> expressed by the first formula above?  If they are, how come
> "block_size/periods" shows up as a pair of numbers "16/2" but
> "block_size*periods" shows up as a single number "32"?
>

To interpret "16/2 32 + 80 ~ 2.333 ms":
Block size: 16 samples
Periods: 2 (one period for playback + one period for recording when
determining round trip latency)
The minimum round trip latency is 16 * 2 = 32 samples.
However, I measured a round trip latency of 112 samples, which is an
additional delay of 80 samples (32 + 80 = 112).
At 48000 Hz, 112 samples is 112 / 48000 * 1000, or approximately
2.333 ms of measured round trip latency.
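
The same interpretation can be written as a small parser, which also
makes it easy to cross-check each result line. This is a sketch only;
the regular expression, function name, and dictionary keys are made up
for illustration.

```python
import re

# Matches lines such as "16/2 32 + 80 ~ 2.333 ms".
LINE_RE = re.compile(
    r"^(?P<block>\d+)/(?P<periods>\d+)\s+(?P<minimum>\d+)\s*\+\s*"
    r"(?P<extra>\d+)\s*~\s*(?P<ms>[\d.]+)\s*ms$"
)


def parse_result_line(line, sample_rate_hz=48000):
    """Split one result line into its fields and recompute the latency."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised result line: {line!r}")
    block = int(match.group("block"))
    periods = int(match.group("periods"))
    minimum = int(match.group("minimum"))
    extra = int(match.group("extra"))
    if minimum != block * periods:
        raise ValueError("minimum latency is not block_size * periods")
    measured = minimum + extra  # measured round trip latency in samples
    return {
        "block_size": block,
        "periods": periods,
        "minimum_samples": minimum,
        "additional_delay": extra,
        "measured_samples": measured,
        "reported_ms": float(match.group("ms")),
        "computed_ms": measured / sample_rate_hz * 1000,
    }


result = parse_result_line("16/2 32 + 80 ~ 2.333 ms")
print(result["measured_samples"], round(result["computed_ms"], 3))
```

Comparing reported_ms with computed_ms for each line flags any row where
the millisecond figure does not match the sample counts.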

> > > 16/3 48 + 109 ~ 3.271 ms
> > > 32/2 64 + 129 ~ 4.021 ms
> > > 32/3 96 + 166 ~ 5.458 ms
> > > 64/2 128 + 205 ~ 6.938 ms
> > > 64/3 192 + 242 ~ 9.042 ms
> > > 128/2 256 + 352 ~ 12.667 ms
> > > 128/3 384 + 496 ~ 18.333 ms
> > > 256/2 512 + 650 ~ 24.208 ms
> > > 256/3 768 + 650 ~ 29.542 ms
> > > 512/2 1024 + 634 ~ 34.542 ms
> > > 512/3 1536 + 634 ~ 45.208 ms
> > > 1024/2 2048 + 650 ~ 56.208 ms
> > > 1024/3 3072 + 650 ~ 77.542 ms
> > > 2048/2 4096 + 633 ~ 98.521 ms
> > > 2048/3 6144 + 633 ~ 141.188 ms
> > >
> > > macOS High Sierra, Class Compliant Mode (Apple Driver):
> > > 16/2 32 + 205 ~ 4.938 ms
> > > 32/2 64 + 205 ~ 5.604 ms
> > > 64/2 128 + 205 ~ 6.938 ms
> > > 128/2 256 + 205 ~ 9.604 ms
> > > 256/2 512 + 205 ~ 14.938 ms
> > > 512/2 1024 + 205 ~ 25.604 ms
> > > 1024/2 2048 + 205 ~ 46.938 ms
> > > 2048/2 4096 + 205 ~ 89.604 ms

>
> What are the USB parameters for these tests?  How many bytes/frame?
> What is the endpoint's maxpacket size?  What is the speed of the USB
> bus?
>

How would I determine the USB parameters and bytes/frame?

The USB port is an Intel USB 3.0 port. The device is USB 2.0 high speed (480 Mbps).
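
As a rough answer to the bytes/frame question, the figures can be derived
from descriptor fields in the lsusb output that follows (bNrChannels,
bSubslotSize, wMaxPacketSize, bInterval). This is a sketch under the
assumption that bInterval 1 on a high-speed isochronous endpoint means
one packet per 125 us microframe, i.e. 8000 packets per second. The
function names are illustrative.

```python
HIGH_SPEED_PACKETS_PER_SECOND = 8000  # one packet per 125 us microframe


def bytes_per_frame(channels, subslot_size):
    """One audio frame holds one sample slot per channel."""
    return channels * subslot_size


def frames_per_packet(sample_rate_hz,
                      packets_per_second=HIGH_SPEED_PACKETS_PER_SECOND):
    """Nominal number of audio frames carried by each packet."""
    return sample_rate_hz / packets_per_second


# (bNrChannels, wMaxPacketSize) for the 2-channel and 12-channel
# alternate settings; bSubslotSize is 3 bytes in both.
for channels, max_packet in ((2, 150), (12, 900)):
    frame_bytes = bytes_per_frame(channels, 3)
    nominal_bytes = frame_bytes * frames_per_packet(48000)
    max_frames = max_packet // frame_bytes
    print(channels, frame_bytes, nominal_bytes, max_frames)
```

With these inputs both alternate settings leave room for the same number
of frames per packet, well above the nominal count at 48000 Hz.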

Here is the lsusb output:
Bus 001 Device 004: ID 2a39:3fb0
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass          239 Miscellaneous Device
  bDeviceSubClass         2
  bDeviceProtocol         1 Interface Association
  bMaxPacketSize0        64
  idVendor           0x2a39
  idProduct          0x3fb0
  bcdDevice            0.01
  iManufacturer           1 RME
  iProduct                2 Babyface Pro (71964099)
  iSerial                 3 EF72ADBCCECA4C8
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength       0x01a7
    bNumInterfaces          4
    bConfigurationValue     1
    iConfiguration          0
    bmAttributes         0x80
      (Bus Powered)
    MaxPower              100mA
    Interface Association:
      bLength                 8
      bDescriptorType        11
      bFirstInterface         0
      bInterfaceCount         4
      bFunctionClass          1 Audio
      bFunctionSubClass       0
      bFunctionProtocol      32
      iFunction               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        0
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass         1 Audio
      bInterfaceSubClass      1 Control Device
      bInterfaceProtocol     32
      iInterface              0
      AudioControl Interface Descriptor:
        bLength                 9
        bDescriptorType        36
        bDescriptorSubtype      1 (HEADER)
        bcdADC               2.00
        bCategory               8
        wTotalLength       0x0055
        bmControls           0x00
      AudioControl Interface Descriptor:
        bLength                 8
        bDescriptorType        36
        bDescriptorSubtype     10 (CLOCK_SOURCE)
        bClockID                1
        bmAttributes            3 Internal programmable clock
        bmControls           0x03
          Clock Frequency Control (read/write)
        bAssocTerminal          0
        iClockSource            0
      AudioControl Interface Descriptor:
        bLength                17
        bDescriptorType        36
        bDescriptorSubtype      2 (INPUT_TERMINAL)
        bTerminalID             3
        wTerminalType      0x0101 USB Streaming
        bAssocTerminal          0
        bCSourceID              1
        bNrChannels            12
        bmChannelConfig    0x00000000
        iChannelNames           0
        bmControls         0x0000
        iTerminal               0
      AudioControl Interface Descriptor:
        bLength                17
        bDescriptorType        36
        bDescriptorSubtype      2 (INPUT_TERMINAL)
        bTerminalID             5
        wTerminalType      0x0201 Microphone
        bAssocTerminal          0
        bCSourceID              1
        bNrChannels            12
        bmChannelConfig    0x00000000
        iChannelNames           0
        bmControls         0x0000
        iTerminal               0
      AudioControl Interface Descriptor:
        bLength                12
        bDescriptorType        36
        bDescriptorSubtype      3 (OUTPUT_TERMINAL)
        bTerminalID             4
        wTerminalType      0x0301 Speaker
        bAssocTerminal          0
        bSourceID               2
        bCSourceID              1
        bmControls         0x0000
        iTerminal               0
      AudioControl Interface Descriptor:
        bLength                12
        bDescriptorType        36
        bDescriptorSubtype      3 (OUTPUT_TERMINAL)
        bTerminalID             6
        wTerminalType      0x0101 USB Streaming
        bAssocTerminal          0
        bSourceID               5
        bCSourceID              1
        bmControls         0x0000
        iTerminal               0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass         1 Audio
      bInterfaceSubClass      2 Streaming
      bInterfaceProtocol     32
      iInterface              0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       1
      bNumEndpoints           2
      bInterfaceClass         1 Audio
      bInterfaceSubClass      2 Streaming
      bInterfaceProtocol     32
      iInterface              0
      AudioStreaming Interface Descriptor:
        bLength                16
        bDescriptorType        36
        bDescriptorSubtype      1 (AS_GENERAL)
        bTerminalLink           3
        bmControls           0x00
        bFormatType             1
        bmFormats          0x00000001
          PCM
        bNrChannels             2
        bmChannelConfig    0x00000003
          Front Left (FL)
          Front Right (FR)
        iChannelNames           0
      AudioStreaming Interface Descriptor:
        bLength                 6
        bDescriptorType        36
        bDescriptorSubtype      2 (FORMAT_TYPE)
        bFormatType             1 (FORMAT_TYPE_I)
        bSubslotSize            3
        bBitResolution         24
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x03  EP 3 OUT
        bmAttributes            5
          Transfer Type            Isochronous
          Synch Type               Asynchronous
          Usage Type               Data
        wMaxPacketSize     0x0096  1x 150 bytes
        bInterval               1
        AudioStreaming Endpoint Descriptor:
          bLength                 8
          bDescriptorType        37
          bDescriptorSubtype      1 (EP_GENERAL)
          bmAttributes         0x00
          bmControls           0x00
          bLockDelayUnits         0 Undefined
          wLockDelay         0x0000
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes           17
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Feedback
        wMaxPacketSize     0x0004  1x 4 bytes
        bInterval               4
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        1
      bAlternateSetting       2
      bNumEndpoints           2
      bInterfaceClass         1 Audio
      bInterfaceSubClass      2 Streaming
      bInterfaceProtocol     32
      iInterface              0
      AudioStreaming Interface Descriptor:
        bLength                16
        bDescriptorType        36
        bDescriptorSubtype      1 (AS_GENERAL)
        bTerminalLink           3
        bmControls           0x00
        bFormatType             1
        bmFormats          0x00000001
          PCM
        bNrChannels            12
        bmChannelConfig    0x00000000
        iChannelNames           0
      AudioStreaming Interface Descriptor:
        bLength                 6
        bDescriptorType        36
        bDescriptorSubtype      2 (FORMAT_TYPE)
        bFormatType             1 (FORMAT_TYPE_I)
        bSubslotSize            3
        bBitResolution         24
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x03  EP 3 OUT
        bmAttributes            5
          Transfer Type            Isochronous
          Synch Type               Asynchronous
          Usage Type               Data
        wMaxPacketSize     0x0384  1x 900 bytes
        bInterval               1
        AudioStreaming Endpoint Descriptor:
          bLength                 8
          bDescriptorType        37
          bDescriptorSubtype      1 (EP_GENERAL)
          bmAttributes         0x00
          bmControls           0x00
          bLockDelayUnits         0 Undefined
          wLockDelay         0x0000
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x83  EP 3 IN
        bmAttributes           17
          Transfer Type            Isochronous
          Synch Type               None
          Usage Type               Feedback
        wMaxPacketSize     0x0004  1x 4 bytes
        bInterval               4
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       0
      bNumEndpoints           0
      bInterfaceClass         1 Audio
      bInterfaceSubClass      2 Streaming
      bInterfaceProtocol     32
      iInterface              0
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       1
      bNumEndpoints           1
      bInterfaceClass         1 Audio
      bInterfaceSubClass      2 Streaming
      bInterfaceProtocol     32
      iInterface              0
      AudioStreaming Interface Descriptor:
        bLength                16
        bDescriptorType        36
        bDescriptorSubtype      1 (AS_GENERAL)
        bTerminalLink           6
        bmControls           0x00
        bFormatType             1
        bmFormats          0x00000001
          PCM
        bNrChannels            12
        bmChannelConfig    0x00000000
        iChannelNames           0
      AudioStreaming Interface Descriptor:
        bLength                 6
        bDescriptorType        36
        bDescriptorSubtype      2 (FORMAT_TYPE)
        bFormatType             1 (FORMAT_TYPE_I)
        bSubslotSize            3
        bBitResolution         24
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x84  EP 4 IN
        bmAttributes            5
          Transfer Type            Isochronous
          Synch Type               Asynchronous
          Usage Type               Data
        wMaxPacketSize     0x0384  1x 900 bytes
        bInterval               1
        AudioStreaming Endpoint Descriptor:
          bLength                 8
          bDescriptorType        37
          bDescriptorSubtype      1 (EP_GENERAL)
          bmAttributes         0x00
          bmControls           0x00
          bLockDelayUnits         0 Undefined
          wLockDelay         0x0000
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        2
      bAlternateSetting       2
      bNumEndpoints           1
      bInterfaceClass         1 Audio
      bInterfaceSubClass      2 Streaming
      bInterfaceProtocol     32
      iInterface              0
      AudioStreaming Interface Descriptor:
        bLength                16
        bDescriptorType        36
        bDescriptorSubtype      1 (AS_GENERAL)
        bTerminalLink           6
        bmControls           0x00
        bFormatType             1
        bmFormats          0x00000001
          PCM
        bNrChannels             2
        bmChannelConfig    0x00000003
          Front Left (FL)
          Front Right (FR)
        iChannelNames           0
      AudioStreaming Interface Descriptor:
        bLength                 6
        bDescriptorType        36
        bDescriptorSubtype      2 (FORMAT_TYPE)
        bFormatType             1 (FORMAT_TYPE_I)
        bSubslotSize            3
        bBitResolution         24
      Endpoint Descriptor:
        bLength                 7
        bDescriptorType         5
        bEndpointAddress     0x84  EP 4 IN
        bmAttributes            5
          Transfer Type            Isochronous
          Synch Type               Asynchronous
          Usage Type               Data
        wMaxPacketSize     0x0096  1x 150 bytes
        bInterval               1
        AudioStreaming Endpoint Descriptor:
          bLength                 8
          bDescriptorType        37
          bDescriptorSubtype      1 (EP_GENERAL)
          bmAttributes         0x00
          bmControls           0x00
          bLockDelayUnits         0 Undefined
          wLockDelay         0x0000
    Interface Descriptor:
      bLength                 9
      bDescriptorType         4
      bInterfaceNumber        3
      bAlternateSetting       0
      bNumEndpoints           2
      bInterfaceClass         1 Audio
      bInterfaceSubClass      3 MIDI Streaming
      bInterfaceProtocol      0
      iInterface              2 Babyface Pro (71964099)
      MIDIStreaming Interface Descriptor:
        bLength                 7
        bDescriptorType        36
        bDescriptorSubtype      1 (HEADER)
        bcdADC               1.00
        wTotalLength       0x0061
      MIDIStreaming Interface Descriptor:
        bLength                 9
        bDescriptorType        36
        bDescriptorSubtype      3 (MIDI_OUT_JACK)
        bJackType               1 Embedded
        bJackID                 3
        bNrInputPins            1
        baSourceID( 0)          2
        BaSourcePin( 0)         1
        iJack                   4 Port 1
      MIDIStreaming Interface Descriptor:
        bLength                 6
        bDescriptorType        36
        bDescriptorSubtype      2 (MIDI_IN_JACK)
        bJackType               2 External
        bJackID                 2
        iJack                   4 Port 1
      MIDIStreaming Interface Descriptor:
        bLength                 9
        bDescriptorType        36
        bDescriptorSubtype      3 (MIDI_OUT_JACK)
        bJackType               1 Embedded
        bJackID                 7
        bNrInputPins            1
        baSourceID( 0)          6
        BaSourcePin( 0)         1
        iJack                   5 Port 2
      MIDIStreaming Interface Descriptor:
        bLength                 6
        bDescriptorType        36
        bDescriptorSubtype      2 (MIDI_IN_JACK)
        bJackType               2 External
        bJackID                 6
        iJack                   5 Port 2
      MIDIStreaming Interface Descriptor:
        bLength                 6
        bDescriptorType        36
        bDescriptorSubtype      2 (MIDI_IN_JACK)
        bJackType               1 Embedded
        bJackID                 1
        iJack                   4 Port 1
      MIDIStreaming Interface Descriptor:
        bLength                 9
        bDescriptorType        36
        bDescriptorSubtype      3 (MIDI_OUT_JACK)
        bJackType               2 External
        bJackID                 4
        bNrInputPins            1
        baSourceID( 0)          1
        BaSourcePin( 0)         1
        iJack                   4 Port 1
      MIDIStreaming Interface Descriptor:
        bLength                 6
        bDescriptorType        36
        bDescriptorSubtype      2 (MIDI_IN_JACK)
        bJackType               1 Embedded
        bJackID                 5
        iJack                   5 Port 2
      MIDIStreaming Interface Descriptor:
        bLength                 9
        bDescriptorType        36
        bDescriptorSubtype      3 (MIDI_OUT_JACK)
        bJackType               2 External
        bJackID                 8
        bNrInputPins            1
        baSourceID( 0)          5
        BaSourcePin( 0)         1
        iJack                   5 Port 2
      Endpoint Descriptor:
        bLength                 9
        bDescriptorType         5
        bEndpointAddress     0x07  EP 7 OUT
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
        bRefresh                0
        bSynchAddress           0
        MIDIStreaming Endpoint Descriptor:
          bLength                 6
          bDescriptorType        37
          bDescriptorSubtype      1 (GENERAL)
          bNumEmbMIDIJack         2
          baAssocJackID( 0)       1
          baAssocJackID( 1)       5
      Endpoint Descriptor:
        bLength                 9
        bDescriptorType         5
        bEndpointAddress     0x86  EP 6 IN
        bmAttributes            2
          Transfer Type            Bulk
          Synch Type               None
          Usage Type               Data
        wMaxPacketSize     0x0200  1x 512 bytes
        bInterval               0
        bRefresh                0
        bSynchAddress           0
        MIDIStreaming Endpoint Descriptor:
          bLength                 6
          bDescriptorType        37
          bDescriptorSubtype      1 (GENERAL)
          bNumEmbMIDIJack         2
          baAssocJackID( 0)       3
          baAssocJackID( 1)       7
Device Qualifier (for other device speed):
  bLength                10
  bDescriptorType         6
  bcdUSB               2.00
  bDeviceClass          239 Miscellaneous Device
  bDeviceSubClass         2
  bDeviceProtocol         1 Interface Association
  bMaxPacketSize0        64
  bNumConfigurations      0
Device Status:     0x0e00
  (Bus Powered)

> > I couldn't figure out how to analyze your data, not sure what the extra
> > delays mean nor how you conclude that Linux is worse than MacOS or
> > Windows10 for small buffers?
> >
> > At any rate, I looked into this some time back but had to put the work
> > on the back burner due to other priorities. What I do remember is that
> > there is a built-in latency due to the fact that on playback the driver
> > submits a number of zero-filled URBs and will only add valid audio data
> > when the first URB is retired, which means you get a constant startup
> > latency you will never be able to catch up.
>
> In theory the number of zero-filled URBs could be reduced, maybe even
> eliminated.
>
> > I also vaguely remember that at some point the buffer/period sizes don't
> > matter, each period will be broken up in a series of URBs and hence you
> > will have more wake-ups than what is configured by the period size. In
> > short I would look into the way the data is spread on multiple URBs and
> > check how latency is impacted by the software design.
>
> Agreed.
>
> > The last thing I have in mind is that for latency analysis and
> > comparisons, using simple devices makes sense. Latency can be affected by
> > extra processing that might be enabled in the USB device depending on
> > user configurations or parameters. Ideally to focus on the ALSA/xHCI
> > interaction/latency we'd want to look at really dumb devices with just
> > an input and output terminal and no processing.
> >
> > -Pierre
> >
> > >
> > > macOS High Sierra, PC Mode (RME Driver v3.08):
> > > 16/2 32 + 59 ~ 1.896 ms
> > > 32/2 64 + 59 ~ 2.563 ms
> > > 64/2 128 + 59 ~ 3.896 ms
> > > 128/2 256 + 59 ~ 6.563 ms
> > > 256/2 512 + 59 ~ 11.896 ms
> > > 512/2 1024 + 59 ~ 22.563 ms
> > > 1024/2 2048 + 59 ~ 43.896 ms
> > > 2048/2 4096 + 59 ~ 86.563 ms
> > >
> > > Windows 10, PC Mode (RME Driver 1.099):
> > > 48/2 96 + 63 ~ 3.313 ms
> > > 64/2 128 + 63 ~ 3.979 ms
> > > 96/2 192 + 63 ~ 5.313 ms
> > > 128/2 256 + 63 ~ 6.646 ms
> > > 256/2 512 + 63 ~ 11.979 ms
> > > 512/2 1024 + 63 ~ 22.646 ms
> > > 1024/2 2048 + 63 ~ 43.979 ms
> > > 2048/2 4096 + 63 ~ 86.646 ms
> > > ===
> > >
> > > Some things in particular I noticed on Linux:
> > > - additional_delay varies a bit if I close and open the audio device again
> > > - additional_delay seems to increase as the block_size increases. I
> > > can make the additional_delay stay about the same rather than
> > > increasing by setting MAX_PACKS and MAX_PACKS_HS to 1 in
> > > sound/usb/card.h. In Linux versions before 3.13 there was a nrpacks
> > > parameter for snd-usb-audio to control this but it was removed with
> > > commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v3.13&id=976b6c064a957445eb0573b270f2d0282630e9b9
> > > - additional_delay is not constant as block_size is increased like on
> > > macOS and Windows
>
> Perhaps this additional_delay is caused by the zero-filled URBs
> mentioned earlier.
>
> We can't say anything about the effect of setting MAX_PACKS to 1
> without knowing how the driver is currently fitting packets into frames
> and URBs.  In any case, you should be able to reduce the number of
> packets in each URB simply by reducing the period size, since the
> driver strives to keep each URB not much larger than a period (as I
> recall -- it's been a long time since I worked on this (2013)).
>
> > > I made a patch to snd-usb-audio to expose the snd-usb-audio constants
> > > as runtime adjustable module parameters
> > > (/sys/module/snd_usb_audio/parameters/) for testing (takes effect when
> > > the device is disconnected+reconnected and logs the parameter values
> > > to dmesg):
> > > https://aur.archlinux.org/cgit/aur.git/plain/parameters.patch?h=snd-usb-audio-lowlatency-dkms
> > >
> > > The patch is used in my Arch Linux AUR package for convenience (using
> > > DKMS to avoid having to recompile entire kernel):
> > > https://aur.archlinux.org/packages/snd-usb-audio-lowlatency-dkms/
> > >
> > > Can snd-usb-audio be improved so the additional_delay is always the
> > > same when closing/opening/reconfiguring the audio device and does not
> > > increase as the block_size increases?
> > >
> > > I noticed using USB audio on Linux at lower latencies (block_size <=
> > > 128) is more prone to audio dropouts under load compared to macOS and
> > > Windows, even with CPU power management disabled (writing 0 to
> > > /dev/cpu_dma_latency). What can be done about this?
>
> You can reduce the CPU load.  :-)
>
> Seriously, how can you compare loads between different operating
> systems?
>
> Also, note that Linux's scheduler has a number of adjustable parameters,
> which I am not familiar with.
>
> Alan Stern
>

Regards,
Jonathan


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2018-10-23 11:59     ` Jonathan Liu
@ 2018-10-23 14:08       ` Pierre-Louis Bossart
  2018-10-24  7:13         ` Takashi Iwai
  2018-10-23 15:10       ` Alan Stern
  1 sibling, 1 reply; 12+ messages in thread
From: Pierre-Louis Bossart @ 2018-10-23 14:08 UTC (permalink / raw)
  To: Jonathan Liu, Alan Stern; +Cc: Takashi Iwai, ALSA development, Clemens Ladisch


>>>> Linux 4.17.14, Class Compliant Mode (snd-usb-audio, ALSA backend):
>>>> 16/2 32 + 80 ~ 2.333 ms
>> What are these numbers?  Are these lines supposed to be in the format
>> expressed by the first formula above?  If they are, how come
>> "block_size/periods" shows up as a pair of numbers "16/2" but
>> "block_size*periods" shows up as a single number "32"?
>>
> To interpret "16/2 32 + 80 ~ 2.333 ms"
> Block size: 16 samples
> Periods: 2 (one period for playback + one period for recording when
> determining round trip latency)
> The minimum round trip latency is: 16 * 2 = 32 samples
> However, I measured 112 samples round trip latency which is an
> additional delay of 80 samples (32 + 80 = 112).
> 112 samples at 48000 Hz is 112 / 48000 * 1000 is approximately 2.333
> ms measured round trip latency.

ok, so what problem are you trying to fix?

Are you concerned about the latency numbers (but then they seem lower on 
Linux and latency concerns with large buffers are a self-negating 
proposition)? Are you concerned about the variable delay that doesn't 
seem to exist on MacOS or Windows? Are you trying to match the 
performance of the RME driver on MacOS?

I am not sure how this comparison is done btw, the delay includes both 
buffering on the device side before reaching the analog parts as well as 
buffering on the OS side. While the former should be constant, the 
latter depends a great deal on implementation, not sure there are direct 
lessons to be applied to ALSA. I also see inconsistent/non-linear 
results where with a larger block size the delay is smaller, e.g.

256/2 512 + 650 ~ 24.208 ms
2048/3 6144 + 633 ~ 141.188 ms
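
As a cross-check of the two rows above, here is a minimal Python sketch of
the formula quoted at the start of the thread (round_trip_latency =
(block_size*periods + additional_delay) / 48000 * 1000). The function name
round_trip_ms is made up for illustration; only the numbers come from the
thread.

```python
SAMPLE_RATE_HZ = 48000  # sample rate used for all measurements in this thread


def round_trip_ms(block_size, periods, additional_delay, rate=SAMPLE_RATE_HZ):
    """Round trip latency in milliseconds, following the thread's formula."""
    total_samples = block_size * periods + additional_delay
    # Multiply before dividing to keep the floating-point result well behaved.
    return total_samples * 1000 / rate


# The two rows singled out above: (block_size, periods, additional_delay).
for block_size, periods, extra in [(256, 2, 650), (2048, 3, 633)]:
    latency = round_trip_ms(block_size, periods, extra)
    print(f"{block_size}/{periods} {block_size * periods} + {extra} ~ {latency:.3f} ms")
```

The additional_delay term is the only part of the sum that is measured; the
block_size*periods term follows directly from the chosen settings.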


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2018-10-23 11:59     ` Jonathan Liu
  2018-10-23 14:08       ` Pierre-Louis Bossart
@ 2018-10-23 15:10       ` Alan Stern
  2018-10-24  9:29         ` Jonathan Liu
  1 sibling, 1 reply; 12+ messages in thread
From: Alan Stern @ 2018-10-23 15:10 UTC (permalink / raw)
  To: Jonathan Liu
  Cc: Takashi Iwai, ALSA development, Clemens Ladisch, pierre-louis.bossart

On Tue, 23 Oct 2018, Jonathan Liu wrote:

> > > > Linux 4.17.14, Class Compliant Mode (snd-usb-audio, ALSA backend):
> > > > 16/2 32 + 80 ~ 2.333 ms
> >
> > What are these numbers?  Are these lines supposed to be in the format
> > expressed by the first formula above?  If they are, how come
> > "block_size/periods" shows up as a pair of numbers "16/2" but
> > "block_size*periods" shows up as a single number "32"?
> >
> 
> To interpret "16/2 32 + 80 ~ 2.333 ms"
> Block size: 16 samples

Is this what ALSA would call the number of frames per period?  I
presume your sample is the same as an ALSA frame.  (As I recall, in
ALSA each frame in a stereo stream contains two samples.  You _are_
using stereo, right?  And each sample would be 3 bytes for 24-bit
audio.  Also, in ALSA the period size and block size are the sizes in
bytes, not in frames.)

> Periods: 2 (one period for playback + one period for recording when
> determining round trip latency)

In other words, one period per block in each direction?

> The minimum round trip latency is: 16 * 2 = 32 samples
> However, I measured 112 samples round trip latency which is an
> additional delay of 80 samples (32 + 80 = 112).
> 112 samples at 48000 Hz is 112 / 48000 * 1000 is approximately 2.333
> ms measured round trip latency.
> 
> > > > 16/3 48 + 109 ~ 3.271 ms

Presumably this indicates three periods, then.  Is that two in the 
outward direction and one in the inward direction, or vice versa?

> > > > 32/2 64 + 129 ~ 4.021 ms
> > > > 32/3 96 + 166 ~ 5.458 ms
> > > > 64/2 128 + 205 ~ 6.938 ms
> > > > 64/3 192 + 242 ~ 9.042 ms
> > > > 128/2 256 + 352 ~ 12.667 ms
> > > > 128/3 384 + 496 ~ 18.333 ms
> > > > 256/2 512 + 650 ~ 24.208 ms
> > > > 256/3 768 + 650 ~ 29.542 ms
> > > > 512/2 1024 + 634 ~ 34.542 ms
> > > > 512/3 1536 + 634 ~ 45.208 ms
> > > > 1024/2 2048 + 650 ~ 56.208 ms
> > > > 1024/3 3072 + 650 ~ 77.542 ms
> > > > 2048/2 4096 + 633 ~ 98.521 ms
> > > > 2048/3 6144 + 633 ~ 141.188 ms

As compared to the other systems, it appears that in Linux the
additional delay increases with the period size.  This could be a
result of the initial zero-filled URBs, since the size or number of
those URBs may depend on the other settings.
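
To make the trend concrete, here is a small Python sketch that converts the
additional delay of the Linux rows with 2 periods into milliseconds and
prints it next to the constant 205 samples reported for the macOS class
compliant driver in this thread. The helper name samples_to_ms and the
variable names are illustrative only; the figures are copied from the
tables quoted in this message.

```python
SAMPLE_RATE_HZ = 48000

# (block_size, additional_delay) for the Linux rows with 2 periods.
LINUX_EXTRA = [
    (16, 80), (32, 129), (64, 205), (128, 352),
    (256, 650), (512, 634), (1024, 650), (2048, 633),
]
# The macOS class compliant rows report the same additional delay throughout.
MACOS_CLASS_COMPLIANT_EXTRA = 205


def samples_to_ms(samples, rate=SAMPLE_RATE_HZ):
    """Convert a delay in samples to milliseconds."""
    return samples * 1000 / rate


macos_ms = samples_to_ms(MACOS_CLASS_COMPLIANT_EXTRA)
for block_size, extra in LINUX_EXTRA:
    print(f"block {block_size:5d}: extra {extra:4d} samples = "
          f"{samples_to_ms(extra):6.3f} ms (macOS class compliant: {macos_ms:.3f} ms)")
```

In these figures the Linux additional delay grows up to a block size of 256
and then stays in the 633 to 650 sample range.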

> > > > macOS High Sierra, Class Compliant Mode (Apple Driver):
> > > > 16/2 32 + 205 ~ 4.938 ms
> > > > 32/2 64 + 205 ~ 5.604 ms
> > > > 64/2 128 + 205 ~ 6.938 ms
> > > > 128/2 256 + 205 ~ 9.604 ms
> > > > 256/2 512 + 205 ~ 14.938 ms
> > > > 512/2 1024 + 205 ~ 25.604 ms
> > > > 1024/2 2048 + 205 ~ 46.938 ms
> > > > 2048/2 4096 + 205 ~ 89.604 ms
> 
> >
> > What are the USB parameters for these tests?  How many bytes/frame?
> > What is the endpoint's maxpacket size?  What is the speed of the USB
> > bus?
> >
> 
> How would I determine the USB parameters and bytes/frame?
> 
> USB port is Intel USB 3.0 port. Device is USB 2.0 high speed (480 Mbps).
> 
> Here is the lsusb output:

Both too much information and too little.  Instead, let's see the
device's entry in /sys/kernel/debug/usb/devices, copied at a time while
the test is running.  That will omit a lot of irrelevant information
and will indicate which of all the possible device settings is the one
actually in use.

If you want to get a better idea for exactly what is happening at the
USB level, you can collect a usbmon trace while running a test.  Also,
it wouldn't hurt to see the values of max_packs_per_urb, urb_packs,
max_packs_per_period, urbs_per_period, ep->max_urb_frames, and
ep->nurbs from data_ep_set_params() in the audio driver.

Alan Stern


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2018-10-23 14:08       ` Pierre-Louis Bossart
@ 2018-10-24  7:13         ` Takashi Iwai
  2019-04-22 10:50           ` Jonathan Liu
  0 siblings, 1 reply; 12+ messages in thread
From: Takashi Iwai @ 2018-10-24  7:13 UTC (permalink / raw)
  To: Pierre-Louis Bossart
  Cc: Clemens Ladisch, Jonathan Liu, Alan Stern, ALSA development

On Tue, 23 Oct 2018 16:08:22 +0200,
Pierre-Louis Bossart wrote:
> 
> 
> >>>> Linux 4.17.14, Class Compliant Mode (snd-usb-audio, ALSA backend):
> >>>> 16/2 32 + 80 ~ 2.333 ms
> >> What are these numbers?  Are these lines supposed to be in the format
> >> expressed by the first formula above?  If they are, how come
> >> "block_size/periods" shows up as a pair of numbers "16/2" but
> >> "block_size*periods" shows up as a single number "32"?
> >>
> > To interpret "16/2 32 + 80 ~ 2.333 ms"
> > Block size: 16 samples
> > Periods: 2 (one period for playback + one period for recording when
> > determining round trip latency)
> > The minimum round trip latency is: 16 * 2 = 32 samples
> > However, I measured 112 samples round trip latency which is an
> > additional delay of 80 samples (32 + 80 = 112).
> > 112 samples at 48000 Hz is 112 / 48000 * 1000 is approximately 2.333
> > ms measured round trip latency.
> 
> ok, so what problem are you trying to fix?
> 
> Are you concerned about the latency numbers (but then they seem lower
> on Linux and latency concerns with large buffers are a self-negating
> proposition)? Are you concerned about the variable delay that doesn't
> seem to exist on MacOS or Windows? Are you trying to match the
> performance of the RME driver on MacOS?
> 
> I am not sure how this comparison is done btw, the delay includes both
> buffering on the device side before reaching the analog parts as well
> as buffering on the OS side. While the former should be constant, the
> latter depends a great deal on implementation, not sure there are
> direct lessons to be applied to ALSA. I also see
> inconsistent/non-linear results where with a larger block size the
> delay is smaller, e.g.
> 
> 256/2 512 + 650 ~ 24.208 ms
> 2048/3 6144 + 633 ~ 141.188 ms

Independently of the measurements in this thread, there is actually a
known latency source in the playback path of the USB-audio driver code,
which I mentioned at the audio mini conf last year: the USB-audio driver
starts streaming at prepare time for playback, not at trigger-START
time.  This is a sort of workaround to make the device look similar to
the standard ring-buffer behavior.

Maybe moving the start to trigger time (like the capture direction)
would reduce this artificial latency, but it makes the driver behave in
an unexpected manner: it may then wake up for period_elapsed soon after
the stream starts with a large runtime->delay value, as the data in
in-flight URBs are seen as already "processed".


thanks,

Takashi


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2018-10-23 15:10       ` Alan Stern
@ 2018-10-24  9:29         ` Jonathan Liu
  2018-10-24 14:20           ` Alan Stern
  0 siblings, 1 reply; 12+ messages in thread
From: Jonathan Liu @ 2018-10-24  9:29 UTC (permalink / raw)
  To: Alan Stern
  Cc: Takashi Iwai, ALSA development, Clemens Ladisch, pierre-louis.bossart

Hi,

On Wed, 24 Oct 2018 at 02:10, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Tue, 23 Oct 2018, Jonathan Liu wrote:
>
> > > > > Linux 4.17.14, Class Compliant Mode (snd-usb-audio, ALSA backend):
> > > > > 16/2 32 + 80 ~ 2.333 ms
> > >
> > > What are these numbers?  Are these lines supposed to be in the format
> > > expressed by the first formula above?  If they are, how come
> > > "block_size/periods" shows up as a pair of numbers "16/2" but
> > > "block_size*periods" shows up as a single number "32"?
> > >
> >
> > To interpret "16/2 32 + 80 ~ 2.333 ms"
> > Block size: 16 samples
>
> Is this what ALSA would call the number of frames per period?  I
> presume your sample is the same as an ALSA frame.  (As I recall, in
> ALSA each frame in a stereo stream contains two samples.  You _are_
> using stereo, right?  And each sample would be 3 bytes for 24-bit
> audio.  Also, in ALSA the period size and block size are the sizes in
> bytes, not in frames.)
>

Yes, I am using 2 channels input and 2 channels output for testing on Linux.

> > Periods: 2 (one period for playback + one period for recording when
> > determining round trip latency)
>
> In other words, one period per block in each direction?
>

Yes.

> > The minimum round trip latency is: 16 * 2 = 32 samples
> > However, I measured 112 samples round trip latency which is an
> > additional delay of 80 samples (32 + 80 = 112).
> > 112 samples at 48000 Hz is 112 / 48000 * 1000 is approximately 2.333
> > ms measured round trip latency.
> >
> > > > > 16/3 48 + 109 ~ 3.271 ms
>
> Presumably this indicates three periods, then.  Is that two in the
> outward direction and one in the inward direction, or vice versa?
>

Yes, one period is always for capture and the remaining periods are
for playback.

> > > > > 32/2 64 + 129 ~ 4.021 ms
> > > > > 32/3 96 + 166 ~ 5.458 ms
> > > > > 64/2 128 + 205 ~ 6.938 ms
> > > > > 64/3 192 + 242 ~ 9.042 ms
> > > > > 128/2 256 + 352 ~ 12.667 ms
> > > > > 128/3 384 + 496 ~ 18.333 ms
> > > > > 256/2 512 + 650 ~ 24.208 ms
> > > > > 256/3 768 + 650 ~ 29.542 ms
> > > > > 512/2 1024 + 634 ~ 34.542 ms
> > > > > 512/3 1536 + 634 ~ 45.208 ms
> > > > > 1024/2 2048 + 650 ~ 56.208 ms
> > > > > 1024/3 3072 + 650 ~ 77.542 ms
> > > > > 2048/2 4096 + 633 ~ 98.521 ms
> > > > > 2048/3 6144 + 633 ~ 141.188 ms
>
> As compared to the other systems, it appears that in Linux the
> additional delay increases with the period size.  This could be a
> result of the initial zero-filled URBs, since the size or number of
> those URBs may depend on the other settings.
>
> > > > > macOS High Sierra, Class Compliant Mode (Apple Driver):
> > > > > 16/2 32 + 205 ~ 4.938 ms
> > > > > 32/2 64 + 205 ~ 5.604 ms
> > > > > 64/2 128 + 205 ~ 6.938 ms
> > > > > 128/2 256 + 205 ~ 9.604 ms
> > > > > 256/2 512 + 205 ~ 14.938 ms
> > > > > 512/2 1024 + 205 ~ 25.604 ms
> > > > > 1024/2 2048 + 205 ~ 46.938 ms
> > > > > 2048/2 4096 + 205 ~ 89.604 ms
> >
> > >
> > > What are the USB parameters for these tests?  How many bytes/frame?
> > > What is the endpoint's maxpacket size?  What is the speed of the USB
> > > bus?
> > >
> >
> > How would I determine the USB parameters and bytes/frame?
> >
> > USB port is Intel USB 3.0 port. Device is USB 2.0 high speed (480 Mbps).
> >
> > Here is the lsusb output:
>
> Both too much information and too little.  Instead, let's see the
> device's entry in /sys/kernel/debug/usb/devices, copied at a time while
> the test is running.  That will omit a lot of irrelevant information
> and will indicate which of all the possible device settings is the one
> actually in use.
>

T:  Bus=01 Lev=01 Prnt=01 Port=04 Cnt=01 Dev#=  2 Spd=480  MxCh= 0
D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
P:  Vendor=2a39 ProdID=3fb0 Rev= 0.01
S:  Manufacturer=RME
S:  Product=Babyface Pro (71964099)
S:  SerialNumber=EF72ADBCCECA4C8
C:* #Ifs= 4 Cfg#= 1 Atr=80 MxPwr=100mA
A:  FirstIf#= 0 IfCount= 4 Cls=01(audio) Sub=00 Prot=20
I:* If#= 0 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio
I:  If#= 1 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
I:* If#= 1 Alt= 1 #EPs= 2 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E:  Ad=03(O) Atr=05(Isoc) MxPS= 150 Ivl=125us
E:  Ad=83(I) Atr=11(Isoc) MxPS=   4 Ivl=1ms
I:  If#= 1 Alt= 2 #EPs= 2 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E:  Ad=03(O) Atr=05(Isoc) MxPS= 900 Ivl=125us
E:  Ad=83(I) Atr=11(Isoc) MxPS=   4 Ivl=1ms
I:  If#= 2 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
I:  If#= 2 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E:  Ad=84(I) Atr=05(Isoc) MxPS= 900 Ivl=125us
I:* If#= 2 Alt= 2 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
E:  Ad=84(I) Atr=05(Isoc) MxPS= 150 Ivl=125us
I:* If#= 3 Alt= 0 #EPs= 2 Cls=01(audio) Sub=03 Prot=00 Driver=snd-usb-audio
E:  Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms
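
For anyone who wants to pull the endpoint parameters out of a dump like
this programmatically, here is a minimal Python sketch. It only handles
"E:" lines shaped like the ones above; the regular expression and the
parse_endpoint helper are my own illustration, not part of any kernel
tooling. The three sample lines are the endpoints of the alternate settings
marked with "*" on If#= 1 and If#= 2.

```python
import re

# Endpoint lines copied from the active (starred) alternate settings above.
ACTIVE_ENDPOINTS = """\
E:  Ad=03(O) Atr=05(Isoc) MxPS= 150 Ivl=125us
E:  Ad=83(I) Atr=11(Isoc) MxPS=   4 Ivl=1ms
E:  Ad=84(I) Atr=05(Isoc) MxPS= 150 Ivl=125us
"""

ENDPOINT_RE = re.compile(
    r"^E:\s+Ad=(?P<addr>[0-9a-f]{2})\((?P<dir>[IO])\)\s+"
    r"Atr=(?P<attr>[0-9a-f]{2})\((?P<type>[^)]*)\)\s+"
    r"MxPS=\s*(?P<mxps>\d+)\s+Ivl=(?P<ivl>\d+)(?P<unit>us|ms)$"
)


def parse_endpoint(line):
    """Return a dict describing one 'E:' line, or None if it does not match."""
    m = ENDPOINT_RE.match(line.strip())
    if m is None:
        return None
    # Normalise the interval to microseconds.
    interval_us = int(m["ivl"]) * (1000 if m["unit"] == "ms" else 1)
    return {
        "address": int(m["addr"], 16),
        "direction": "in" if m["dir"] == "I" else "out",
        "type": m["type"].strip(),
        "max_packet_size": int(m["mxps"]),
        "interval_us": interval_us,
    }


endpoints = [ep for ep in map(parse_endpoint, ACTIVE_ENDPOINTS.splitlines()) if ep]
for ep in endpoints:
    print(ep)
```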

> If you want to get a better idea for exactly what is happening at the
> USB level, you can collect a usbmon trace while running a test.  Also,
> it wouldn't hurt to see the values of max_packs_per_urb, urb_packs,
> max_packs_per_period, urbs_per_period, ep->max_urb_frames, and
> ep->nurbs from data_ep_set_params() in the audio driver.

Maybe in a few weeks. I suspect the additional latency is mainly in
the playback direction.

Regards,
Jonathan


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2018-10-24  9:29         ` Jonathan Liu
@ 2018-10-24 14:20           ` Alan Stern
  0 siblings, 0 replies; 12+ messages in thread
From: Alan Stern @ 2018-10-24 14:20 UTC (permalink / raw)
  To: Jonathan Liu
  Cc: Takashi Iwai, ALSA development, Clemens Ladisch, pierre-louis.bossart

On Wed, 24 Oct 2018, Jonathan Liu wrote:

> > Both too much information and too little.  Instead, let's see the
> > device's entry in /sys/kernel/debug/usb/devices, copied at a time while
> > the test is running.  That will omit a lot of irrelevant information
> > and will indicate which of all the possible device settings is the one
> > actually in use.
> >
> 
> T:  Bus=01 Lev=01 Prnt=01 Port=04 Cnt=01 Dev#=  2 Spd=480  MxCh= 0
> D:  Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs=  1
> P:  Vendor=2a39 ProdID=3fb0 Rev= 0.01
> S:  Manufacturer=RME
> S:  Product=Babyface Pro (71964099)
> S:  SerialNumber=EF72ADBCCECA4C8
> C:* #Ifs= 4 Cfg#= 1 Atr=80 MxPwr=100mA
> A:  FirstIf#= 0 IfCount= 4 Cls=01(audio) Sub=00 Prot=20
> I:* If#= 0 Alt= 0 #EPs= 0 Cls=01(audio) Sub=01 Prot=20 Driver=snd-usb-audio

This is the control interface; it is not directly involved.

> I:  If#= 1 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
> I:* If#= 1 Alt= 1 #EPs= 2 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
> E:  Ad=03(O) Atr=05(Isoc) MxPS= 150 Ivl=125us
> E:  Ad=83(I) Atr=11(Isoc) MxPS=   4 Ivl=1ms

This is one of the interfaces in use; it handles playback data (i.e.,
data sent to the device).  The maxpacket size is 150 bytes, which is 25
frames at 3 bytes/sample and 2 channels.  The interval is 125 us,
giving a maximum throughput of 200 frames/ms, comfortably larger than
the bandwidth being used (48 frames/ms).
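
The arithmetic behind those figures, written out as a short Python sketch
(the constant names are mine; the values are the ones from the endpoint
line above and from the 48000 Hz, 24-bit stereo test setup):

```python
BYTES_PER_SAMPLE = 3     # 24-bit audio
CHANNELS = 2             # stereo
MAX_PACKET_BYTES = 150   # MxPS of the active playback alternate setting
INTERVAL_US = 125        # Ivl: one packet per 125 us microframe
SAMPLE_RATE_HZ = 48000

bytes_per_frame = BYTES_PER_SAMPLE * CHANNELS            # bytes in one stereo frame
frames_per_packet = MAX_PACKET_BYTES // bytes_per_frame  # frames that fit in one packet
packets_per_ms = 1000 // INTERVAL_US                     # packets sent per millisecond
max_frames_per_ms = frames_per_packet * packets_per_ms   # upper bound on throughput
used_frames_per_ms = SAMPLE_RATE_HZ // 1000              # what the stream actually needs

print(f"{frames_per_packet} frames/packet, up to {max_frames_per_ms} frames/ms, "
      f"{used_frames_per_ms} frames/ms in use")
```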

> I:  If#= 1 Alt= 2 #EPs= 2 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
> E:  Ad=03(O) Atr=05(Isoc) MxPS= 900 Ivl=125us
> E:  Ad=83(I) Atr=11(Isoc) MxPS=   4 Ivl=1ms
> I:  If#= 2 Alt= 0 #EPs= 0 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
> I:  If#= 2 Alt= 1 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
> E:  Ad=84(I) Atr=05(Isoc) MxPS= 900 Ivl=125us
> I:* If#= 2 Alt= 2 #EPs= 1 Cls=01(audio) Sub=02 Prot=20 Driver=snd-usb-audio
> E:  Ad=84(I) Atr=05(Isoc) MxPS= 150 Ivl=125us

This is the other interface being used for audio data; it handles the 
record direction.  Parameters are the same as for playback.

> I:* If#= 3 Alt= 0 #EPs= 2 Cls=01(audio) Sub=03 Prot=00 Driver=snd-usb-audio
> E:  Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
> E:  Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms

I don't know what this interface is for.  It's probably not directly 
relevant to the issue.

> > If you want to get a better idea for exactly what is happening at the
> > USB level, you can collect a usbmon trace while running a test.  Also,
> > it wouldn't hurt to see the values of max_packs_per_urb, urb_packs,
> > max_packs_per_period, urbs_per_period, ep->max_urb_frames, and
> > ep->nurbs from data_ep_set_params() in the audio driver.
> 
> Maybe in a few weeks. I suspect the additional latency is mainly in
> the playback direction.

That seems likely, especially in the light of Takashi's comments.

Alan Stern


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2018-10-24  7:13         ` Takashi Iwai
@ 2019-04-22 10:50           ` Jonathan Liu
  2019-04-24 14:05             ` Takashi Iwai
  0 siblings, 1 reply; 12+ messages in thread
From: Jonathan Liu @ 2019-04-22 10:50 UTC (permalink / raw)
  To: Takashi Iwai
  Cc: Clemens Ladisch, ALSA development, Alan Stern, Pierre-Louis Bossart

On Wed, 24 Oct 2018 at 18:13, Takashi Iwai <tiwai@suse.de> wrote:
>
> On Tue, 23 Oct 2018 16:08:22 +0200,
> Pierre-Louis Bossart wrote:
> >
> >
> > >>>> Linux 4.17.14, Class Compliant Mode (snd-usb-audio, ALSA backend):
> > >>>> 16/2 32 + 80 ~ 2.333 ms
> > >> What are these numbers?  Are these lines supposed to be in the format
> > >> expressed by the first formula above?  If they are, how come
> > >> "block_size/periods" shows up as a pair of numbers "16/2" but
> > >> "block_size*periods" shows up as a single number "32"?
> > >>
> > > To interpret "16/2 32 + 80 ~ 2.333 ms"
> > > Block size: 16 samples
> > > Periods: 2 (one period for playback + one period for recording when
> > > determining round trip latency)
> > > The minimum round trip latency is: 16 * 2 = 32 samples
> > > However, I measured 112 samples round trip latency which is an
> > > additional delay of 80 samples (32 + 80 = 112).
> > > 112 samples at 48000 Hz is 112 / 48000 * 1000 is approximately 2.333
> > > ms measured round trip latency.
> >
> > ok, so what problem are you trying to fix?
> >
> > Are you concerned about the latency numbers (but then they seem lower
> > on Linux and latency concerns with large buffers are a self-negating
> > proposition)? Are you concerned about the variable delay that doesn't
> > seem to exist on MacOS or Windows? Are you trying to match the
> > performance of the RME driver on MacOS?
> >
> > I am not sure how this comparison is done btw, the delay includes both
> > buffering on the device side before reaching the analog parts as well
> > as buffering on the OS side. While the former should be constant, the
> > latter depends a great deal on implementation, not sure there are
> > direct lessons to be applied to ALSA. I also see
> > inconsistent/non-linear results where with a larger block size the
> > delay is smaller, e.g.
> >
> > 256/2 512 + 650 ~ 24.208 ms
> > 2048/3 6144 + 633 ~ 141.188 ms

>
> Independently from the measurement done in this thread, actually,
> there is a known latency source in the playback path in USB-audio
> driver code -- which I mentioned in the audio mini conf in the last
> year: namely, the USB-audio driver starts streaming at prepare time
> for playback, not at the trigger-START time.  This is a sort of
> workaround to make the device looking similar to the standard
> ring-buffer behavior.
>
> Maybe moving the start at trigger (like the capture direction) would
> reduce this artificial latency, but it makes the driver behaving in an
> unexpected manner.  Then it may wake up for period_elapsed soon after
> the stream start with a large runtime->delay value, as the data in
> in-flight URBs are seen as already "processed".

I observed that snd_usb_pcm_prepare calls start_endpoints, which ends
up submitting silent URBs (prepared by prepare_silent_urb) until
ep->prepare_data_urb is set by SNDRV_PCM_TRIGGER_START in
snd_usb_substream_playback_trigger.
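
As a rough illustration of that behaviour, here is a minimal userspace
sketch (not kernel code). It assumes the endpoint picks between a data
callback and silence depending on whether the callback pointer is set.
All names prefixed with model_ / MODEL_ are hypothetical and exist only
for this example.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_URB_BYTES 8

struct model_ep;
typedef void (*model_prepare_fn)(struct model_ep *ep, unsigned char *buf);

/* Hypothetical stand-in for the endpoint state discussed above. */
struct model_ep {
	model_prepare_fn prepare_data_urb; /* NULL until trigger START */
	int silent_urbs;                   /* URBs filled with silence */
	int data_urbs;                     /* URBs filled by the data callback */
};

/* Stand-in for the playback data callback: writes a marker pattern. */
static void model_prepare_data(struct model_ep *ep, unsigned char *buf)
{
	memset(buf, 0x55, MODEL_URB_BYTES);
	ep->data_urbs++;
}

/* Fill one outbound URB: real data if a callback is set, else silence. */
static void model_prepare_outbound(struct model_ep *ep, unsigned char *buf)
{
	if (ep->prepare_data_urb) {
		ep->prepare_data_urb(ep, buf);
	} else {
		memset(buf, 0, MODEL_URB_BYTES);
		ep->silent_urbs++;
	}
}

/* Models the trigger START case installing the data callback. */
static void model_ep_trigger_start(struct model_ep *ep)
{
	ep->prepare_data_urb = model_prepare_data;
}
```

Every call to model_prepare_outbound() made before
model_ep_trigger_start() counts as a silent URB. In the real driver
those are the URBs already queued ahead of the first audio data.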

I tried moving the start_endpoints call from snd_usb_pcm_prepare to
the SNDRV_PCM_TRIGGER_START case of snd_usb_substream_playback_trigger
(see
https://github.com/net147/linux/commit/276eae5481653a2d4034fbae56f0d5bc579ecf67;
it is enabled with the start_playback_on_prepare=0 module option for
snd-usb-audio), but I get a kernel stall in some cases with the
following call trace:
_raw_spin_lock+0x2c/0x30
_snd_pcm_stream_lock_irqsave+0x31/0x60 [snd_pcm]
snd_pcm_period_elapsed+0x26/0xb0 [snd_pcm]
prepare_playback_urb+0x368/0x640 [snd_usb_audio]
? usb_submit_urb+0x3cb/0x590
snd_usb_endpoint_start+0x148/0x300 [snd_usb_audio]
start_endpoints+0x36/0x160 [snd_usb_audio]
snd_usb_substream_playback_trigger+0x152/0x1a0 [snd_usb_audio]
snd_pcm_action+0x117/0x150 [snd_pcm]
snd_pcm_common_ioctl+0x588/0xdb0 [snd_pcm]
? mprotect_fixup+0x1ec/0x2f0
snd_pcm_ioctl+0x23/0x30 [snd_pcm]
do_vfs_ioctl+0xa6/0x760
? syscall_trace_enter+0x1be/0x2b0
__x64_sys_ioctl+0x62/0x90
do_syscall_64+0x5b/0x170
entry_SYSCALL_64_after_hwframe+0x44/0xa9

Any ideas?

Thanks.

Regards,
Jonathan


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2019-04-22 10:50           ` Jonathan Liu
@ 2019-04-24 14:05             ` Takashi Iwai
  2019-04-30 14:38               ` Takashi Iwai
  0 siblings, 1 reply; 12+ messages in thread
From: Takashi Iwai @ 2019-04-24 14:05 UTC (permalink / raw)
  To: Jonathan Liu
  Cc: Clemens Ladisch, ALSA development, Alan Stern, Pierre-Louis Bossart

On Mon, 22 Apr 2019 12:50:15 +0200,
Jonathan Liu wrote:
> 
> I observed that snd_usb_pcm_prepare calls start_endpoints which ends
> up submitting silent urbs (prepared by prepare_silent_urb) until
> ep->prepare_data_urb is set by SNDRV_PCM_TRIGGER_START in
> snd_usb_substream_playback_trigger.
> 
> I tried to moving the start_endpoints call from snd_usb_pcm_prepare to
> snd_usb_substream_playback trigger's SNDRV_PCM_TRIGGER_START case (see
> https://github.com/net147/linux/commit/276eae5481653a2d4034fbae56f0d5bc579ecf67
> - it is enabled using start_playback_on_prepare=0 module option for
> snd-usb-audio) but I get a kernel stall in some cases with the
> following call trace:
> _raw_spin_lock+0x2c/0x30
> _snd_pcm_stream_lock_irqsave+0x31/0x60 [snd_pcm]
> snd_pcm_period_elapsed+0x26/0xb0 [snd_pcm]
> prepare_playback_urb+0x368/0x640 [snd_usb_audio]
> ? usb_submit_urb+0x3cb/0x590
> snd_usb_endpoint_start+0x148/0x300 [snd_usb_audio]
> start_endpoints+0x36/0x160 [snd_usb_audio]
> snd_usb_substream_playback_trigger+0x152/0x1a0 [snd_usb_audio]
> snd_pcm_action+0x117/0x150 [snd_pcm]
> snd_pcm_common_ioctl+0x588/0xdb0 [snd_pcm]
> ? mprotect_fixup+0x1ec/0x2f0
> snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> do_vfs_ioctl+0xa6/0x760
> ? syscall_trace_enter+0x1be/0x2b0
> __x64_sys_ioctl+0x62/0x90
> do_syscall_64+0x5b/0x170
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Any ideas?

This is because snd_pcm_period_elapsed() is called from the
prepare_data_urb callback, which is also invoked from start_endpoints().

I guess we'd need to move the hwptr accounting and
snd_pcm_period_elapsed() call into retire_data_urb callback in the
case of start-at-trigger for playback.
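
The locking problem and the proposed direction can be modelled in a
minimal userspace sketch (not kernel code). It assumes, as the call
trace suggests, that the PCM core holds the stream lock while calling
the trigger callback and that snd_pcm_period_elapsed() takes the same
non-recursive lock. All model_* names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of one playback stream. */
struct model_stream {
	bool lock_held;       /* models the non-recursive PCM stream lock */
	bool deadlocked;      /* set if the lock is taken while already held */
	bool period_pending;  /* period boundary seen while preparing a URB */
	int periods_reported; /* successful period-elapsed notifications */
};

/* Models snd_pcm_period_elapsed(): it needs the stream lock itself. */
static void model_period_elapsed(struct model_stream *s)
{
	if (s->lock_held) {
		s->deadlocked = true; /* a real spinlock would spin forever */
		return;
	}
	s->lock_held = true;
	s->periods_reported++;
	s->lock_held = false;
}

/* Models the prepare callback; 'defer' postpones the notification. */
static void model_prepare_urb(struct model_stream *s, bool period_done,
			      bool defer)
{
	s->period_pending = period_done;
	if (!defer && period_done)
		model_period_elapsed(s);
}

/* Models the retire callback, run on URB completion outside trigger. */
static void model_retire_urb(struct model_stream *s, bool defer)
{
	if (defer && s->period_pending)
		model_period_elapsed(s);
}

/* Models trigger START: the lock is held around the whole callback. */
static void model_trigger_start(struct model_stream *s, bool defer)
{
	s->lock_held = true;
	model_prepare_urb(s, true, defer); /* first URB crosses a period */
	s->lock_held = false;
}
```

With defer == false the model flags a deadlock inside
model_trigger_start(). With defer == true the notification is only
issued later from model_retire_urb(), after the lock has been dropped.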


thanks,

Takashi


* Re: snd-usb-audio Buffer Sizes and Round Trip Latency
  2019-04-24 14:05             ` Takashi Iwai
@ 2019-04-30 14:38               ` Takashi Iwai
  0 siblings, 0 replies; 12+ messages in thread
From: Takashi Iwai @ 2019-04-30 14:38 UTC (permalink / raw)
  To: Jonathan Liu
  Cc: Clemens Ladisch, ALSA development, Alan Stern, Pierre-Louis Bossart

On Wed, 24 Apr 2019 16:05:53 +0200,
Takashi Iwai wrote:
> 
> This is because snd_pcm_period_elapsed() is called from
> prepare_data_urb callback that is called also at start_endpoints().
> 
> I guess we'd need to move the hwptr accounting and
> snd_pcm_period_elapsed() call into retire_data_urb callback in the
> case of start-at-trigger for playback.

I meant something like below.  Only lightly tested.


Takashi

--- a/sound/usb/card.c
+++ b/sound/usb/card.c
@@ -84,6 +84,7 @@ static int pid[SNDRV_CARDS] = { [0 ... (SNDRV_CARDS-1)] = -1 };
 static int device_setup[SNDRV_CARDS]; /* device parameter for this card */
 static bool ignore_ctl_error;
 static bool autoclock = true;
+static bool lowlatency;
 static char *quirk_alias[SNDRV_CARDS];
 
 bool snd_usb_use_vmalloc = true;
@@ -105,6 +106,8 @@ MODULE_PARM_DESC(ignore_ctl_error,
 		 "Ignore errors from USB controller for mixer interfaces.");
 module_param(autoclock, bool, 0444);
 MODULE_PARM_DESC(autoclock, "Enable auto-clock selection for UAC2 devices (default: yes).");
+module_param(lowlatency, bool, 0444);
+MODULE_PARM_DESC(lowlatency, "Low latency playback");
 module_param_array(quirk_alias, charp, NULL, 0444);
 MODULE_PARM_DESC(quirk_alias, "Quirk aliases, e.g. 0123abcd:5678beef.");
 module_param_named(use_vmalloc, snd_usb_use_vmalloc, bool, 0444);
@@ -487,6 +490,7 @@ static int snd_usb_audio_create(struct usb_interface *intf,
 	chip->card = card;
 	chip->setup = device_setup[idx];
 	chip->autoclock = autoclock;
+	chip->lowlatency = lowlatency;
 	atomic_set(&chip->active, 1); /* avoid autopm during probing */
 	atomic_set(&chip->usage_count, 0);
 	atomic_set(&chip->shutdown, 0);
diff --git a/sound/usb/card.h b/sound/usb/card.h
index 79fa2a19fb7b..244c80ff8e33 100644
--- a/sound/usb/card.h
+++ b/sound/usb/card.h
@@ -48,6 +48,7 @@ struct snd_urb_ctx {
 	int index;	/* index for urb array */
 	int packets;	/* number of packets per urb */
 	int packet_size[MAX_PACKS_HS]; /* size of packets for next submission */
+	bool period_elapsed;
 	struct list_head ready_list;
 };
 
diff --git a/sound/usb/pcm.c b/sound/usb/pcm.c
index 056af0a57b22..165bf2de6a37 100644
--- a/sound/usb/pcm.c
+++ b/sound/usb/pcm.c
@@ -927,7 +927,8 @@ static int snd_usb_pcm_prepare(struct snd_pcm_substream *substream)
 
 	/* for playback, submit the URBs now; otherwise, the first hwptr_done
 	 * updates for all URBs would happen at the same time when starting */
-	if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK)
+	if (subs->direction == SNDRV_PCM_STREAM_PLAYBACK &&
+	    !subs->stream->chip->lowlatency)
 		ret = start_endpoints(subs);
 
  unlock:
@@ -1542,7 +1543,7 @@ static void prepare_playback_urb(struct snd_usb_substream *subs,
 	struct snd_usb_endpoint *ep = subs->data_endpoint;
 	struct snd_urb_ctx *ctx = urb->context;
 	unsigned int counts, frames, bytes;
-	int i, stride, period_elapsed = 0;
+	int i, stride;
 	unsigned long flags;
 
 	stride = runtime->frame_bits >> 3;
@@ -1551,6 +1552,7 @@ static void prepare_playback_urb(struct snd_usb_substream *subs,
 	urb->number_of_packets = 0;
 	spin_lock_irqsave(&subs->lock, flags);
 	subs->frame_limit += ep->max_urb_frames;
+	ctx->period_elapsed = 0;
 	for (i = 0; i < ctx->packets; i++) {
 		if (ctx->packet_size[i])
 			counts = ctx->packet_size[i];
@@ -1566,7 +1568,7 @@ static void prepare_playback_urb(struct snd_usb_substream *subs,
 		if (subs->transfer_done >= runtime->period_size) {
 			subs->transfer_done -= runtime->period_size;
 			subs->frame_limit = 0;
-			period_elapsed = 1;
+			ctx->period_elapsed = 1;
 			if (subs->fmt_type == UAC_FORMAT_TYPE_II) {
 				if (subs->transfer_done > 0) {
 					/* FIXME: fill-max mode is not
@@ -1589,7 +1591,7 @@ static void prepare_playback_urb(struct snd_usb_substream *subs,
 			}
 		}
 		/* finish at the period boundary or after enough frames */
-		if ((period_elapsed ||
+		if ((ctx->period_elapsed ||
 				subs->transfer_done >= subs->frame_limit) &&
 		    !snd_usb_endpoint_implicit_feedback_sink(ep))
 			break;
@@ -1640,7 +1642,7 @@ static void prepare_playback_urb(struct snd_usb_substream *subs,
 
 	spin_unlock_irqrestore(&subs->lock, flags);
 	urb->transfer_buffer_length = bytes;
-	if (period_elapsed)
+	if (!subs->stream->chip->lowlatency && ctx->period_elapsed)
 		snd_pcm_period_elapsed(subs->pcm_substream);
 }
 
@@ -1654,6 +1656,7 @@ static void retire_playback_urb(struct snd_usb_substream *subs,
 	unsigned long flags;
 	struct snd_pcm_runtime *runtime = subs->pcm_substream->runtime;
 	struct snd_usb_endpoint *ep = subs->data_endpoint;
+	struct snd_urb_ctx *ctx = urb->context;
 	int processed = urb->transfer_buffer_length / ep->stride;
 	int est_delay;
 
@@ -1695,12 +1698,16 @@ static void retire_playback_urb(struct snd_usb_substream *subs,
 
  out:
 	spin_unlock_irqrestore(&subs->lock, flags);
+
+	if (subs->stream->chip->lowlatency && ctx->period_elapsed)
+		snd_pcm_period_elapsed(subs->pcm_substream);
 }
 
 static int snd_usb_substream_playback_trigger(struct snd_pcm_substream *substream,
 					      int cmd)
 {
 	struct snd_usb_substream *subs = substream->runtime->private_data;
+	int err;
 
 	switch (cmd) {
 	case SNDRV_PCM_TRIGGER_START:
@@ -1709,6 +1716,14 @@ static int snd_usb_substream_playback_trigger(struct snd_pcm_substream *substrea
 	case SNDRV_PCM_TRIGGER_PAUSE_RELEASE:
 		subs->data_endpoint->prepare_data_urb = prepare_playback_urb;
 		subs->data_endpoint->retire_data_urb = retire_playback_urb;
+		if (subs->stream->chip->lowlatency) {
+			err = start_endpoints(subs);
+			if (err < 0) {
+				subs->data_endpoint->prepare_data_urb = NULL;
+				subs->data_endpoint->retire_data_urb = NULL;
+				return err;
+			}
+		}
 		subs->running = 1;
 		return 0;
 	case SNDRV_PCM_TRIGGER_STOP:
diff --git a/sound/usb/usbaudio.h b/sound/usb/usbaudio.h
index b9faeca645fd..71bc58b11ca0 100644
--- a/sound/usb/usbaudio.h
+++ b/sound/usb/usbaudio.h
@@ -64,6 +64,7 @@ struct snd_usb_audio {
 	bool keep_iface;		/* keep interface/altset after closing
 					 * or parameter change
 					 */
+	bool lowlatency;
 
 	struct usb_host_interface *ctrl_intf;	/* the audio control interface */
 };


end of thread, other threads:[~2019-04-30 14:38 UTC | newest]

Thread overview: 12+ messages
2018-10-17 12:58 snd-usb-audio Buffer Sizes and Round Trip Latency Jonathan Liu
2018-10-22 14:06 ` Pierre-Louis Bossart
2018-10-22 15:40   ` Alan Stern
2018-10-23 11:59     ` Jonathan Liu
2018-10-23 14:08       ` Pierre-Louis Bossart
2018-10-24  7:13         ` Takashi Iwai
2019-04-22 10:50           ` Jonathan Liu
2019-04-24 14:05             ` Takashi Iwai
2019-04-30 14:38               ` Takashi Iwai
2018-10-23 15:10       ` Alan Stern
2018-10-24  9:29         ` Jonathan Liu
2018-10-24 14:20           ` Alan Stern
