* dmix optimization
@ 2021-08-02  9:25 Giuliano Zannetti - ART S.p.A.
  2021-08-02 16:27 ` Geraldo Nascimento
  2021-08-03  7:26 ` I: " Giuliano Zannetti - ART S.p.A.
  0 siblings, 2 replies; 4+ messages in thread
From: Giuliano Zannetti - ART S.p.A. @ 2021-08-02  9:25 UTC (permalink / raw)
  To: alsa-devel

Hi,

I'm trying to optimize dmix because I'm working with a large number of channels (up to 16), and in this case dmix has a non-negligible impact on performance.

I'm working with ALSA 1.1.9. I started by looking at the generic_mix_areas_16_native function (https://github.com/alsa-project/alsa-lib/blob/v1.1.9/src/pcm/pcm_dmix_generic.c#L130).

I would like to ask whether I can avoid checking, on every loop iteration, whether the current dst sample is 0.

    for (;;) {
        sample = *src;
        if (! *dst) {
            *sum = sample;
            *dst = *src;
        } else {
            sample += *sum;
            *sum = sample;
            if (sample > 0x7fff)
                sample = 0x7fff;
            else if (sample < -0x8000)
                sample = -0x8000;
            *dst = sample;
        }
        if (!--size)
            return;
        src = (signed short *) ((char *)src + src_step);
        dst = (signed short *) ((char *)dst + dst_step);
        sum = (signed int *)   ((char *)sum + sum_step);
    }

Would it be possible to check only the first sample of the period, as in the code below? My assumption is that if dst[0] is 0, then dst[1] ... dst[period-1] will also be 0, so I don't need to check every time. This is already an optimization, but it could also be a starting point for further optimizations based on my hardware. First of all, though, I would like to ask whether my assumption is right.

    if (! *dst) {
        for (;;) {
            sample = *src;
            *sum = sample;
            *dst = *src;

            if (!--size)
                return;

            src = (signed short *) ((char *)src + src_step);
            dst = (signed short *) ((char *)dst + dst_step);
            sum = (signed int *)   ((char *)sum + sum_step);
        }

    } else {
        for (;;) {
            sample = *src;
            sample += *sum;
            *sum = sample;

            if (sample > 0x7fff)
                sample = 0x7fff;
            else if (sample < -0x8000)
                sample = -0x8000;
            *dst = sample;

            if (!--size)
                return;

            src = (signed short *) ((char *)src + src_step);
            dst = (signed short *) ((char *)dst + dst_step);
            sum = (signed int *)   ((char *)sum + sum_step);
        }
    }

Thank you!

Best Regards,
Giuliano
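
For anyone who wants to experiment with the first loop quoted above outside alsa-lib, here is a minimal sketch that wraps it in a function and runs two clients through it one after the other. The names mix_areas_16 and demo_two_clients are made up for this illustration (the real function is generic_mix_areas_16_native), the sample values are arbitrary, and the comments are an interpretation of the two branches rather than anything taken from alsa-lib. The assert reflects that the loop as written expects size to be at least 1.

```c
#include <assert.h>
#include <stddef.h>

/* Same loop as quoted above, wrapped in a function. size must be >= 1. */
static void mix_areas_16(unsigned int size,
                         volatile signed short *dst, const signed short *src,
                         volatile signed int *sum,
                         size_t dst_step, size_t src_step, size_t sum_step)
{
    signed int sample;

    assert(size > 0);
    for (;;) {
        sample = *src;
        if (!*dst) {
            /* Slot still reads as silence: store the sample directly. */
            *sum = sample;
            *dst = *src;
        } else {
            /* Slot already holds audio: accumulate in 32 bits, then clip. */
            sample += *sum;
            *sum = sample;
            if (sample > 0x7fff)
                sample = 0x7fff;
            else if (sample < -0x8000)
                sample = -0x8000;
            *dst = sample;
        }
        if (!--size)
            return;
        /* Steps are byte offsets, so advance through char pointers. */
        src = (const signed short *)((const char *)src + src_step);
        dst = (volatile signed short *)((volatile char *)dst + dst_step);
        sum = (volatile signed int *)((volatile char *)sum + sum_step);
    }
}

/* Hypothetical demo: two clients mix into the same cleared buffer. */
static void demo_two_clients(signed short out[4])
{
    signed short dst[4] = {0, 0, 0, 0};
    signed int sum[4] = {0, 0, 0, 0};
    const signed short a[4] = {100, -200, 30000, -30000};
    const signed short b[4] = {50, -50, 10000, -10000};

    mix_areas_16(4, dst, a, sum, sizeof *dst, sizeof *a, sizeof *sum);
    mix_areas_16(4, dst, b, sum, sizeof *dst, sizeof *b, sizeof *sum);
    for (int i = 0; i < 4; i++)
        out[i] = dst[i];
}
```

With these inputs the first call takes the store branch for every sample and the second call takes the accumulate branch, so out ends up as {150, -250, 32767, -32768}, with the last two values clipped to the 16-bit range.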

* Re: dmix optimization
  2021-08-02  9:25 dmix optimization Giuliano Zannetti - ART S.p.A.
@ 2021-08-02 16:27 ` Geraldo Nascimento
  2021-08-03  7:26 ` I: " Giuliano Zannetti - ART S.p.A.
  1 sibling, 0 replies; 4+ messages in thread
From: Geraldo Nascimento @ 2021-08-02 16:27 UTC (permalink / raw)
  To: Giuliano Zannetti - ART S.p.A.; +Cc: alsa-devel

Hello Giuliano,

My suggestion is to repost your question with Cc: to Jaroslav Kysela
and Takashi Iwai.

I *think* Takashi Iwai deals more with kernelspace and Jaroslav Kysela
is the one that can help you with alsa-lib.

This is a high volume list and sometimes mail will get ignored.

But Cc: the Maintainers and they will take notice and hopefully you'll
have your optimization feedback.

Thanks,
Geraldo Nascimento

* I: dmix optimization
  2021-08-02  9:25 dmix optimization Giuliano Zannetti - ART S.p.A.
  2021-08-02 16:27 ` Geraldo Nascimento
@ 2021-08-03  7:26 ` Giuliano Zannetti - ART S.p.A.
  2021-08-03  7:34   ` Takashi Iwai
  1 sibling, 1 reply; 4+ messages in thread
From: Giuliano Zannetti - ART S.p.A. @ 2021-08-03  7:26 UTC (permalink / raw)
  To: alsa-devel; +Cc: tiwai

Hi,

I'm trying to optimize dmix because I'm working with a large number of channels (up to 16), and in this case dmix has a non-negligible impact on performance.

I'm working with ALSA 1.1.9. I started by looking at the generic_mix_areas_16_native function (https://github.com/alsa-project/alsa-lib/blob/v1.1.9/src/pcm/pcm_dmix_generic.c#L130).

I would like to ask whether I can avoid checking, on every loop iteration, whether the current dst sample is 0.

    for (;;) {
        sample = *src;
        if (! *dst) {
            *sum = sample;
            *dst = *src;
        } else {
            sample += *sum;
            *sum = sample;
            if (sample > 0x7fff)
                sample = 0x7fff;
            else if (sample < -0x8000)
                sample = -0x8000;
            *dst = sample;
        }
        if (!--size)
            return;
        src = (signed short *) ((char *)src + src_step);
        dst = (signed short *) ((char *)dst + dst_step);
        sum = (signed int *)   ((char *)sum + sum_step);
    }

Would it be possible to check only the first sample of the period, as in the code below? My assumption is that if dst[0] is 0, then dst[1] ... dst[period-1] will also be 0, so I don't need to check every time. This is already an optimization, but it could also be a starting point for further optimizations based on my hardware. First of all, though, I would like to ask whether my assumption is right.

    if (! *dst) {
        for (;;) {
            sample = *src;
            *sum = sample;
            *dst = *src;

            if (!--size)
                return;

            src = (signed short *) ((char *)src + src_step);
            dst = (signed short *) ((char *)dst + dst_step);
            sum = (signed int *)   ((char *)sum + sum_step);
        }

    } else {
        for (;;) {
            sample = *src;
            sample += *sum;
            *sum = sample;

            if (sample > 0x7fff)
                sample = 0x7fff;
            else if (sample < -0x8000)
                sample = -0x8000;
            *dst = sample;

            if (!--size)
                return;

            src = (signed short *) ((char *)src + src_step);
            dst = (signed short *) ((char *)dst + dst_step);
            sum = (signed int *)   ((char *)sum + sum_step);
        }
    }

Thank you!

Best Regards,
Giuliano

* Re: I: dmix optimization
  2021-08-03  7:26 ` I: " Giuliano Zannetti - ART S.p.A.
@ 2021-08-03  7:34   ` Takashi Iwai
  0 siblings, 0 replies; 4+ messages in thread
From: Takashi Iwai @ 2021-08-03  7:34 UTC (permalink / raw)
  To: Giuliano Zannetti - ART S.p.A.; +Cc: alsa-devel, tiwai

On Tue, 03 Aug 2021 09:26:38 +0200,
Giuliano Zannetti - ART S.p.A. wrote:
> 
(snip) 
> Would it be possible to check only the first sample of the period, as
> in the code below?

No, unfortunately your suggested optimization won't work reliably, I'm
afraid.

Each application may write samples in partial chunks, not always in
whole periods.  Also, the hardware may clear the buffer right after
the hwptr is updated, and again, not always in whole periods.  So
checking just the first sample doesn't guarantee that the rest of the
period is also zero (or also non-zero).


thanks,

Takashi
