[RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding

All of lore.kernel.org
 help / color / mirror / Atom feed

* [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
@ 2008-11-28 13:35 Jaska Uimonen
  2008-11-28 14:18 ` Marcel Holtmann
  2008-12-02 20:15 ` Jim Carter
  0 siblings, 2 replies; 35+ messages in thread
From: Jaska Uimonen @ 2008-11-28 13:35 UTC (permalink / raw)
  To: linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 925 bytes --]

Hi All,

I did some testing on the current 8 band fixed point 
encoding and it seems to attenuate frequencies below 800Hz 
and above 18kHz. There might be some other stuff happening 
also, because at least to me the bass seemed to lack some 
"definition".

I didn't quite understand how the current tables are calculated
and how the filtering works so I wrote a new filtering function
and calculated new filter tables for it. It is written
using 16 bit fixed point without any platform specific optimizations.
I only unrolled some loops etc. I tried to follow the
flow chart in MPEG-1 annex c.

With this new filtering the low and high frequencies are there, but
I haven't done any more thorough testing. At least it sounds
a little bit better to my ears :)

br,
Jaska Uimonen

P.S. I could have done some discussion with the list members 
about the current implementation, but I kind of got carried away. 
Sorry about that.

[-- Attachment #2: 0001-New-function-and-coefficients-for-8-band-fixed-point.patch --]
[-- Type: application/mbox, Size: 7292 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-11-28 13:35 [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding Jaska Uimonen
@ 2008-11-28 14:18 ` Marcel Holtmann
  2008-11-28 14:24   ` Jelle de Jong
  2008-11-28 15:14   ` Jaska Uimonen
  2008-12-02 20:15 ` Jim Carter
  1 sibling, 2 replies; 35+ messages in thread
From: Marcel Holtmann @ 2008-11-28 14:18 UTC (permalink / raw)
  To: Jaska Uimonen; +Cc: linux-bluetooth

Hi Jaska,

> I did some testing on the current 8 band fixed point 
> encoding and it seems to attenuate frequencies below 800Hz 
> and above 18kHz. There might be some other stuff happening 
> also, because at least to me the bass seemed to lack some 
> "definition".
> 
> I didn't quite understand how the current tables are calculated
> and how the filtering works so I wrote a new filtering function
> and calculated new filter tables for it. It is written
> using 16 bit fixed point without any platform specific optimizations.
> I only unrolled some loops etc. I tried to follow the
> flow chart in MPEG-1 annex c.
> 
> With this new filtering the low and high frequencies are there, but
> I haven't done any more thorough testing. At least it sounds
> a little bit better to my ears :)

thanks for looking at it. I am seriously lost when it comes to audio
codecs and my ears normally don't count for much.

So do you think we should throw all away any you start over providing a
correct implementation with fixed point integer and then we start
optimizing step by step (while testing against SBC conformance) or how
should we continue. For sure we have to fix our codec.

Regards

Marcel



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-11-28 14:18 ` Marcel Holtmann
@ 2008-11-28 14:24   ` Jelle de Jong
  2008-11-28 15:20     ` Jaska Uimonen
  2008-11-28 15:14   ` Jaska Uimonen
  1 sibling, 1 reply; 35+ messages in thread
From: Jelle de Jong @ 2008-11-28 14:24 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: Jaska Uimonen, linux-bluetooth

Marcel Holtmann wrote:
> Hi Jaska,
> 
>> I did some testing on the current 8 band fixed point 
>> encoding and it seems to attenuate frequencies below 800Hz 
>> and above 18kHz. There might be some other stuff happening 
>> also, because at least to me the bass seemed to lack some 
>> "definition".
>>
>> I didn't quite understand how the current tables are calculated
>> and how the filtering works so I wrote a new filtering function
>> and calculated new filter tables for it. It is written
>> using 16 bit fixed point without any platform specific optimizations.
>> I only unrolled some loops etc. I tried to follow the
>> flow chart in MPEG-1 annex c.
>>
>> With this new filtering the low and high frequencies are there, but
>> I haven't done any more thorough testing. At least it sounds
>> a little bit better to my ears :)
> 
> thanks for looking at it. I am seriously lost when it comes to audio
> codecs and my ears normally don't count for much.
> 
> So do you think we should throw all away any you start over providing a
> correct implementation with fixed point integer and then we start
> optimizing step by step (while testing against SBC conformance) or how
> should we continue. For sure we have to fix our codec.
> 
> Regards
> 
> Marcel

Hi Jaska, Thank you so much for improving the codecs :-D it was on my
long standing wish list. You want to receive a few Dutch stroopwafels
for your efforts :-D

Is the patch you provided working against the latest git? Marcel would
you be willing to review the patch for code style, hidden issues etcetera.

I would love to test this patch this weekend and apply it to the latest git.

I am an heavy stereo bluetooth user and will notice glitches and quality
distortions on my bluetooth speakers and headsets.

Best regards,

Jelle

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-11-28 14:24   ` Jelle de Jong
@ 2008-11-28 15:20     ` Jaska Uimonen
  2008-11-28 18:13       ` David Sainty
  0 siblings, 1 reply; 35+ messages in thread
From: Jaska Uimonen @ 2008-11-28 15:20 UTC (permalink / raw)
  To: ext Jelle de Jong; +Cc: linux-bluetooth

Hi,

I did the git clone this morning and made the patch on top of that.

Johan already came to tell me that I have whitespace instead 
of tabs in the changed parts. Sorry about that, I'll fix it. 
He said it will take 1 minutes to fix that but it was 
at least 1/2 hour with emacs :)

br,
Jaska



On Fri, 2008-11-28 at 15:24 +0100, ext Jelle de Jong wrote:
> Marcel Holtmann wrote:
> > Hi Jaska,
> > 
> >> I did some testing on the current 8 band fixed point 
> >> encoding and it seems to attenuate frequencies below 800Hz 
> >> and above 18kHz. There might be some other stuff happening 
> >> also, because at least to me the bass seemed to lack some 
> >> "definition".
> >>
> >> I didn't quite understand how the current tables are calculated
> >> and how the filtering works so I wrote a new filtering function
> >> and calculated new filter tables for it. It is written
> >> using 16 bit fixed point without any platform specific optimizations.
> >> I only unrolled some loops etc. I tried to follow the
> >> flow chart in MPEG-1 annex c.
> >>
> >> With this new filtering the low and high frequencies are there, but
> >> I haven't done any more thorough testing. At least it sounds
> >> a little bit better to my ears :)
> > 
> > thanks for looking at it. I am seriously lost when it comes to audio
> > codecs and my ears normally don't count for much.
> > 
> > So do you think we should throw all away any you start over providing a
> > correct implementation with fixed point integer and then we start
> > optimizing step by step (while testing against SBC conformance) or how
> > should we continue. For sure we have to fix our codec.
> > 
> > Regards
> > 
> > Marcel
> 
> Hi Jaska, Thank you so much for improving the codecs :-D it was on my
> long standing wish list. You want to receive a few Dutch stroopwafels
> for your efforts :-D
> 
> Is the patch you provided working against the latest git? Marcel would
> you be willing to review the patch for code style, hidden issues etcetera.
> 
> I would love to test this patch this weekend and apply it to the latest git.
> 
> I am an heavy stereo bluetooth user and will notice glitches and quality
> distortions on my bluetooth speakers and headsets.
> 
> Best regards,
> 
> Jelle


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-11-28 15:20     ` Jaska Uimonen
@ 2008-11-28 18:13       ` David Sainty
  0 siblings, 0 replies; 35+ messages in thread
From: David Sainty @ 2008-11-28 18:13 UTC (permalink / raw)
  To: Jaska Uimonen; +Cc: ext Jelle de Jong, linux-bluetooth

Jaska Uimonen wrote:
> Hi,
>
> I did the git clone this morning and made the patch on top of that.
>
> Johan already came to tell me that I have whitespace instead 
> of tabs in the changed parts. Sorry about that, I'll fix it. 
> He said it will take 1 minutes to fix that but it was 
> at least 1/2 hour with emacs :)
>
> br,
> Jaska
>
>   

M-x tabify ?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-11-28 14:18 ` Marcel Holtmann
  2008-11-28 14:24   ` Jelle de Jong
@ 2008-11-28 15:14   ` Jaska Uimonen
  1 sibling, 0 replies; 35+ messages in thread
From: Jaska Uimonen @ 2008-11-28 15:14 UTC (permalink / raw)
  To: ext Marcel Holtmann; +Cc: linux-bluetooth

Hi,

So in general I would just like to hear comments 
about the quality on this one. So does it fix the 
quality issues or is there still some problems. 

If someone just has the measurement system ready and would 
be able to run some sound through... This is all 
16 bit fixed point and we have an option to go to 32 bit 
if necessary.

About the optimizations... I myself use a vectorizing 
compiler, which optimizes this kind of C-code very 
well. So we could keep the "reference" fix point 
in the code base (if it actually works now...) and 
then do optimizations to different platform if 
one wishes. But I'm really open to how you would like 
to do it.

br,
Jaska
  


On Fri, 2008-11-28 at 15:18 +0100, ext Marcel Holtmann wrote:
> Hi Jaska,
> 
> > I did some testing on the current 8 band fixed point 
> > encoding and it seems to attenuate frequencies below 800Hz 
> > and above 18kHz. There might be some other stuff happening 
> > also, because at least to me the bass seemed to lack some 
> > "definition".
> > 
> > I didn't quite understand how the current tables are calculated
> > and how the filtering works so I wrote a new filtering function
> > and calculated new filter tables for it. It is written
> > using 16 bit fixed point without any platform specific optimizations.
> > I only unrolled some loops etc. I tried to follow the
> > flow chart in MPEG-1 annex c.
> > 
> > With this new filtering the low and high frequencies are there, but
> > I haven't done any more thorough testing. At least it sounds
> > a little bit better to my ears :)
> 
> thanks for looking at it. I am seriously lost when it comes to audio
> codecs and my ears normally don't count for much.
> 
> So do you think we should throw all away any you start over providing a
> correct implementation with fixed point integer and then we start
> optimizing step by step (while testing against SBC conformance) or how
> should we continue. For sure we have to fix our codec.
> 
> Regards
> 
> Marcel
> 
> 

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-11-28 13:35 [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding Jaska Uimonen
  2008-11-28 14:18 ` Marcel Holtmann
@ 2008-12-02 20:15 ` Jim Carter
  2008-12-12 17:14   ` Siarhei Siamashka
  1 sibling, 1 reply; 35+ messages in thread
From: Jim Carter @ 2008-12-02 20:15 UTC (permalink / raw)
  To: Jaska Uimonen; +Cc: linux-bluetooth

On Fri, 28 Nov 2008, Jaska Uimonen wrote:

> I did some testing on the current 8 band fixed point 
> encoding and it seems to attenuate frequencies below 800Hz 
> and above 18kHz. There might be some other stuff happening 
> also, because at least to me the bass seemed to lack some 
> "definition".

I have the same subjective impression: on (inexpensive) Motorola HT-820 
phones, A2DP is noticeably anemic compared to the same phones with the 
provided wire.

> I didn't quite understand how the current tables are calculated
> and how the filtering works so I wrote a new filtering function
> and calculated new filter tables for it. It is written
> using 16 bit fixed point without any platform specific optimizations.
> I only unrolled some loops etc. I tried to follow the
> flow chart in MPEG-1 annex c.

This is very exciting!  But it's been about 1.5 years since I read the A2DP 
spec so I can't remember: does the headphone require standard band 
breakpoints?  Or are the breakpoints implicit in the encoding, so if you 
adjust them to make better use of the Bluetooth bandwidth, the headphone 
will automatically go along?


James F. Carter          Voice 310 825 2897    FAX 310 206 6673
UCLA-Mathnet;  6115 MSA; 405 Hilgard Ave.; Los Angeles, CA, USA 90095-1555
Email: jimc@math.ucla.edu  http://www.math.ucla.edu/~jimc (q.v. for PGP key)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-02 20:15 ` Jim Carter
@ 2008-12-12 17:14   ` Siarhei Siamashka
  2008-12-12 19:19     ` Brad Midgley
  0 siblings, 1 reply; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-12 17:14 UTC (permalink / raw)
  To: Jaska Uimonen; +Cc: linux-bluetooth

On Fri, 28 Nov 2008, Jaska Uimonen wrote:
> I did some testing on the current 8 band fixed point
> encoding and it seems to attenuate frequencies below 800Hz
> and above 18kHz. There might be some other stuff happening
> also, because at least to me the bass seemed to lack some
> "definition".
> I didn't quite understand how the current tables are calculated
> and how the filtering works so I wrote a new filtering function
> and calculated new filter tables for it. It is written
> using 16 bit fixed point without any platform specific optimizations.
> I only unrolled some loops etc. I tried to follow the
> flow chart in MPEG-1 annex c.

> +/*
> + * to produce this Q15 format table:
> + *
> + * Get the filter coeffs from the spec and multiply them by 2^15.
> + */
> +static const signed short _sbc_proto_fixed8[80] = {
> +       0, 5, 11, 18, 26, 37, 48, 58, 65, 68,
> +       65, 52, 29, -5, -54, -114, 185, 263, 342, 417,
> +       480, 521, 531, 501, 423, 290, 95, -161, -479, -855,
> +       -1280, -1742, 2228, 2719, 3197, 3643, 4039, 4366, 4612, 4764,
> +       4815, 4764, 4612, 4366, 4039, 3643, 3197, 2719, -2228, -1742,
> +       -1280, -855, -479, -161, 95, 290, 423, 501, 531, 521,
> +       480, 417, 342, 263, -185, -114, -54, -5, 29, 52,
> +       65, 68, 65, 58, 48, 37, 26, 18, 11, 5
> +};

Just remembered to check this. Precision and audio quality should be a bit
better if the original floating values are not truncated, but rounded when put
into tables.

For example, the fifth element is 26 in _sbc_proto_fixed8 table, but it was
26.998194372608 after multiplication and before conversion to integer.
Using 27 would have been a bit more correct here.

-- 
Best regards,
Siarhei Siamashka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-12 17:14   ` Siarhei Siamashka
@ 2008-12-12 19:19     ` Brad Midgley
  2008-12-15 12:54       ` Siarhei Siamashka
  0 siblings, 1 reply; 35+ messages in thread
From: Brad Midgley @ 2008-12-12 19:19 UTC (permalink / raw)
  To: Siarhei Siamashka; +Cc: Jaska Uimonen, linux-bluetooth

Guys

One mistake we made was not keeping track of what functions we were
applying to the published tables as we worked things to look less and
less like the pseudocode in the spec. So if you start again from one
of the tables in the spec, keep a comment like

/* ourtable(x) = (int32)(table1(x) << 16) */

Which in this case would mean that in ourtable we've shifted the
original table1 float value left 16 bits and truncated it to an int32.

I also combined tables or split tables to simplify our loop logic or
eliminate operations; an explanation in a comment would have been
appropriate.

Brad

On Fri, Dec 12, 2008 at 10:14 AM, Siarhei Siamashka
<siarhei.siamashka@nokia.com> wrote:
> On Fri, 28 Nov 2008, Jaska Uimonen wrote:
>> I did some testing on the current 8 band fixed point
>> encoding and it seems to attenuate frequencies below 800Hz
>> and above 18kHz. There might be some other stuff happening
>> also, because at least to me the bass seemed to lack some
>> "definition".
>> I didn't quite understand how the current tables are calculated
>> and how the filtering works so I wrote a new filtering function
>> and calculated new filter tables for it. It is written
>> using 16 bit fixed point without any platform specific optimizations.
>> I only unrolled some loops etc. I tried to follow the
>> flow chart in MPEG-1 annex c.
>
>> +/*
>> + * to produce this Q15 format table:
>> + *
>> + * Get the filter coeffs from the spec and multiply them by 2^15.
>> + */
>> +static const signed short _sbc_proto_fixed8[80] = {
>> +       0, 5, 11, 18, 26, 37, 48, 58, 65, 68,
>> +       65, 52, 29, -5, -54, -114, 185, 263, 342, 417,
>> +       480, 521, 531, 501, 423, 290, 95, -161, -479, -855,
>> +       -1280, -1742, 2228, 2719, 3197, 3643, 4039, 4366, 4612, 4764,
>> +       4815, 4764, 4612, 4366, 4039, 3643, 3197, 2719, -2228, -1742,
>> +       -1280, -855, -479, -161, 95, 290, 423, 501, 531, 521,
>> +       480, 417, 342, 263, -185, -114, -54, -5, 29, 52,
>> +       65, 68, 65, 58, 48, 37, 26, 18, 11, 5
>> +};
>
> Just remembered to check this. Precision and audio quality should be a bit
> better if the original floating values are not truncated, but rounded when put
> into tables.
>
> For example, the fifth element is 26 in _sbc_proto_fixed8 table, but it was
> 26.998194372608 after multiplication and before conversion to integer.
> Using 27 would have been a bit more correct here.
>
> --
> Best regards,
> Siarhei Siamashka
> --
> To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>



-- 
Brad Midgley

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-12 19:19     ` Brad Midgley
@ 2008-12-15 12:54       ` Siarhei Siamashka
  2008-12-15 15:16         ` Brad Midgley
  0 siblings, 1 reply; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-15 12:54 UTC (permalink / raw)
  To: ext Brad Midgley; +Cc: Jaska Uimonen, linux-bluetooth

On Friday 12 December 2008 21:19:20 ext Brad Midgley wrote:
> Guys
>
> One mistake we made was not keeping track of what functions we were
> applying to the published tables as we worked things to look less and
> less like the pseudocode in the spec. So if you start again from one
> of the tables in the spec, keep a comment like
>
> /* ourtable(x) = (int32)(table1(x) << 16) */

This part can be probably done just by having a macro, something like
#define F_TO_Q15(x) (int16_t)((x>0) ? ((x)*(1<<15)+0.5) : ((x)*(1<<15)-0.5))
And then using plain floating point numbers from the SBC specification in the
table, wrapped into this macro. Though I wonder if it is possible to use such
conditional expression in the static table initializer list with all versions
of gcc/other compilers.

> Which in this case would mean that in ourtable we've shifted the
> original table1 float value left 16 bits and truncated it to an int32.
>
> I also combined tables or split tables to simplify our loop logic or
> eliminate operations; an explanation in a comment would have been
> appropriate.

Yes, any transformations or simplifications should be extensively commented.
So that it will be always possible to reproduce them or verify their
correctness.

Can anybody try to remember/explain what transformations were applied to
the existing fixed point implementation?

-- 
Best regards,
Siarhei Siamashka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-15 12:54       ` Siarhei Siamashka
@ 2008-12-15 15:16         ` Brad Midgley
  2008-12-16 22:37           ` Siarhei Siamashka
  0 siblings, 1 reply; 35+ messages in thread
From: Brad Midgley @ 2008-12-15 15:16 UTC (permalink / raw)
  To: Siarhei Siamashka; +Cc: Jaska Uimonen, linux-bluetooth

Siarhei

I like your idea of using a macro with the original floating point
tables, as long as we know it is done at compile time, not runtime :)

> Can anybody try to remember/explain what transformations were applied to
> the existing fixed point implementation?

it was done by several people and the only record we have is in cvs.
(part of it is in the old btsco project's cvs)

-- 
Brad Midgley

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-15 15:16         ` Brad Midgley
@ 2008-12-16 22:37           ` Siarhei Siamashka
  2008-12-17  8:16             ` Jaska Uimonen
  2008-12-19 22:12             ` Siarhei Siamashka
  0 siblings, 2 replies; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-16 22:37 UTC (permalink / raw)
  To: ext Brad Midgley; +Cc: Jaska Uimonen, linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 1667 bytes --]

On Monday 15 December 2008 17:16:58 ext Brad Midgley wrote:
> I like your idea of using a macro with the original floating point
> tables, as long as we know it is done at compile time, not runtime :)

What about something like this modification to Jaska's patch? It contains
floating point constants wrapped into a macro.

This version is using 16-bit multiplications only (additional natural change
would be just to convert 'sbc_encoder_state->' to int16_t because it does not
need to be int32_t), which is good for performance for the platforms with fast
16-bit integer multiplication. But it is also flexible enough to be changed to
use 32x32->64 multiplications just by replacing FIXED_A and FIXED_T types
to int64_t and int32_t respectively (for better precision or experiments with
conformance testing).

> > Can anybody try to remember/explain what transformations were applied to
> > the existing fixed point implementation?
>
> it was done by several people and the only record we have is in cvs.
> (part of it is in the old btsco project's cvs)

Regarding the code optimizations. Looking at the tables, It can be seen that
'cos_table_fixed_8[0+hop]' is always equal to 'cos_table_fixed_8[8+hop]'.
The same is true for 'cos_table_fixed_8[1+hop]' and 'cos_table_fixed_8[7+hop]'
So it is possible to join 't1[0] + t1[8]', 't1[1]+ t1[7]' and the other such
pairs, effectively halving the number of counters. This looks very much like
the optimization that was applied to the current fixed point code :)

But now it would be very interesting to see if the conformance tests pass
rate is better with the new filtering function.


Best regards,
Siarhei Siamashka

[-- Attachment #2: analyze_eight_modified.diff --]
[-- Type: text/x-diff, Size: 13939 bytes --]

diff --git a/sbc/sbc.c b/sbc/sbc.c
index 7072673..656abe5 100644
--- a/sbc/sbc.c
+++ b/sbc/sbc.c
@@ -40,6 +40,7 @@
 #include <string.h>
 #include <stdlib.h>
 #include <sys/types.h>
+#include <limits.h>
 
 #include "sbc_math.h"
 #include "sbc_tables.h"
@@ -742,124 +743,107 @@ static inline void sbc_analyze_four(struct sbc_encoder_state *state,
 
 static inline void _sbc_analyze_eight(const int32_t *in, int32_t *out)
 {
-	sbc_fixed_t t[8], s[8];
-
-	t[0] = SCALE8_STAGE1( /* Q10 */
-		MULA(_sbc_proto_8[0], (in[16] - in[64]), /* Q18 = Q18 * Q0 */
-		MULA(_sbc_proto_8[1], (in[32] - in[48]),
-		MULA(_sbc_proto_8[2], in[4],
-		MULA(_sbc_proto_8[3], in[20],
-		MULA(_sbc_proto_8[4], in[36],
-		MUL( _sbc_proto_8[5], in[52])))))));
-
-	t[1] = SCALE8_STAGE1(
-		MULA(_sbc_proto_8[6], in[2],
-		MULA(_sbc_proto_8[7], in[18],
-		MULA(_sbc_proto_8[8], in[34],
-		MULA(_sbc_proto_8[9], in[50],
-		MUL(_sbc_proto_8[10], in[66]))))));
-
-	t[2] = SCALE8_STAGE1(
-		MULA(_sbc_proto_8[11], in[1],
-		MULA(_sbc_proto_8[12], in[17],
-		MULA(_sbc_proto_8[13], in[33],
-		MULA(_sbc_proto_8[14], in[49],
-		MULA(_sbc_proto_8[15], in[65],
-		MULA(_sbc_proto_8[16], in[3],
-		MULA(_sbc_proto_8[17], in[19],
-		MULA(_sbc_proto_8[18], in[35],
-		MULA(_sbc_proto_8[19], in[51],
-		MUL( _sbc_proto_8[20], in[67])))))))))));
-
-	t[3] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[21], in[5],
-		MULA( _sbc_proto_8[22], in[21],
-		MULA( _sbc_proto_8[23], in[37],
-		MULA( _sbc_proto_8[24], in[53],
-		MULA( _sbc_proto_8[25], in[69],
-		MULA(-_sbc_proto_8[15], in[15],
-		MULA(-_sbc_proto_8[14], in[31],
-		MULA(-_sbc_proto_8[13], in[47],
-		MULA(-_sbc_proto_8[12], in[63],
-		MUL( -_sbc_proto_8[11], in[79])))))))))));
-
-	t[4] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[26], in[6],
-		MULA( _sbc_proto_8[27], in[22],
-		MULA( _sbc_proto_8[28], in[38],
-		MULA( _sbc_proto_8[29], in[54],
-		MULA( _sbc_proto_8[30], in[70],
-		MULA(-_sbc_proto_8[10], in[14],
-		MULA(-_sbc_proto_8[9], in[30],
-		MULA(-_sbc_proto_8[8], in[46],
-		MULA(-_sbc_proto_8[7], in[62],
-		MUL( -_sbc_proto_8[6], in[78])))))))))));
-
-	t[5] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[31], in[7],
-		MULA( _sbc_proto_8[32], in[23],
-		MULA( _sbc_proto_8[33], in[39],
-		MULA( _sbc_proto_8[34], in[55],
-		MULA( _sbc_proto_8[35], in[71],
-		MULA(-_sbc_proto_8[20], in[13],
-		MULA(-_sbc_proto_8[19], in[29],
-		MULA(-_sbc_proto_8[18], in[45],
-		MULA(-_sbc_proto_8[17], in[61],
-		MUL( -_sbc_proto_8[16], in[77])))))))))));
-
-	t[6] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[36], (in[8] + in[72]),
-		MULA( _sbc_proto_8[37], (in[24] + in[56]),
-		MULA( _sbc_proto_8[38], in[40],
-		MULA(-_sbc_proto_8[39], in[12],
-		MULA(-_sbc_proto_8[5], in[28],
-		MULA(-_sbc_proto_8[4], in[44],
-		MULA(-_sbc_proto_8[3], in[60],
-		MUL( -_sbc_proto_8[2], in[76])))))))));
-
-	t[7] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[35], in[9],
-		MULA( _sbc_proto_8[34], in[25],
-		MULA( _sbc_proto_8[33], in[41],
-		MULA( _sbc_proto_8[32], in[57],
-		MULA( _sbc_proto_8[31], in[73],
-		MULA(-_sbc_proto_8[25], in[11],
-		MULA(-_sbc_proto_8[24], in[27],
-		MULA(-_sbc_proto_8[23], in[43],
-		MULA(-_sbc_proto_8[22], in[59],
-		MUL( -_sbc_proto_8[21], in[75])))))))))));
-
-	s[0] = MULA(  _anamatrix8[0], t[0],
-		MUL(  _anamatrix8[1], t[6]));
-	s[1] = MUL(   _anamatrix8[7], t[1]);
-	s[2] = MULA(  _anamatrix8[2], t[2],
-		MULA( _anamatrix8[3], t[3],
-		MULA( _anamatrix8[4], t[5],
-		MUL(  _anamatrix8[5], t[7]))));
-	s[3] = MUL(   _anamatrix8[6], t[4]);
-	s[4] = MULA(  _anamatrix8[3], t[2],
-		MULA(-_anamatrix8[5], t[3],
-		MULA(-_anamatrix8[2], t[5],
-		MUL( -_anamatrix8[4], t[7]))));
-	s[5] = MULA(  _anamatrix8[4], t[2],
-		MULA(-_anamatrix8[2], t[3],
-		MULA( _anamatrix8[5], t[5],
-		MUL(  _anamatrix8[3], t[7]))));
-	s[6] = MULA(  _anamatrix8[1], t[0],
-		MUL( -_anamatrix8[0], t[6]));
-	s[7] = MULA(  _anamatrix8[5], t[2],
-		MULA(-_anamatrix8[4], t[3],
-		MULA( _anamatrix8[3], t[5],
-		MUL( -_anamatrix8[2], t[7]))));
-
-	out[0] = SCALE8_STAGE2( s[0] + s[1] + s[2] + s[3]);
-	out[1] = SCALE8_STAGE2( s[1] - s[3] + s[4] + s[6]);
-	out[2] = SCALE8_STAGE2( s[1] - s[3] + s[5] - s[6]);
-	out[3] = SCALE8_STAGE2(-s[0] + s[1] + s[3] + s[7]);
-	out[4] = SCALE8_STAGE2(-s[0] + s[1] + s[3] - s[7]);
-	out[5] = SCALE8_STAGE2( s[1] - s[3] - s[5] - s[6]);
-	out[6] = SCALE8_STAGE2( s[1] - s[3] - s[4] + s[6]);
-	out[7] = SCALE8_STAGE2( s[0] + s[1] - s[2] + s[3]);
+	FIXED_A t1[16];
+	FIXED_T t2[16];
+	int i = 0, hop = 0, R = 0;
+
+	/* rounding coefficient */
+	R = 1 << (SBC_PROTO_FIXED8_SCALE-1);
+
+	/* low pass polyphase filter */
+	t1[0] =  (FIXED_A)in[0] * _sbc_proto_fixed8[0];
+	t1[1] =  (FIXED_A)in[1] * _sbc_proto_fixed8[1];
+	t1[2] =  (FIXED_A)in[2] * _sbc_proto_fixed8[2];
+	t1[3] =  (FIXED_A)in[3] * _sbc_proto_fixed8[3];
+	t1[4] =  (FIXED_A)in[4] * _sbc_proto_fixed8[4];
+	t1[5] =  (FIXED_A)in[5] * _sbc_proto_fixed8[5];
+	t1[6] =  (FIXED_A)in[6] * _sbc_proto_fixed8[6];
+	t1[7] =  (FIXED_A)in[7] * _sbc_proto_fixed8[7];
+	t1[8] =  (FIXED_A)in[8] * _sbc_proto_fixed8[8];
+	t1[9] =  (FIXED_A)in[9] * _sbc_proto_fixed8[9];
+	t1[10] = (FIXED_A)in[10] * _sbc_proto_fixed8[10];
+	t1[11] = (FIXED_A)in[11] * _sbc_proto_fixed8[11];
+	/* t1[12] = (FIXED_A)in[12] * _sbc_proto_fixed8[12]; */
+	t1[13] = (FIXED_A)in[13] * _sbc_proto_fixed8[13];
+	t1[14] = (FIXED_A)in[14] * _sbc_proto_fixed8[14];
+	t1[15] = (FIXED_A)in[15] * _sbc_proto_fixed8[15];
+
+	hop = 16;
+	for (i = 0; i < 4; i++) {
+		t1[0] +=  (FIXED_A)in[hop] * _sbc_proto_fixed8[hop];
+		t1[1] +=  (FIXED_A)in[hop + 1] * _sbc_proto_fixed8[hop + 1];
+		t1[2] +=  (FIXED_A)in[hop + 2] * _sbc_proto_fixed8[hop + 2];
+		t1[3] +=  (FIXED_A)in[hop + 3] * _sbc_proto_fixed8[hop + 3];
+		t1[4] +=  (FIXED_A)in[hop + 4] * _sbc_proto_fixed8[hop + 4];
+		t1[5] +=  (FIXED_A)in[hop + 5] * _sbc_proto_fixed8[hop + 5];
+		t1[6] +=  (FIXED_A)in[hop + 6] * _sbc_proto_fixed8[hop + 6];
+		t1[7] +=  (FIXED_A)in[hop + 7] * _sbc_proto_fixed8[hop + 7];
+		t1[8] +=  (FIXED_A)in[hop + 8] * _sbc_proto_fixed8[hop + 8];
+		t1[9] +=  (FIXED_A)in[hop + 9] * _sbc_proto_fixed8[hop + 9];
+		t1[10] += (FIXED_A)in[hop + 10] * _sbc_proto_fixed8[hop + 10];
+		t1[11] += (FIXED_A)in[hop + 11] * _sbc_proto_fixed8[hop + 11];
+		/* t1[12] += (FIXED_A)in[hop + 12] * _sbc_proto_fixed8[hop + 12]; */
+		t1[13] += (FIXED_A)in[hop + 13] * _sbc_proto_fixed8[hop + 13];
+		t1[14] += (FIXED_A)in[hop + 14] * _sbc_proto_fixed8[hop + 14];
+		t1[15] += (FIXED_A)in[hop + 15] * _sbc_proto_fixed8[hop + 15];
+		
+		hop += 16;
+	}
+
+	/* scaling */
+	t2[0] = (t1[0] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[1] = (t1[1] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[2] = (t1[2] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[3] = (t1[3] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[4] = (t1[4] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[5] = (t1[5] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[6] = (t1[6] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[7] = (t1[7] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[8] = (t1[8] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[9] = (t1[9] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[10] = (t1[10] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[11] = (t1[11] + R) >> SBC_PROTO_FIXED8_SCALE;
+	/* t2[12] = (t1[12] + R) >> SBC_PROTO_FIXED8_SCALE; */
+	t2[13] = (t1[13] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[14] = (t1[14] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[15] = (t1[15] + R) >> SBC_PROTO_FIXED8_SCALE;
+
+	R = 1 << (SBC_COS_TABLE_FIXED8_SCALE-1);
+
+	/* do the cos transform */
+	hop = 0;
+	for (i = 0; i < 8; i++) {
+		t1[i]  = (FIXED_A)t2[0] * cos_table_fixed_8[0 + hop];
+		t1[i] += (FIXED_A)t2[1] * cos_table_fixed_8[1 + hop];
+		t1[i] += (FIXED_A)t2[2] * cos_table_fixed_8[2 + hop];
+		t1[i] += (FIXED_A)t2[3] * cos_table_fixed_8[3 + hop];
+		/* cos_table_fixed_8[4 + hop] = 1.0 */
+		t1[i] += (FIXED_A)t2[4] << SBC_COS_TABLE_FIXED8_SCALE;
+		t1[i] += (FIXED_A)t2[5] * cos_table_fixed_8[5 + hop];
+		t1[i] += (FIXED_A)t2[6] * cos_table_fixed_8[6 + hop];
+		t1[i] += (FIXED_A)t2[7] * cos_table_fixed_8[7 + hop];
+		t1[i] += (FIXED_A)t2[8] * cos_table_fixed_8[8 + hop];
+		t1[i] += (FIXED_A)t2[9] * cos_table_fixed_8[9 + hop];
+		t1[i] += (FIXED_A)t2[10] * cos_table_fixed_8[10 + hop];
+		t1[i] += (FIXED_A)t2[11] * cos_table_fixed_8[11 + hop];
+		/* cos_table_fixed_8[12 + hop] = 0.0 */
+		/* t1[i] += (FIXED_A)t2[12] * cos_table_fixed_8[12 + hop]; */
+		t1[i] += (FIXED_A)t2[13] * cos_table_fixed_8[13 + hop];
+		t1[i] += (FIXED_A)t2[14] * cos_table_fixed_8[14 + hop];
+		t1[i] += (FIXED_A)t2[15] * cos_table_fixed_8[15 + hop];
+
+		hop += 16;
+	}
+
+	/* scaling */
+	out[0] = (t1[0] + R) >> SBC_COS_TABLE_FIXED8_SCALE;
+	out[1] = (t1[1] + R) >> SBC_COS_TABLE_FIXED8_SCALE;
+	out[2] = (t1[2] + R) >> SBC_COS_TABLE_FIXED8_SCALE;
+	out[3] = (t1[3] + R) >> SBC_COS_TABLE_FIXED8_SCALE;
+	out[4] = (t1[4] + R) >> SBC_COS_TABLE_FIXED8_SCALE;
+	out[5] = (t1[5] + R) >> SBC_COS_TABLE_FIXED8_SCALE;
+	out[6] = (t1[6] + R) >> SBC_COS_TABLE_FIXED8_SCALE;
+	out[7] = (t1[7] + R) >> SBC_COS_TABLE_FIXED8_SCALE;
 }
 
 static inline void sbc_analyze_eight(struct sbc_encoder_state *state,
diff --git a/sbc/sbc_tables.h b/sbc/sbc_tables.h
index f5daaa7..6e63f48 100644
--- a/sbc/sbc_tables.h
+++ b/sbc/sbc_tables.h
@@ -166,3 +166,81 @@ static const int32_t synmatrix8[16][8] = {
 	{ SN8(0xf9592678), SN8(0x018f8b84), SN8(0x07d8a5f0), SN8(0x0471ced0),
 	  SN8(0xfb8e3130), SN8(0xf8275a10), SN8(0xfe70747c), SN8(0x06a6d988) }
 };
+
+/* data type for fixed point accumulator */
+#define FIXED_A int32_t
+/* data type for fixed point constants */
+#define FIXED_T int16_t
+
+/* A2DP specification: Section 12.8 Tables */
+#define SBC_PROTO_FIXED8_SCALE (sizeof(FIXED_T)*CHAR_BIT - 1)
+#define F(x) (FIXED_T)((x)*((FIXED_A)1<<SBC_PROTO_FIXED8_SCALE)+0.5)
+static const FIXED_T _sbc_proto_fixed8[80] = {
+	 F(0.00000000E+00), F(1.56575398E-04), F(3.43256425E-04), F(5.54620202E-04),
+	 F(8.23919506E-04), F(1.13992507E-03), F(1.47640169E-03), F(1.78371725E-03),
+	 F(2.01182542E-03), F(2.10371989E-03), F(1.99454554E-03), F(1.61656283E-03),
+	 F(9.02154502E-04),-F(1.78805361E-04),-F(1.64973098E-03),-F(3.49717454E-03),
+	 F(5.65949473E-03), F(8.02941163E-03), F(1.04584443E-02), F(1.27472335E-02),
+	 F(1.46525263E-02), F(1.59045603E-02), F(1.62208471E-02), F(1.53184106E-02),
+	 F(1.29371806E-02), F(8.85757540E-03), F(2.92408442E-03),-F(4.91578024E-03),
+	-F(1.46404076E-02),-F(2.61098752E-02),-F(3.90751381E-02),-F(5.31873032E-02),
+	 F(6.79989431E-02), F(8.29847578E-02), F(9.75753918E-02), F(1.11196689E-01),
+	 F(1.23264548E-01), F(1.33264415E-01), F(1.40753505E-01), F(1.45389847E-01),
+	 F(1.46955068E-01), F(1.45389847E-01), F(1.40753505E-01), F(1.33264415E-01),
+	 F(1.23264548E-01), F(1.11196689E-01), F(9.75753918E-02), F(8.29847578E-02),
+	-F(6.79989431E-02),-F(5.31873032E-02),-F(3.90751381E-02),-F(2.61098752E-02),
+	-F(1.46404076E-02),-F(4.91578024E-03), F(2.92408442E-03), F(8.85757540E-03),
+	 F(1.29371806E-02), F(1.53184106E-02), F(1.62208471E-02), F(1.59045603E-02),
+	 F(1.46525263E-02), F(1.27472335E-02), F(1.04584443E-02), F(8.02941163E-03),
+	-F(5.65949473E-03),-F(3.49717454E-03),-F(1.64973098E-03),-F(1.78805361E-04),
+	 F(9.02154502E-04), F(1.61656283E-03), F(1.99454554E-03), F(2.10371989E-03),
+	 F(2.01182542E-03), F(1.78371725E-03), F(1.47640169E-03), F(1.13992507E-03),
+	 F(8.23919506E-04), F(5.54620202E-04), F(3.43256425E-04), F(1.56575398E-04),
+};
+#undef F
+
+/*
+ * To produce this cosine matrix in Octave:
+ *
+ * b = zeros(8, 16);
+ * for i = 0:7 for j = 0:15 b(i+1, j+1) = cos( (i + 0.5) * (j - 4) * (pi/8) ) endfor endfor;
+ * printf("%.10f, ", b');
+ *
+ */
+#define SBC_COS_TABLE_FIXED8_SCALE (sizeof(FIXED_T)*CHAR_BIT - 1)
+#define F(x) (FIXED_T)((x)*((FIXED_A)1<<SBC_COS_TABLE_FIXED8_SCALE)+0.5)
+static const FIXED_T cos_table_fixed_8[128] = {
+	 F(0.7071067812), F(0.8314696123), F(0.9238795325), F(0.9807852804),
+	 F(1.0000000000), F(0.9807852804), F(0.9238795325), F(0.8314696123),
+	 F(0.7071067812), F(0.5555702330), F(0.3826834324), F(0.1950903220),
+	 F(0.0000000000),-F(0.1950903220),-F(0.3826834324),-F(0.5555702330),
+	-F(0.7071067812),-F(0.1950903220), F(0.3826834324), F(0.8314696123),
+	 F(1.0000000000), F(0.8314696123), F(0.3826834324),-F(0.1950903220),
+	-F(0.7071067812),-F(0.9807852804),-F(0.9238795325),-F(0.5555702330),
+	-F(0.0000000000), F(0.5555702330), F(0.9238795325), F(0.9807852804),
+	-F(0.7071067812),-F(0.9807852804),-F(0.3826834324), F(0.5555702330),
+	 F(1.0000000000), F(0.5555702330),-F(0.3826834324),-F(0.9807852804),
+	-F(0.7071067812), F(0.1950903220), F(0.9238795325), F(0.8314696123),
+	 F(0.0000000000),-F(0.8314696123),-F(0.9238795325),-F(0.1950903220),
+	 F(0.7071067812),-F(0.5555702330),-F(0.9238795325), F(0.1950903220),
+	 F(1.0000000000), F(0.1950903220),-F(0.9238795325),-F(0.5555702330),
+	 F(0.7071067812), F(0.8314696123),-F(0.3826834324),-F(0.9807852804),
+	-F(0.0000000000), F(0.9807852804), F(0.3826834324),-F(0.8314696123),
+	 F(0.7071067812), F(0.5555702330),-F(0.9238795325),-F(0.1950903220),
+	 F(1.0000000000),-F(0.1950903220),-F(0.9238795325), F(0.5555702330),
+	 F(0.7071067812),-F(0.8314696123),-F(0.3826834324), F(0.9807852804),
+	 F(0.0000000000),-F(0.9807852804), F(0.3826834324), F(0.8314696123),
+	-F(0.7071067812), F(0.9807852804),-F(0.3826834324),-F(0.5555702330),
+	 F(1.0000000000),-F(0.5555702330),-F(0.3826834324), F(0.9807852804),
+	-F(0.7071067812),-F(0.1950903220), F(0.9238795325),-F(0.8314696123),
+	-F(0.0000000000), F(0.8314696123),-F(0.9238795325), F(0.1950903220),
+	-F(0.7071067812), F(0.1950903220), F(0.3826834324),-F(0.8314696123),
+	 F(1.0000000000),-F(0.8314696123), F(0.3826834324), F(0.1950903220),
+	-F(0.7071067812), F(0.9807852804),-F(0.9238795325), F(0.5555702330),
+	-F(0.0000000000),-F(0.5555702330), F(0.9238795325),-F(0.9807852804),
+	 F(0.7071067812),-F(0.8314696123), F(0.9238795325),-F(0.9807852804),
+	 F(1.0000000000),-F(0.9807852804), F(0.9238795325),-F(0.8314696123),
+	 F(0.7071067812),-F(0.5555702330), F(0.3826834324),-F(0.1950903220),
+	-F(0.0000000000), F(0.1950903220),-F(0.3826834324), F(0.5555702330),
+};
+#undef F

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-16 22:37           ` Siarhei Siamashka
@ 2008-12-17  8:16             ` Jaska Uimonen
  2008-12-19 22:12             ` Siarhei Siamashka
  1 sibling, 0 replies; 35+ messages in thread
From: Jaska Uimonen @ 2008-12-17  8:16 UTC (permalink / raw)
  To: Siarhei Siamashka; +Cc: ext Brad Midgley, linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 2905 bytes --]

Hello All,

Sorry about the silence, I was away from the office for 
couple of weeks.

Anyway here is the original patch modified to suit the 
bluez coding conventions. There might still be something 
(I'm quite bad with these issues...)

I also added Siarhei's rounding stuff to the filter 
tables. Good point Siarhei!

Concerning Brad's comments I think the calculation 
of tables should now be quite clear. There are 
comments on the tables how to produce them in Octave and 
the filtering function is very basic fixed point Q15 
calculation. 

I agree that including the original float values to the 
code would probably be a good idea. Then we could just 
use some macro at compile time to modify them to suitable 
Q format and original values would always be there to 
be viewed and compared to.

I would still be careful about the optimization of the 
filtering, so that we don't obfuscate the code too much. 
The filter and cosine table are quite small so we should 
consider how much we save in computation. Anyway if the savings 
are significant let's just add them there.

br,
Jaska Uimonen

On Wed, 2008-12-17 at 00:37 +0200, Siarhei Siamashka wrote:
> On Monday 15 December 2008 17:16:58 ext Brad Midgley wrote:
> > I like your idea of using a macro with the original floating point
> > tables, as long as we know it is done at compile time, not runtime :)
> 
> What about something like this modification to Jaska's patch? It contains
> floating point constants wrapped into a macro.
> 
> This version is using 16-bit multiplications only (additional natural change
> would be just to convert 'sbc_encoder_state->' to int16_t because it does not
> need to be int32_t), which is good for performance for the platforms with fast
> 16-bit integer multiplication. But it is also flexible enough to be changed to
> use 32x32->64 multiplications just by replacing FIXED_A and FIXED_T types
> to int64_t and int32_t respectively (for better precision or experiments with
> conformance testing).
> 
> > > Can anybody try to remember/explain what transformations were applied to
> > > the existing fixed point implementation?
> >
> > it was done by several people and the only record we have is in cvs.
> > (part of it is in the old btsco project's cvs)
> 
> Regarding the code optimizations. Looking at the tables, It can be seen that
> 'cos_table_fixed_8[0+hop]' is always equal to 'cos_table_fixed_8[8+hop]'.
> The same is true for 'cos_table_fixed_8[1+hop]' and 'cos_table_fixed_8[7+hop]'
> So it is possible to join 't1[0] + t1[8]', 't1[1]+ t1[7]' and the other such
> pairs, effectively halving the number of counters. This looks very much like
> the optimization that was applied to the current fixed point code :)
> 
> But now it would be very interesting to see if the conformance tests pass
> rate is better with the new filtering function.
> 
> 
> Best regards,
> Siarhei Siamashka

[-- Attachment #2: 0001-New-function-and-tables-for-8-band-fixed-point-analy.patch --]
[-- Type: application/mbox, Size: 7221 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-16 22:37           ` Siarhei Siamashka
  2008-12-17  8:16             ` Jaska Uimonen
@ 2008-12-19 22:12             ` Siarhei Siamashka
  2008-12-22 23:30               ` Siarhei Siamashka
  1 sibling, 1 reply; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-19 22:12 UTC (permalink / raw)
  To: ext Brad Midgley; +Cc: Jaska Uimonen, linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 3524 bytes --]

On Wednesday 17 December 2008 00:37:48 ext Siarhei Siamashka wrote:
> On Monday 15 December 2008 17:16:58 ext Brad Midgley wrote:
> > I like your idea of using a macro with the original floating point
> > tables, as long as we know it is done at compile time, not runtime :)
>
> What about something like this modification to Jaska's patch? It contains
> floating point constants wrapped into a macro.
>
> This version is using 16-bit multiplications only (additional natural
> change would be just to convert 'sbc_encoder_state->' to int16_t because it
> does not need to be int32_t), which is good for performance for the
> platforms with fast 16-bit integer multiplication. But it is also flexible
> enough to be changed to use 32x32->64 multiplications just by replacing
> FIXED_A and FIXED_T types to int64_t and int32_t respectively (for better
> precision or experiments with conformance testing).
>
> > > Can anybody try to remember/explain what transformations were applied
> > > to the existing fixed point implementation?
> >
> > it was done by several people and the only record we have is in cvs.
> > (part of it is in the old btsco project's cvs)
>
> Regarding the code optimizations. Looking at the tables, It can be seen
> that 'cos_table_fixed_8[0+hop]' is always equal to
> 'cos_table_fixed_8[8+hop]'. The same is true for 'cos_table_fixed_8[1+hop]'
> and 'cos_table_fixed_8[7+hop]' So it is possible to join 't1[0] + t1[8]',
> 't1[1]+ t1[7]' and the other such pairs, effectively halving the number of
> counters. This looks very much like the optimization that was applied to
> the current fixed point code :)
>
> But now it would be very interesting to see if the conformance tests pass
> rate is better with the new filtering function.

Here is one more attempt at improving filtering function. Now I tried to get
the best possible audio quality (using 32-bit fixed point).

16-bit version of filtering function can be enabled by just commenting
out '#define SBC_HIGH_PRECISION' line

The improvements include fixing a problem in scalefactors processing code.
Here we don't want to use the absolute value just because it is possible to
encode more negative values than positive values with the same number
of bits:
-	while (scalefactor[ch][sb] < fabs(frame->sb_sample_f[blk][ch][sb])) {
+	while ((scalefactor[ch][sb] << SCALE_OUT_BITS) <= 
neginv(frame->sb_sample_f[blk][ch][sb])) {

Another quality improvement is achieved by keeping more bits in the output of
filtering function, thus avoiding unnecessary precision loss on quantizing
stage.

Both of these changes also naturally improve audio quality for the 16-bit
variant.

We had a talk with Jaska Uimonen here, and now I'm kind of delegated to finish
the work on this filtering function for SBC encoder (including the final
addition of ARM assembly optimizations).  He provided me with his last variant
of code, which contains some more optimizations to reduce the number of
operations and also loops unrolling. I will add his changes to the patch on
next iteration.

Now the question is how to best integrate a fixed filtering function to git
repository? If I just continue adding changes to the patch in order to make it
a faster, it will be also not so obvious to see how we got to these code
transformations just from the commit log.

I intentionally keep posting work-in-progress variants just to keep track of
the history at least in this mailing list archive :)

As always, feedback is very much welcome.

Best regards,
Siarhei Siamashka

[-- Attachment #2: analyze_eight_modified_32bit.diff --]
[-- Type: text/x-diff, Size: 17214 bytes --]

diff --git a/sbc/sbc.c b/sbc/sbc.c
index 5411893..873c370 100644
--- a/sbc/sbc.c
+++ b/sbc/sbc.c
@@ -40,6 +40,7 @@
 #include <string.h>
 #include <stdlib.h>
 #include <sys/types.h>
+#include <limits.h>
 
 #include "sbc_math.h"
 #include "sbc_tables.h"
@@ -742,124 +743,108 @@ static inline void sbc_analyze_four(struct sbc_encoder_state *state,
 
 static inline void _sbc_analyze_eight(const int32_t *in, int32_t *out)
 {
-	sbc_fixed_t t[8], s[8];
-
-	t[0] = SCALE8_STAGE1( /* Q10 */
-		MULA(_sbc_proto_8[0], (in[16] - in[64]), /* Q18 = Q18 * Q0 */
-		MULA(_sbc_proto_8[1], (in[32] - in[48]),
-		MULA(_sbc_proto_8[2], in[4],
-		MULA(_sbc_proto_8[3], in[20],
-		MULA(_sbc_proto_8[4], in[36],
-		MUL( _sbc_proto_8[5], in[52])))))));
-
-	t[1] = SCALE8_STAGE1(
-		MULA(_sbc_proto_8[6], in[2],
-		MULA(_sbc_proto_8[7], in[18],
-		MULA(_sbc_proto_8[8], in[34],
-		MULA(_sbc_proto_8[9], in[50],
-		MUL(_sbc_proto_8[10], in[66]))))));
-
-	t[2] = SCALE8_STAGE1(
-		MULA(_sbc_proto_8[11], in[1],
-		MULA(_sbc_proto_8[12], in[17],
-		MULA(_sbc_proto_8[13], in[33],
-		MULA(_sbc_proto_8[14], in[49],
-		MULA(_sbc_proto_8[15], in[65],
-		MULA(_sbc_proto_8[16], in[3],
-		MULA(_sbc_proto_8[17], in[19],
-		MULA(_sbc_proto_8[18], in[35],
-		MULA(_sbc_proto_8[19], in[51],
-		MUL( _sbc_proto_8[20], in[67])))))))))));
-
-	t[3] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[21], in[5],
-		MULA( _sbc_proto_8[22], in[21],
-		MULA( _sbc_proto_8[23], in[37],
-		MULA( _sbc_proto_8[24], in[53],
-		MULA( _sbc_proto_8[25], in[69],
-		MULA(-_sbc_proto_8[15], in[15],
-		MULA(-_sbc_proto_8[14], in[31],
-		MULA(-_sbc_proto_8[13], in[47],
-		MULA(-_sbc_proto_8[12], in[63],
-		MUL( -_sbc_proto_8[11], in[79])))))))))));
-
-	t[4] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[26], in[6],
-		MULA( _sbc_proto_8[27], in[22],
-		MULA( _sbc_proto_8[28], in[38],
-		MULA( _sbc_proto_8[29], in[54],
-		MULA( _sbc_proto_8[30], in[70],
-		MULA(-_sbc_proto_8[10], in[14],
-		MULA(-_sbc_proto_8[9], in[30],
-		MULA(-_sbc_proto_8[8], in[46],
-		MULA(-_sbc_proto_8[7], in[62],
-		MUL( -_sbc_proto_8[6], in[78])))))))))));
-
-	t[5] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[31], in[7],
-		MULA( _sbc_proto_8[32], in[23],
-		MULA( _sbc_proto_8[33], in[39],
-		MULA( _sbc_proto_8[34], in[55],
-		MULA( _sbc_proto_8[35], in[71],
-		MULA(-_sbc_proto_8[20], in[13],
-		MULA(-_sbc_proto_8[19], in[29],
-		MULA(-_sbc_proto_8[18], in[45],
-		MULA(-_sbc_proto_8[17], in[61],
-		MUL( -_sbc_proto_8[16], in[77])))))))))));
-
-	t[6] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[36], (in[8] + in[72]),
-		MULA( _sbc_proto_8[37], (in[24] + in[56]),
-		MULA( _sbc_proto_8[38], in[40],
-		MULA(-_sbc_proto_8[39], in[12],
-		MULA(-_sbc_proto_8[5], in[28],
-		MULA(-_sbc_proto_8[4], in[44],
-		MULA(-_sbc_proto_8[3], in[60],
-		MUL( -_sbc_proto_8[2], in[76])))))))));
-
-	t[7] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[35], in[9],
-		MULA( _sbc_proto_8[34], in[25],
-		MULA( _sbc_proto_8[33], in[41],
-		MULA( _sbc_proto_8[32], in[57],
-		MULA( _sbc_proto_8[31], in[73],
-		MULA(-_sbc_proto_8[25], in[11],
-		MULA(-_sbc_proto_8[24], in[27],
-		MULA(-_sbc_proto_8[23], in[43],
-		MULA(-_sbc_proto_8[22], in[59],
-		MUL( -_sbc_proto_8[21], in[75])))))))))));
-
-	s[0] = MULA(  _anamatrix8[0], t[0],
-		MUL(  _anamatrix8[1], t[6]));
-	s[1] = MUL(   _anamatrix8[7], t[1]);
-	s[2] = MULA(  _anamatrix8[2], t[2],
-		MULA( _anamatrix8[3], t[3],
-		MULA( _anamatrix8[4], t[5],
-		MUL(  _anamatrix8[5], t[7]))));
-	s[3] = MUL(   _anamatrix8[6], t[4]);
-	s[4] = MULA(  _anamatrix8[3], t[2],
-		MULA(-_anamatrix8[5], t[3],
-		MULA(-_anamatrix8[2], t[5],
-		MUL( -_anamatrix8[4], t[7]))));
-	s[5] = MULA(  _anamatrix8[4], t[2],
-		MULA(-_anamatrix8[2], t[3],
-		MULA( _anamatrix8[5], t[5],
-		MUL(  _anamatrix8[3], t[7]))));
-	s[6] = MULA(  _anamatrix8[1], t[0],
-		MUL( -_anamatrix8[0], t[6]));
-	s[7] = MULA(  _anamatrix8[5], t[2],
-		MULA(-_anamatrix8[4], t[3],
-		MULA( _anamatrix8[3], t[5],
-		MUL( -_anamatrix8[2], t[7]))));
-
-	out[0] = SCALE8_STAGE2( s[0] + s[1] + s[2] + s[3]);
-	out[1] = SCALE8_STAGE2( s[1] - s[3] + s[4] + s[6]);
-	out[2] = SCALE8_STAGE2( s[1] - s[3] + s[5] - s[6]);
-	out[3] = SCALE8_STAGE2(-s[0] + s[1] + s[3] + s[7]);
-	out[4] = SCALE8_STAGE2(-s[0] + s[1] + s[3] - s[7]);
-	out[5] = SCALE8_STAGE2( s[1] - s[3] - s[5] - s[6]);
-	out[6] = SCALE8_STAGE2( s[1] - s[3] - s[4] + s[6]);
-	out[7] = SCALE8_STAGE2( s[0] + s[1] - s[2] + s[3]);
+	FIXED_A t1[16];
+	FIXED_T t2[16];
+	FIXED_A R;
+	int i, hop;
+
+	/* rounding coefficient */
+	R = (FIXED_A)1 << (SBC_PROTO_FIXED8_SCALE-1);
+
+	/* low pass polyphase filter */
+	t1[0] =  (FIXED_A)in[0] * _sbc_proto_fixed8[0];
+	t1[1] =  (FIXED_A)in[1] * _sbc_proto_fixed8[1];
+	t1[2] =  (FIXED_A)in[2] * _sbc_proto_fixed8[2];
+	t1[3] =  (FIXED_A)in[3] * _sbc_proto_fixed8[3];
+	t1[4] =  (FIXED_A)in[4] * _sbc_proto_fixed8[4];
+	t1[5] =  (FIXED_A)in[5] * _sbc_proto_fixed8[5];
+	t1[6] =  (FIXED_A)in[6] * _sbc_proto_fixed8[6];
+	t1[7] =  (FIXED_A)in[7] * _sbc_proto_fixed8[7];
+	t1[8] =  (FIXED_A)in[8] * _sbc_proto_fixed8[8];
+	t1[9] =  (FIXED_A)in[9] * _sbc_proto_fixed8[9];
+	t1[10] = (FIXED_A)in[10] * _sbc_proto_fixed8[10];
+	t1[11] = (FIXED_A)in[11] * _sbc_proto_fixed8[11];
+	/* t1[12] = (FIXED_A)in[12] * _sbc_proto_fixed8[12]; */
+	t1[13] = (FIXED_A)in[13] * _sbc_proto_fixed8[13];
+	t1[14] = (FIXED_A)in[14] * _sbc_proto_fixed8[14];
+	t1[15] = (FIXED_A)in[15] * _sbc_proto_fixed8[15];
+
+	hop = 16;
+	for (i = 0; i < 4; i++) {
+		t1[0] +=  (FIXED_A)in[hop] * _sbc_proto_fixed8[hop];
+		t1[1] +=  (FIXED_A)in[hop + 1] * _sbc_proto_fixed8[hop + 1];
+		t1[2] +=  (FIXED_A)in[hop + 2] * _sbc_proto_fixed8[hop + 2];
+		t1[3] +=  (FIXED_A)in[hop + 3] * _sbc_proto_fixed8[hop + 3];
+		t1[4] +=  (FIXED_A)in[hop + 4] * _sbc_proto_fixed8[hop + 4];
+		t1[5] +=  (FIXED_A)in[hop + 5] * _sbc_proto_fixed8[hop + 5];
+		t1[6] +=  (FIXED_A)in[hop + 6] * _sbc_proto_fixed8[hop + 6];
+		t1[7] +=  (FIXED_A)in[hop + 7] * _sbc_proto_fixed8[hop + 7];
+		t1[8] +=  (FIXED_A)in[hop + 8] * _sbc_proto_fixed8[hop + 8];
+		t1[9] +=  (FIXED_A)in[hop + 9] * _sbc_proto_fixed8[hop + 9];
+		t1[10] += (FIXED_A)in[hop + 10] * _sbc_proto_fixed8[hop + 10];
+		t1[11] += (FIXED_A)in[hop + 11] * _sbc_proto_fixed8[hop + 11];
+		/* t1[12] += (FIXED_A)in[hop + 12] * _sbc_proto_fixed8[hop + 12]; */
+		t1[13] += (FIXED_A)in[hop + 13] * _sbc_proto_fixed8[hop + 13];
+		t1[14] += (FIXED_A)in[hop + 14] * _sbc_proto_fixed8[hop + 14];
+		t1[15] += (FIXED_A)in[hop + 15] * _sbc_proto_fixed8[hop + 15];
+		
+		hop += 16;
+	}
+
+	/* scaling */
+	t2[0] = (t1[0] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[1] = (t1[1] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[2] = (t1[2] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[3] = (t1[3] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[4] = (t1[4] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[5] = (t1[5] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[6] = (t1[6] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[7] = (t1[7] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[8] = (t1[8] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[9] = (t1[9] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[10] = (t1[10] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[11] = (t1[11] + R) >> SBC_PROTO_FIXED8_SCALE;
+	/* t2[12] = (t1[12] + R) >> SBC_PROTO_FIXED8_SCALE; */
+	t2[13] = (t1[13] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[14] = (t1[14] + R) >> SBC_PROTO_FIXED8_SCALE;
+	t2[15] = (t1[15] + R) >> SBC_PROTO_FIXED8_SCALE;
+
+	R = (FIXED_A)1 << (SBC_COS_TABLE_FIXED8_SCALE-1-SCALE_OUT_BITS);
+
+	/* do the cos transform */
+	hop = 0;
+	for (i = 0; i < 8; i++) {
+		t1[i]  = (FIXED_A)t2[0] * cos_table_fixed_8[0 + hop];
+		t1[i] += (FIXED_A)t2[1] * cos_table_fixed_8[1 + hop];
+		t1[i] += (FIXED_A)t2[2] * cos_table_fixed_8[2 + hop];
+		t1[i] += (FIXED_A)t2[3] * cos_table_fixed_8[3 + hop];
+		/* cos_table_fixed_8[4 + hop] = 1.0 */
+		t1[i] += (FIXED_A)t2[4] << (sizeof(FIXED_T)*CHAR_BIT-1);
+		t1[i] += (FIXED_A)t2[5] * cos_table_fixed_8[5 + hop];
+		t1[i] += (FIXED_A)t2[6] * cos_table_fixed_8[6 + hop];
+		t1[i] += (FIXED_A)t2[7] * cos_table_fixed_8[7 + hop];
+		t1[i] += (FIXED_A)t2[8] * cos_table_fixed_8[8 + hop];
+		t1[i] += (FIXED_A)t2[9] * cos_table_fixed_8[9 + hop];
+		t1[i] += (FIXED_A)t2[10] * cos_table_fixed_8[10 + hop];
+		t1[i] += (FIXED_A)t2[11] * cos_table_fixed_8[11 + hop];
+		/* cos_table_fixed_8[12 + hop] = 0.0 */
+		/* t1[i] += (FIXED_A)t2[12] * cos_table_fixed_8[12 + hop]; */
+		t1[i] += (FIXED_A)t2[13] * cos_table_fixed_8[13 + hop];
+		t1[i] += (FIXED_A)t2[14] * cos_table_fixed_8[14 + hop];
+		t1[i] += (FIXED_A)t2[15] * cos_table_fixed_8[15 + hop];
+
+		hop += 16;
+	}
+
+	/* scaling */
+	out[0] = (t1[0] + R) >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[1] = (t1[1] + R) >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[2] = (t1[2] + R) >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[3] = (t1[3] + R) >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[4] = (t1[4] + R) >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[5] = (t1[5] + R) >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[6] = (t1[6] + R) >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[7] = (t1[7] + R) >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
 }
 
 static inline void sbc_analyze_eight(struct sbc_encoder_state *state,
@@ -1006,7 +991,7 @@ static int sbc_pack_frame(uint8_t *data, struct sbc_frame *frame, size_t len)
 			frame->scale_factor[ch][sb] = 0;
 			scalefactor[ch][sb] = 2;
 			for (blk = 0; blk < frame->blocks; blk++) {
-				while (scalefactor[ch][sb] < fabs(frame->sb_sample_f[blk][ch][sb])) {
+				while ((scalefactor[ch][sb] << SCALE_OUT_BITS) <= neginv(frame->sb_sample_f[blk][ch][sb])) {
 					frame->scale_factor[ch][sb]++;
 					scalefactor[ch][sb] *= 2;
 				}
@@ -1040,11 +1025,11 @@ static int sbc_pack_frame(uint8_t *data, struct sbc_frame *frame, size_t len)
 						frame->sb_sample_f[blk][1][sb]) >> 1;
 
 				/* calculate scale_factor_j and scalefactor_j for joint case */
-				while (scalefactor_j[0] < fabs(sb_sample_j[blk][0])) {
+				while ((scalefactor_j[0] << SCALE_OUT_BITS) <= neginv(sb_sample_j[blk][0])) {
 					scale_factor_j[0]++;
 					scalefactor_j[0] *= 2;
 				}
-				while (scalefactor_j[1] < fabs(sb_sample_j[blk][1])) {
+				while ((scalefactor_j[1] << SCALE_OUT_BITS) <= neginv(sb_sample_j[blk][1])) {
 					scale_factor_j[1]++;
 					scalefactor_j[1] *= 2;
 				}
@@ -1100,11 +1085,11 @@ static int sbc_pack_frame(uint8_t *data, struct sbc_frame *frame, size_t len)
 		for (ch = 0; ch < frame->channels; ch++) {
 			for (sb = 0; sb < frame->subbands; sb++) {
 				if (levels[ch][sb] > 0) {
-					audio_sample =
-						(uint16_t) (((((int64_t)frame->sb_sample_f[blk][ch][sb]*levels[ch][sb]) >>
-									(frame->scale_factor[ch][sb] + 1)) +
-								levels[ch][sb]) >> 1);
-					PUT_BITS(audio_sample & levels[ch][sb], bits[ch][sb]);
+					int32_t sample = frame->sb_sample_f[blk][ch][sb];
+					int32_t s_shift = (frame->scale_factor[ch][sb] + 1 + SCALE_OUT_BITS);
+					int32_t ls = levels[ch][sb];
+					audio_sample = ((((int64_t)1 << s_shift) + sample) * ls) >> (s_shift + 1);
+					PUT_BITS(audio_sample, bits[ch][sb]);
 				}
 			}
 		}
diff --git a/sbc/sbc_math.h b/sbc/sbc_math.h
index b3d87a6..b53e3d1 100644
--- a/sbc/sbc_math.h
+++ b/sbc/sbc_math.h
@@ -23,12 +23,14 @@
  *
  */
 
-#define fabs(x) ((x) < 0 ? -(x) : (x))
+#define neginv(x) ((x) < 0 ? ~(x) : (x))
 /* C does not provide an explicit arithmetic shift right but this will
    always be correct and every compiler *should* generate optimal code */
 #define ASR(val, bits) ((-2 >> 1 == -1) ? \
 		 ((int32_t)(val)) >> (bits) : ((int32_t) (val)) / (1 << (bits)))
 
+#define SCALE_OUT_BITS 14
+
 #define SCALE_PROTO4_TBL	15
 #define SCALE_ANA4_TBL		17
 #define SCALE_PROTO8_TBL	16
@@ -38,7 +40,7 @@
 #define SCALE_NPROTO4_TBL	11
 #define SCALE_NPROTO8_TBL	11
 #define SCALE4_STAGE1_BITS	15
-#define SCALE4_STAGE2_BITS	16
+#define SCALE4_STAGE2_BITS	(16-SCALE_OUT_BITS)
 #define SCALE4_STAGED1_BITS	15
 #define SCALE4_STAGED2_BITS	16
 #define SCALE8_STAGE1_BITS	15
diff --git a/sbc/sbc_tables.h b/sbc/sbc_tables.h
index f5daaa7..eeea7b7 100644
--- a/sbc/sbc_tables.h
+++ b/sbc/sbc_tables.h
@@ -166,3 +166,88 @@ static const int32_t synmatrix8[16][8] = {
 	{ SN8(0xf9592678), SN8(0x018f8b84), SN8(0x07d8a5f0), SN8(0x0471ced0),
 	  SN8(0xfb8e3130), SN8(0xf8275a10), SN8(0xfe70747c), SN8(0x06a6d988) }
 };
+
+#define SBC_HIGH_PRECISION
+
+#ifdef SBC_HIGH_PRECISION
+# define FIXED_A int64_t /* data type for fixed point accumulator */
+# define FIXED_T int32_t /* data type for fixed point constants */
+# define SBC_FIXED8_EXTRA_BITS 15
+#else
+# define FIXED_A int32_t /* data type for fixed point accumulator */
+# define FIXED_T int16_t /* data type for fixed point constants */
+# define SBC_FIXED8_EXTRA_BITS 0
+#endif
+
+/* A2DP specification: Section 12.8 Tables */
+#define SBC_PROTO_FIXED8_SCALE (sizeof(FIXED_T)*CHAR_BIT-1-SBC_FIXED8_EXTRA_BITS)
+#define F(x) (FIXED_T)(FIXED_A)((x)*((FIXED_A)1<<(sizeof(FIXED_T)*CHAR_BIT-1))+0.5)
+static const FIXED_T _sbc_proto_fixed8[80] = {
+	 F(0.00000000E+00), F(1.56575398E-04), F(3.43256425E-04), F(5.54620202E-04),
+	 F(8.23919506E-04), F(1.13992507E-03), F(1.47640169E-03), F(1.78371725E-03),
+	 F(2.01182542E-03), F(2.10371989E-03), F(1.99454554E-03), F(1.61656283E-03),
+	 F(9.02154502E-04),-F(1.78805361E-04),-F(1.64973098E-03),-F(3.49717454E-03),
+	 F(5.65949473E-03), F(8.02941163E-03), F(1.04584443E-02), F(1.27472335E-02),
+	 F(1.46525263E-02), F(1.59045603E-02), F(1.62208471E-02), F(1.53184106E-02),
+	 F(1.29371806E-02), F(8.85757540E-03), F(2.92408442E-03),-F(4.91578024E-03),
+	-F(1.46404076E-02),-F(2.61098752E-02),-F(3.90751381E-02),-F(5.31873032E-02),
+	 F(6.79989431E-02), F(8.29847578E-02), F(9.75753918E-02), F(1.11196689E-01),
+	 F(1.23264548E-01), F(1.33264415E-01), F(1.40753505E-01), F(1.45389847E-01),
+	 F(1.46955068E-01), F(1.45389847E-01), F(1.40753505E-01), F(1.33264415E-01),
+	 F(1.23264548E-01), F(1.11196689E-01), F(9.75753918E-02), F(8.29847578E-02),
+	-F(6.79989431E-02),-F(5.31873032E-02),-F(3.90751381E-02),-F(2.61098752E-02),
+	-F(1.46404076E-02),-F(4.91578024E-03), F(2.92408442E-03), F(8.85757540E-03),
+	 F(1.29371806E-02), F(1.53184106E-02), F(1.62208471E-02), F(1.59045603E-02),
+	 F(1.46525263E-02), F(1.27472335E-02), F(1.04584443E-02), F(8.02941163E-03),
+	-F(5.65949473E-03),-F(3.49717454E-03),-F(1.64973098E-03),-F(1.78805361E-04),
+	 F(9.02154502E-04), F(1.61656283E-03), F(1.99454554E-03), F(2.10371989E-03),
+	 F(2.01182542E-03), F(1.78371725E-03), F(1.47640169E-03), F(1.13992507E-03),
+	 F(8.23919506E-04), F(5.54620202E-04), F(3.43256425E-04), F(1.56575398E-04),
+};
+#undef F
+
+/*
+ * To produce this cosine matrix in Octave:
+ *
+ * b = zeros(8, 16);
+ * for i = 0:7 for j = 0:15 b(i+1, j+1) = cos( (i + 0.5) * (j - 4) * (pi/8) ) endfor endfor;
+ * printf("%.10f, ", b');
+ *
+ */
+#define SBC_COS_TABLE_FIXED8_SCALE (sizeof(FIXED_T)*CHAR_BIT-1+SBC_FIXED8_EXTRA_BITS)
+#define F(x) (FIXED_T)(FIXED_A)((x)*((FIXED_A)1<<(sizeof(FIXED_T)*CHAR_BIT-1))+0.5)
+static const FIXED_T cos_table_fixed_8[128] = {
+	 F(0.7071067812), F(0.8314696123), F(0.9238795325), F(0.9807852804),
+	 F(1.0000000000), F(0.9807852804), F(0.9238795325), F(0.8314696123),
+	 F(0.7071067812), F(0.5555702330), F(0.3826834324), F(0.1950903220),
+	 F(0.0000000000),-F(0.1950903220),-F(0.3826834324),-F(0.5555702330),
+	-F(0.7071067812),-F(0.1950903220), F(0.3826834324), F(0.8314696123),
+	 F(1.0000000000), F(0.8314696123), F(0.3826834324),-F(0.1950903220),
+	-F(0.7071067812),-F(0.9807852804),-F(0.9238795325),-F(0.5555702330),
+	-F(0.0000000000), F(0.5555702330), F(0.9238795325), F(0.9807852804),
+	-F(0.7071067812),-F(0.9807852804),-F(0.3826834324), F(0.5555702330),
+	 F(1.0000000000), F(0.5555702330),-F(0.3826834324),-F(0.9807852804),
+	-F(0.7071067812), F(0.1950903220), F(0.9238795325), F(0.8314696123),
+	 F(0.0000000000),-F(0.8314696123),-F(0.9238795325),-F(0.1950903220),
+	 F(0.7071067812),-F(0.5555702330),-F(0.9238795325), F(0.1950903220),
+	 F(1.0000000000), F(0.1950903220),-F(0.9238795325),-F(0.5555702330),
+	 F(0.7071067812), F(0.8314696123),-F(0.3826834324),-F(0.9807852804),
+	-F(0.0000000000), F(0.9807852804), F(0.3826834324),-F(0.8314696123),
+	 F(0.7071067812), F(0.5555702330),-F(0.9238795325),-F(0.1950903220),
+	 F(1.0000000000),-F(0.1950903220),-F(0.9238795325), F(0.5555702330),
+	 F(0.7071067812),-F(0.8314696123),-F(0.3826834324), F(0.9807852804),
+	 F(0.0000000000),-F(0.9807852804), F(0.3826834324), F(0.8314696123),
+	-F(0.7071067812), F(0.9807852804),-F(0.3826834324),-F(0.5555702330),
+	 F(1.0000000000),-F(0.5555702330),-F(0.3826834324), F(0.9807852804),
+	-F(0.7071067812),-F(0.1950903220), F(0.9238795325),-F(0.8314696123),
+	-F(0.0000000000), F(0.8314696123),-F(0.9238795325), F(0.1950903220),
+	-F(0.7071067812), F(0.1950903220), F(0.3826834324),-F(0.8314696123),
+	 F(1.0000000000),-F(0.8314696123), F(0.3826834324), F(0.1950903220),
+	-F(0.7071067812), F(0.9807852804),-F(0.9238795325), F(0.5555702330),
+	-F(0.0000000000),-F(0.5555702330), F(0.9238795325),-F(0.9807852804),
+	 F(0.7071067812),-F(0.8314696123), F(0.9238795325),-F(0.9807852804),
+	 F(1.0000000000),-F(0.9807852804), F(0.9238795325),-F(0.8314696123),
+	 F(0.7071067812),-F(0.5555702330), F(0.3826834324),-F(0.1950903220),
+	-F(0.0000000000), F(0.1950903220),-F(0.3826834324), F(0.5555702330),
+};
+#undef F

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-19 22:12             ` Siarhei Siamashka
@ 2008-12-22 23:30               ` Siarhei Siamashka
  2008-12-23  1:00                 ` Marcel Holtmann
  0 siblings, 1 reply; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-22 23:30 UTC (permalink / raw)
  To: ext Brad Midgley; +Cc: Jaska Uimonen, linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 1086 bytes --]

On Saturday 20 December 2008 00:12:08 ext Siarhei Siamashka wrote:
[...]
> We had a talk with Jaska Uimonen here, and now I'm kind of delegated to
> finish the work on this filtering function for SBC encoder (including the
> final addition of ARM assembly optimizations).  He provided me with his
> last variant of code, which contains some more optimizations to reduce the
> number of operations and also loops unrolling. I will add his changes to
> the patch on next iteration.
>
> Now the question is how to best integrate a fixed filtering function to git
> repository? If I just continue adding changes to the patch in order to make
> it a faster, it will be also not so obvious to see how we got to these code
> transformations just from the commit log.

Next iteration done.  Added support for 4 subbands, number of arithmetic
operations reduced (but without loop unrolling for better code readability),
precision improved for both 16-bit and 32-bit fixed point, 'neginv' macro is
now more portable and faster. The rest is in the code comments.


Best regards,
Siarhei Siamashka

[-- Attachment #2: sbc_analyze_modified.diff --]
[-- Type: text/x-diff, Size: 26435 bytes --]

diff --git a/sbc/sbc.c b/sbc/sbc.c
index 5411893..3d6a412 100644
--- a/sbc/sbc.c
+++ b/sbc/sbc.c
@@ -40,6 +40,7 @@
 #include <string.h>
 #include <stdlib.h>
 #include <sys/types.h>
+#include <limits.h>
 
 #include "sbc_math.h"
 #include "sbc_tables.h"
@@ -93,7 +94,7 @@ struct sbc_decoder_state {
 struct sbc_encoder_state {
 	int subbands;
 	int position[2];
-	int32_t X[2][160];
+	int16_t X[2][160];
 };
 
 /*
@@ -656,75 +657,56 @@ static void sbc_encoder_init(struct sbc_encoder_state *state,
 	state->position[0] = state->position[1] = 9 * frame->subbands;
 }
 
-static inline void _sbc_analyze_four(const int32_t *in, int32_t *out)
+static inline void _sbc_analyze_four(const int16_t *in, int32_t *out)
 {
-	sbc_fixed_t t[8], s[5];
-
-	t[0] = SCALE4_STAGE1( /* Q8 */
-		MULA(_sbc_proto_4[0], in[8] - in[32], /* Q18 */
-		MUL( _sbc_proto_4[1], in[16] - in[24])));
-
-	t[1] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[2], in[1],
-		MULA(_sbc_proto_4[3], in[9],
-		MULA(_sbc_proto_4[4], in[17],
-		MULA(_sbc_proto_4[5], in[25],
-		MUL( _sbc_proto_4[6], in[33]))))));
-
-	t[2] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[7], in[2],
-		MULA(_sbc_proto_4[8], in[10],
-		MULA(_sbc_proto_4[9], in[18],
-		MULA(_sbc_proto_4[10], in[26],
-		MUL( _sbc_proto_4[11], in[34]))))));
-
-	t[3] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[12], in[3],
-		MULA(_sbc_proto_4[13], in[11],
-		MULA(_sbc_proto_4[14], in[19],
-		MULA(_sbc_proto_4[15], in[27],
-		MUL( _sbc_proto_4[16], in[35]))))));
-
-	t[4] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[17], in[4] + in[36],
-		MULA(_sbc_proto_4[18], in[12] + in[28],
-		MUL( _sbc_proto_4[19], in[20]))));
-
-	t[5] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[16], in[5],
-		MULA(_sbc_proto_4[15], in[13],
-		MULA(_sbc_proto_4[14], in[21],
-		MULA(_sbc_proto_4[13], in[29],
-		MUL( _sbc_proto_4[12], in[37]))))));
-
-	/* don't compute t[6]... this term always multiplies
-	 * with cos(pi/2) = 0 */
-
-	t[7] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[6], in[7],
-		MULA(_sbc_proto_4[5], in[15],
-		MULA(_sbc_proto_4[4], in[23],
-		MULA(_sbc_proto_4[3], in[31],
-		MUL( _sbc_proto_4[2], in[39]))))));
-
-	s[0] = MUL( _anamatrix4[0], t[0] + t[4]);
-	s[1] = MUL( _anamatrix4[2], t[2]);
-	s[2] = MULA(_anamatrix4[1], t[1] + t[3],
-		MUL(_anamatrix4[3], t[5]));
-	s[3] = MULA(_anamatrix4[3], t[1] + t[3],
-		MUL(_anamatrix4[1], -t[5] + t[7]));
-	s[4] = MUL( _anamatrix4[3], t[7]);
-
-	out[0] = SCALE4_STAGE2( s[0] + s[1] + s[2] + s[4]); /* Q0 */
-	out[1] = SCALE4_STAGE2(-s[0] + s[1] + s[3]);
-	out[2] = SCALE4_STAGE2(-s[0] + s[1] - s[3]);
-	out[3] = SCALE4_STAGE2( s[0] + s[1] - s[2] - s[4]);
+	FIXED_A t1[4];
+	FIXED_T t2[4];
+	int i = 0, hop = 0;
+
+	/* rounding coefficient */
+
+	t1[0] = t1[1] = t1[2] = t1[3] =
+		(FIXED_A)1 << (SBC_PROTO_FIXED4_SCALE-1);
+
+	/* low pass polyphase filter */
+	for (hop = 0; hop < 40; hop += 8) {
+		t1[0] +=  (FIXED_A)in[hop] * _sbc_proto_fixed4[hop];
+		t1[1] +=  (FIXED_A)in[hop + 1] * _sbc_proto_fixed4[hop + 1];
+		t1[2] +=  (FIXED_A)in[hop + 2] * _sbc_proto_fixed4[hop + 2];
+		t1[1] +=  (FIXED_A)in[hop + 3] * _sbc_proto_fixed4[hop + 3];
+		t1[0] +=  (FIXED_A)in[hop + 4] * _sbc_proto_fixed4[hop + 4];
+		t1[3] +=  (FIXED_A)in[hop + 5] * _sbc_proto_fixed4[hop + 5];
+		t1[3] +=  (FIXED_A)in[hop + 7] * _sbc_proto_fixed4[hop + 7];
+	}
+
+	/* scaling */
+	t2[0] = t1[0] >> SBC_PROTO_FIXED4_SCALE;
+	t2[1] = t1[1] >> SBC_PROTO_FIXED4_SCALE;
+	t2[2] = t1[2] >> SBC_PROTO_FIXED4_SCALE;
+	t2[3] = t1[3] >> SBC_PROTO_FIXED4_SCALE;
+
+	/* do the cos transform */
+	for (i = 0, hop = 0; i < 4; hop += 8, i++) {
+		/* rounding coefficient */
+		t1[i]  = (FIXED_A)1 << (SBC_COS_TABLE_FIXED4_SCALE-1-SCALE_OUT_BITS);
+		
+		t1[i] += (FIXED_A)t2[0] * cos_table_fixed_4[0 + hop];
+		t1[i] += (FIXED_A)t2[1] * cos_table_fixed_4[1 + hop];
+		t1[i] += (FIXED_A)t2[2] * cos_table_fixed_4[2 + hop];
+		t1[i] += (FIXED_A)t2[3] * cos_table_fixed_4[5 + hop];
+	}
+
+	/* scaling */
+	out[0] = t1[0] >> (SBC_COS_TABLE_FIXED4_SCALE-SCALE_OUT_BITS);
+	out[1] = t1[1] >> (SBC_COS_TABLE_FIXED4_SCALE-SCALE_OUT_BITS);
+	out[2] = t1[2] >> (SBC_COS_TABLE_FIXED4_SCALE-SCALE_OUT_BITS);
+	out[3] = t1[3] >> (SBC_COS_TABLE_FIXED4_SCALE-SCALE_OUT_BITS);
 }
 
 static inline void sbc_analyze_four(struct sbc_encoder_state *state,
 				struct sbc_frame *frame, int ch, int blk)
 {
-	int32_t *x = &state->X[ch][state->position[ch]];
+	int16_t *x = &state->X[ch][state->position[ch]];
 	int16_t *pcm = &frame->pcm_sample[ch][blk * 4];
 
 	/* Input 4 Audio Samples */
@@ -740,133 +722,80 @@ static inline void sbc_analyze_four(struct sbc_encoder_state *state,
 		state->position[ch] = 36;
 }
 
-static inline void _sbc_analyze_eight(const int32_t *in, int32_t *out)
+static inline void _sbc_analyze_eight(const int16_t *in, int32_t *out)
 {
-	sbc_fixed_t t[8], s[8];
-
-	t[0] = SCALE8_STAGE1( /* Q10 */
-		MULA(_sbc_proto_8[0], (in[16] - in[64]), /* Q18 = Q18 * Q0 */
-		MULA(_sbc_proto_8[1], (in[32] - in[48]),
-		MULA(_sbc_proto_8[2], in[4],
-		MULA(_sbc_proto_8[3], in[20],
-		MULA(_sbc_proto_8[4], in[36],
-		MUL( _sbc_proto_8[5], in[52])))))));
-
-	t[1] = SCALE8_STAGE1(
-		MULA(_sbc_proto_8[6], in[2],
-		MULA(_sbc_proto_8[7], in[18],
-		MULA(_sbc_proto_8[8], in[34],
-		MULA(_sbc_proto_8[9], in[50],
-		MUL(_sbc_proto_8[10], in[66]))))));
-
-	t[2] = SCALE8_STAGE1(
-		MULA(_sbc_proto_8[11], in[1],
-		MULA(_sbc_proto_8[12], in[17],
-		MULA(_sbc_proto_8[13], in[33],
-		MULA(_sbc_proto_8[14], in[49],
-		MULA(_sbc_proto_8[15], in[65],
-		MULA(_sbc_proto_8[16], in[3],
-		MULA(_sbc_proto_8[17], in[19],
-		MULA(_sbc_proto_8[18], in[35],
-		MULA(_sbc_proto_8[19], in[51],
-		MUL( _sbc_proto_8[20], in[67])))))))))));
-
-	t[3] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[21], in[5],
-		MULA( _sbc_proto_8[22], in[21],
-		MULA( _sbc_proto_8[23], in[37],
-		MULA( _sbc_proto_8[24], in[53],
-		MULA( _sbc_proto_8[25], in[69],
-		MULA(-_sbc_proto_8[15], in[15],
-		MULA(-_sbc_proto_8[14], in[31],
-		MULA(-_sbc_proto_8[13], in[47],
-		MULA(-_sbc_proto_8[12], in[63],
-		MUL( -_sbc_proto_8[11], in[79])))))))))));
-
-	t[4] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[26], in[6],
-		MULA( _sbc_proto_8[27], in[22],
-		MULA( _sbc_proto_8[28], in[38],
-		MULA( _sbc_proto_8[29], in[54],
-		MULA( _sbc_proto_8[30], in[70],
-		MULA(-_sbc_proto_8[10], in[14],
-		MULA(-_sbc_proto_8[9], in[30],
-		MULA(-_sbc_proto_8[8], in[46],
-		MULA(-_sbc_proto_8[7], in[62],
-		MUL( -_sbc_proto_8[6], in[78])))))))))));
-
-	t[5] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[31], in[7],
-		MULA( _sbc_proto_8[32], in[23],
-		MULA( _sbc_proto_8[33], in[39],
-		MULA( _sbc_proto_8[34], in[55],
-		MULA( _sbc_proto_8[35], in[71],
-		MULA(-_sbc_proto_8[20], in[13],
-		MULA(-_sbc_proto_8[19], in[29],
-		MULA(-_sbc_proto_8[18], in[45],
-		MULA(-_sbc_proto_8[17], in[61],
-		MUL( -_sbc_proto_8[16], in[77])))))))))));
-
-	t[6] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[36], (in[8] + in[72]),
-		MULA( _sbc_proto_8[37], (in[24] + in[56]),
-		MULA( _sbc_proto_8[38], in[40],
-		MULA(-_sbc_proto_8[39], in[12],
-		MULA(-_sbc_proto_8[5], in[28],
-		MULA(-_sbc_proto_8[4], in[44],
-		MULA(-_sbc_proto_8[3], in[60],
-		MUL( -_sbc_proto_8[2], in[76])))))))));
-
-	t[7] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[35], in[9],
-		MULA( _sbc_proto_8[34], in[25],
-		MULA( _sbc_proto_8[33], in[41],
-		MULA( _sbc_proto_8[32], in[57],
-		MULA( _sbc_proto_8[31], in[73],
-		MULA(-_sbc_proto_8[25], in[11],
-		MULA(-_sbc_proto_8[24], in[27],
-		MULA(-_sbc_proto_8[23], in[43],
-		MULA(-_sbc_proto_8[22], in[59],
-		MUL( -_sbc_proto_8[21], in[75])))))))))));
-
-	s[0] = MULA(  _anamatrix8[0], t[0],
-		MUL(  _anamatrix8[1], t[6]));
-	s[1] = MUL(   _anamatrix8[7], t[1]);
-	s[2] = MULA(  _anamatrix8[2], t[2],
-		MULA( _anamatrix8[3], t[3],
-		MULA( _anamatrix8[4], t[5],
-		MUL(  _anamatrix8[5], t[7]))));
-	s[3] = MUL(   _anamatrix8[6], t[4]);
-	s[4] = MULA(  _anamatrix8[3], t[2],
-		MULA(-_anamatrix8[5], t[3],
-		MULA(-_anamatrix8[2], t[5],
-		MUL( -_anamatrix8[4], t[7]))));
-	s[5] = MULA(  _anamatrix8[4], t[2],
-		MULA(-_anamatrix8[2], t[3],
-		MULA( _anamatrix8[5], t[5],
-		MUL(  _anamatrix8[3], t[7]))));
-	s[6] = MULA(  _anamatrix8[1], t[0],
-		MUL( -_anamatrix8[0], t[6]));
-	s[7] = MULA(  _anamatrix8[5], t[2],
-		MULA(-_anamatrix8[4], t[3],
-		MULA( _anamatrix8[3], t[5],
-		MUL( -_anamatrix8[2], t[7]))));
-
-	out[0] = SCALE8_STAGE2( s[0] + s[1] + s[2] + s[3]);
-	out[1] = SCALE8_STAGE2( s[1] - s[3] + s[4] + s[6]);
-	out[2] = SCALE8_STAGE2( s[1] - s[3] + s[5] - s[6]);
-	out[3] = SCALE8_STAGE2(-s[0] + s[1] + s[3] + s[7]);
-	out[4] = SCALE8_STAGE2(-s[0] + s[1] + s[3] - s[7]);
-	out[5] = SCALE8_STAGE2( s[1] - s[3] - s[5] - s[6]);
-	out[6] = SCALE8_STAGE2( s[1] - s[3] - s[4] + s[6]);
-	out[7] = SCALE8_STAGE2( s[0] + s[1] - s[2] + s[3]);
+	FIXED_A t1[8];
+	FIXED_T t2[8];
+	int i, hop;
+
+	/* rounding coefficient */
+	t1[0] = t1[1] = t1[2] = t1[3] = t1[4] = t1[5] = t1[6] = t1[7] =
+		(FIXED_A)1 << (SBC_PROTO_FIXED8_SCALE-1);
+
+	/* low pass polyphase filter */
+	for (hop = 0; hop < 80; hop += 16) {
+		t1[0] += (FIXED_A)in[hop] * _sbc_proto_fixed8[hop];
+		t1[1] += (FIXED_A)in[hop + 1] * _sbc_proto_fixed8[hop + 1];
+		t1[2] += (FIXED_A)in[hop + 2] * _sbc_proto_fixed8[hop + 2];
+		t1[3] += (FIXED_A)in[hop + 3] * _sbc_proto_fixed8[hop + 3];
+
+		t1[4] += (FIXED_A)in[hop + 4] * _sbc_proto_fixed8[hop + 4];
+
+		t1[0] += (FIXED_A)in[hop + 8] * _sbc_proto_fixed8[hop + 8];
+		t1[1] += (FIXED_A)in[hop + 7] * _sbc_proto_fixed8[hop + 7];
+		t1[2] += (FIXED_A)in[hop + 6] * _sbc_proto_fixed8[hop + 6];
+		t1[3] += (FIXED_A)in[hop + 5] * _sbc_proto_fixed8[hop + 5];
+
+		t1[5] += (FIXED_A)in[hop + 9] * _sbc_proto_fixed8[hop + 9];
+		t1[6] += (FIXED_A)in[hop + 10] * _sbc_proto_fixed8[hop + 10];
+		t1[7] += (FIXED_A)in[hop + 11] * _sbc_proto_fixed8[hop + 11];
+
+		t1[5] += (FIXED_A)in[hop + 15] * _sbc_proto_fixed8[hop + 15];
+		t1[6] += (FIXED_A)in[hop + 14] * _sbc_proto_fixed8[hop + 14];
+		t1[7] += (FIXED_A)in[hop + 13] * _sbc_proto_fixed8[hop + 13];
+	}
+
+	/* scaling */
+	t2[0] = t1[0] >> SBC_PROTO_FIXED8_SCALE;
+	t2[1] = t1[1] >> SBC_PROTO_FIXED8_SCALE;
+	t2[2] = t1[2] >> SBC_PROTO_FIXED8_SCALE;
+	t2[3] = t1[3] >> SBC_PROTO_FIXED8_SCALE;
+	t2[4] = t1[4] >> SBC_PROTO_FIXED8_SCALE;
+	t2[5] = t1[5] >> SBC_PROTO_FIXED8_SCALE;
+	t2[6] = t1[6] >> SBC_PROTO_FIXED8_SCALE;
+	t2[7] = t1[7] >> SBC_PROTO_FIXED8_SCALE;
+
+	/* do the cos transform */
+	for (i = 0, hop = 0; i < 8; hop += 16, i++) {
+		/* rounding coefficient */
+		t1[i] = (FIXED_A)1 << (SBC_COS_TABLE_FIXED8_SCALE-1-SCALE_OUT_BITS);
+
+		t1[i] += (FIXED_A)t2[0] * cos_table_fixed_8[0 + hop];
+		t1[i] += (FIXED_A)t2[1] * cos_table_fixed_8[1 + hop];
+		t1[i] += (FIXED_A)t2[2] * cos_table_fixed_8[2 + hop];
+		t1[i] += (FIXED_A)t2[3] * cos_table_fixed_8[3 + hop];
+		t1[i] += (FIXED_A)t2[4] * cos_table_fixed_8[4 + hop];
+		t1[i] += (FIXED_A)t2[5] * cos_table_fixed_8[9 + hop];
+		t1[i] += (FIXED_A)t2[6] * cos_table_fixed_8[10 + hop];
+		t1[i] += (FIXED_A)t2[7] * cos_table_fixed_8[11 + hop];
+	}
+
+	/* scaling */
+	out[0] = t1[0] >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[1] = t1[1] >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[2] = t1[2] >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[3] = t1[3] >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[4] = t1[4] >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[5] = t1[5] >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[6] = t1[6] >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
+	out[7] = t1[7] >> (SBC_COS_TABLE_FIXED8_SCALE-SCALE_OUT_BITS);
 }
 
 static inline void sbc_analyze_eight(struct sbc_encoder_state *state,
 					struct sbc_frame *frame, int ch,
 					int blk)
 {
-	int32_t *x = &state->X[ch][state->position[ch]];
+	int16_t *x = &state->X[ch][state->position[ch]];
 	int16_t *pcm = &frame->pcm_sample[ch][blk * 8];
 
 	/* Input 8 Audio Samples */
@@ -1006,7 +935,7 @@ static int sbc_pack_frame(uint8_t *data, struct sbc_frame *frame, size_t len)
 			frame->scale_factor[ch][sb] = 0;
 			scalefactor[ch][sb] = 2;
 			for (blk = 0; blk < frame->blocks; blk++) {
-				while (scalefactor[ch][sb] < fabs(frame->sb_sample_f[blk][ch][sb])) {
+				while ((scalefactor[ch][sb] << SCALE_OUT_BITS) <= neginv(frame->sb_sample_f[blk][ch][sb])) {
 					frame->scale_factor[ch][sb]++;
 					scalefactor[ch][sb] *= 2;
 				}
@@ -1040,11 +969,11 @@ static int sbc_pack_frame(uint8_t *data, struct sbc_frame *frame, size_t len)
 						frame->sb_sample_f[blk][1][sb]) >> 1;
 
 				/* calculate scale_factor_j and scalefactor_j for joint case */
-				while (scalefactor_j[0] < fabs(sb_sample_j[blk][0])) {
+				while ((scalefactor_j[0] << SCALE_OUT_BITS) <= neginv(sb_sample_j[blk][0])) {
 					scale_factor_j[0]++;
 					scalefactor_j[0] *= 2;
 				}
-				while (scalefactor_j[1] < fabs(sb_sample_j[blk][1])) {
+				while ((scalefactor_j[1] << SCALE_OUT_BITS) <= neginv(sb_sample_j[blk][1])) {
 					scale_factor_j[1]++;
 					scalefactor_j[1] *= 2;
 				}
@@ -1100,11 +1029,11 @@ static int sbc_pack_frame(uint8_t *data, struct sbc_frame *frame, size_t len)
 		for (ch = 0; ch < frame->channels; ch++) {
 			for (sb = 0; sb < frame->subbands; sb++) {
 				if (levels[ch][sb] > 0) {
-					audio_sample =
-						(uint16_t) (((((int64_t)frame->sb_sample_f[blk][ch][sb]*levels[ch][sb]) >>
-									(frame->scale_factor[ch][sb] + 1)) +
-								levels[ch][sb]) >> 1);
-					PUT_BITS(audio_sample & levels[ch][sb], bits[ch][sb]);
+					int32_t sample = frame->sb_sample_f[blk][ch][sb];
+					int32_t s_shift = (frame->scale_factor[ch][sb] + 1 + SCALE_OUT_BITS);
+					int32_t ls = levels[ch][sb];
+					audio_sample = ((((int64_t)1 << s_shift) + sample) * ls) >> (s_shift + 1);
+					PUT_BITS(audio_sample, bits[ch][sb]);
 				}
 			}
 		}
diff --git a/sbc/sbc_math.h b/sbc/sbc_math.h
index b3d87a6..f3937ce 100644
--- a/sbc/sbc_math.h
+++ b/sbc/sbc_math.h
@@ -23,37 +23,30 @@
  *
  */
 
-#define fabs(x) ((x) < 0 ? -(x) : (x))
 /* C does not provide an explicit arithmetic shift right but this will
    always be correct and every compiler *should* generate optimal code */
 #define ASR(val, bits) ((-2 >> 1 == -1) ? \
 		 ((int32_t)(val)) >> (bits) : ((int32_t) (val)) / (1 << (bits)))
 
-#define SCALE_PROTO4_TBL	15
-#define SCALE_ANA4_TBL		17
-#define SCALE_PROTO8_TBL	16
-#define SCALE_ANA8_TBL		17
+#define neginv(x) ((-2 >> 1 == -1) ? \
+		 ((((int32_t)(x)) >> 31) ^ (int32_t)(x)) : \
+		 ((x >= 0) ? (x) : -(x)-1))
+
+#define SCALE_OUT_BITS 14
+
 #define SCALE_SPROTO4_TBL	12
 #define SCALE_SPROTO8_TBL	14
 #define SCALE_NPROTO4_TBL	11
 #define SCALE_NPROTO8_TBL	11
-#define SCALE4_STAGE1_BITS	15
-#define SCALE4_STAGE2_BITS	16
 #define SCALE4_STAGED1_BITS	15
 #define SCALE4_STAGED2_BITS	16
-#define SCALE8_STAGE1_BITS	15
-#define SCALE8_STAGE2_BITS	15
 #define SCALE8_STAGED1_BITS	15
 #define SCALE8_STAGED2_BITS	16
 
 typedef int32_t sbc_fixed_t;
 
-#define SCALE4_STAGE1(src)  ASR(src, SCALE4_STAGE1_BITS)
-#define SCALE4_STAGE2(src)  ASR(src, SCALE4_STAGE2_BITS)
 #define SCALE4_STAGED1(src) ASR(src, SCALE4_STAGED1_BITS)
 #define SCALE4_STAGED2(src) ASR(src, SCALE4_STAGED2_BITS)
-#define SCALE8_STAGE1(src)  ASR(src, SCALE8_STAGE1_BITS)
-#define SCALE8_STAGE2(src)  ASR(src, SCALE8_STAGE2_BITS)
 #define SCALE8_STAGED1(src) ASR(src, SCALE8_STAGED1_BITS)
 #define SCALE8_STAGED2(src) ASR(src, SCALE8_STAGED2_BITS)
 
diff --git a/sbc/sbc_tables.h b/sbc/sbc_tables.h
index f5daaa7..9c96732 100644
--- a/sbc/sbc_tables.h
+++ b/sbc/sbc_tables.h
@@ -40,40 +40,11 @@ static const int sbc_offset8[4][8] = {
 };
 
 
-#define SP4(val) (((int32_t)(val))/17658) /* Used to be #define SP4(val) ASR(val, SCALE_PROTO4_TBL) but causes wrong gain */
-#define SA4(val) ASR(val, SCALE_ANA4_TBL)
-#define SP8(val) (((int32_t)(val))/57740) /* Used to be #define SP8(val) ASR(val, SCALE_PROTO8_TBL) but causes wrong gain */
-#define SA8(val) ASR(val, SCALE_ANA8_TBL)
 #define SS4(val) ASR(val, SCALE_SPROTO4_TBL)
 #define SS8(val) ASR(val, SCALE_SPROTO8_TBL)
 #define SN4(val) ASR(val, SCALE_NPROTO4_TBL)
 #define SN8(val) ASR(val, SCALE_NPROTO8_TBL)
 
-static const int32_t _sbc_proto_4[20] = {
-	SP4(0x02cb3e8c), SP4(0x22b63dc0), SP4(0x002329cc), SP4(0x053b7548),
-	SP4(0x31eab940), SP4(0xec1f5e60), SP4(0xff3773a8), SP4(0x0061c5a7),
-	SP4(0x07646680), SP4(0x3f239480), SP4(0xf89f23a8), SP4(0x007a4737),
-	SP4(0x00b32807), SP4(0x083ddc80), SP4(0x4825e480), SP4(0x0191e578),
-	SP4(0x00ff11ca), SP4(0x00fb7991), SP4(0x069fdc58), SP4(0x4b584000)
-};
-
-static const int32_t _anamatrix4[4] = {
-	SA4(0x2d413cc0), SA4(0x3b20d780), SA4(0x40000000), SA4(0x187de2a0)
-};
-
-static const int32_t _sbc_proto_8[40] = {
-	SP8(0x02e5cd20), SP8(0x22d0c200), SP8(0x006bfe27), SP8(0x07808930),
-	SP8(0x3f1c8800), SP8(0xf8810d70), SP8(0x002cfdc6), SP8(0x055acf28),
-	SP8(0x31f566c0), SP8(0xebfe57e0), SP8(0xff27c437), SP8(0x001485cc),
-	SP8(0x041c6e58), SP8(0x2a7cfa80), SP8(0xe4c4a240), SP8(0xfe359e4c),
-	SP8(0x0048b1f8), SP8(0x0686ce30), SP8(0x38eec5c0), SP8(0xf2a1b9f0),
-	SP8(0xffe8904a), SP8(0x0095698a), SP8(0x0824a480), SP8(0x443b3c00),
-	SP8(0xfd7badc8), SP8(0x00d3e2d9), SP8(0x00c183d2), SP8(0x084e1950),
-	SP8(0x4810d800), SP8(0x017f43fe), SP8(0x01056dd8), SP8(0x00e9cb9f),
-	SP8(0x07d7d090), SP8(0x4a708980), SP8(0x0488fae8), SP8(0x0113bd20),
-	SP8(0x0107b1a8), SP8(0x069fb3c0), SP8(0x4b3db200), SP8(0x00763f48)
-};
-
 static const int32_t sbc_proto_4_40m0[] = {
 	SS4(0x00000000), SS4(0xffa6982f), SS4(0xfba93848), SS4(0x0456c7b8),
 	SS4(0x005967d1), SS4(0xfffb9ac7), SS4(0xff589157), SS4(0xf9c2a8d8),
@@ -116,11 +87,6 @@ static const int32_t sbc_proto_8_80m1[] = {
 	SS8(0x0d9daee0), SS8(0xeac182c0), SS8(0xfdf1c8d4), SS8(0xfff5bd1a)
 };
 
-static const int32_t _anamatrix8[8] = {
-	SA8(0x3b20d780), SA8(0x187de2a0), SA8(0x3ec52f80), SA8(0x3536cc40),
-	SA8(0x238e7680), SA8(0x0c7c5c20), SA8(0x2d413cc0), SA8(0x40000000)
-};
-
 static const int32_t synmatrix4[8][4] = {
 	{ SN4(0x05a82798), SN4(0xfa57d868), SN4(0xfa57d868), SN4(0x05a82798) },
 	{ SN4(0x030fbc54), SN4(0xf89be510), SN4(0x07641af0), SN4(0xfcf043ac) },
@@ -166,3 +132,169 @@ static const int32_t synmatrix8[16][8] = {
 	{ SN8(0xf9592678), SN8(0x018f8b84), SN8(0x07d8a5f0), SN8(0x0471ced0),
 	  SN8(0xfb8e3130), SN8(0xf8275a10), SN8(0xfe70747c), SN8(0x06a6d988) }
 };
+
+//#define SBC_HIGH_PRECISION
+
+#ifdef SBC_HIGH_PRECISION
+# define FIXED_A int64_t /* data type for fixed point accumulator */
+# define FIXED_T int32_t /* data type for fixed point constants */
+# define SBC_FIXED8_EXTRA_BITS 16
+#else
+# define FIXED_A int32_t /* data type for fixed point accumulator */
+# define FIXED_T int16_t /* data type for fixed point constants */
+# define SBC_FIXED8_EXTRA_BITS 0
+#endif
+
+/* A2DP specification: Section 12.8 Tables 
+ *
+ * Original values are premultiplied by 4 for better precision (that is the
+ * maximum which is possible without overflows)
+ *
+ * Note: in each block of 16 numbers sign was changed for elements 4, 13, 14, 15
+ * in order to compensate the same change applied to cos_table_fixed_8
+ */
+#define SBC_PROTO_FIXED8_SCALE (sizeof(FIXED_T)*CHAR_BIT-1-SBC_FIXED8_EXTRA_BITS+2)
+#define F(x) (FIXED_A)((x*4)*((FIXED_A)1<<(sizeof(FIXED_T)*CHAR_BIT-1))+0.5)
+static const FIXED_T _sbc_proto_fixed8[80] = {
+	 F(0.00000000E+00), F(1.56575398E-04), F(3.43256425E-04), F(5.54620202E-04),
+	-F(8.23919506E-04), F(1.13992507E-03), F(1.47640169E-03), F(1.78371725E-03),
+	 F(2.01182542E-03), F(2.10371989E-03), F(1.99454554E-03), F(1.61656283E-03),
+	 F(9.02154502E-04), F(1.78805361E-04), F(1.64973098E-03), F(3.49717454E-03),
+
+	 F(5.65949473E-03), F(8.02941163E-03), F(1.04584443E-02), F(1.27472335E-02),
+	-F(1.46525263E-02), F(1.59045603E-02), F(1.62208471E-02), F(1.53184106E-02),
+	 F(1.29371806E-02), F(8.85757540E-03), F(2.92408442E-03),-F(4.91578024E-03),
+	-F(1.46404076E-02), F(2.61098752E-02), F(3.90751381E-02), F(5.31873032E-02),
+
+	 F(6.79989431E-02), F(8.29847578E-02), F(9.75753918E-02), F(1.11196689E-01),
+	-F(1.23264548E-01), F(1.33264415E-01), F(1.40753505E-01), F(1.45389847E-01),
+	 F(1.46955068E-01), F(1.45389847E-01), F(1.40753505E-01), F(1.33264415E-01),
+	 F(1.23264548E-01),-F(1.11196689E-01),-F(9.75753918E-02),-F(8.29847578E-02),
+
+	-F(6.79989431E-02),-F(5.31873032E-02),-F(3.90751381E-02),-F(2.61098752E-02),
+	 F(1.46404076E-02),-F(4.91578024E-03), F(2.92408442E-03), F(8.85757540E-03),
+	 F(1.29371806E-02), F(1.53184106E-02), F(1.62208471E-02), F(1.59045603E-02),
+	 F(1.46525263E-02),-F(1.27472335E-02),-F(1.04584443E-02),-F(8.02941163E-03),
+
+	-F(5.65949473E-03),-F(3.49717454E-03),-F(1.64973098E-03),-F(1.78805361E-04),
+	-F(9.02154502E-04), F(1.61656283E-03), F(1.99454554E-03), F(2.10371989E-03),
+	 F(2.01182542E-03), F(1.78371725E-03), F(1.47640169E-03), F(1.13992507E-03),
+	 F(8.23919506E-04),-F(5.54620202E-04),-F(3.43256425E-04),-F(1.56575398E-04),
+};
+#undef F
+
+/*
+ * To produce this cosine matrix in Octave:
+ *
+ * b = zeros(8, 16);
+ * for i = 0:7 for j = 0:15 b(i+1, j+1) = cos( (i + 0.5) * (j - 4) * (pi/8) ) endfor endfor;
+ * printf("%.10f, ", b');
+ *
+ * Note: in each block of 16 numbers sign was changed for elements 4, 13, 14, 15.
+ * Change of sign for element 4 allows to replace constant 1.0 (not representable
+ * in Q15 format) with -1.0 (fine with Q15).
+ * Changed signs for elements 13, 14, 15 allow to have more similar constants
+ * and simplify subband filter function code.
+ */
+#define SBC_COS_TABLE_FIXED8_SCALE (sizeof(FIXED_T)*CHAR_BIT-1+SBC_FIXED8_EXTRA_BITS)
+#define F(x) (FIXED_A)((x)*((FIXED_A)1<<(sizeof(FIXED_T)*CHAR_BIT-1))+0.5)
+static const FIXED_T cos_table_fixed_8[128] = {
+	 F(0.7071067812), F(0.8314696123), F(0.9238795325), F(0.9807852804),
+	-F(1.0000000000), F(0.9807852804), F(0.9238795325), F(0.8314696123),
+	 F(0.7071067812), F(0.5555702330), F(0.3826834324), F(0.1950903220),
+	 F(0.0000000000), F(0.1950903220), F(0.3826834324), F(0.5555702330),
+
+	-F(0.7071067812),-F(0.1950903220), F(0.3826834324), F(0.8314696123),
+	-F(1.0000000000), F(0.8314696123), F(0.3826834324),-F(0.1950903220),
+	-F(0.7071067812),-F(0.9807852804),-F(0.9238795325),-F(0.5555702330),
+	-F(0.0000000000),-F(0.5555702330),-F(0.9238795325),-F(0.9807852804),
+
+	-F(0.7071067812),-F(0.9807852804),-F(0.3826834324), F(0.5555702330),
+	-F(1.0000000000), F(0.5555702330),-F(0.3826834324),-F(0.9807852804),
+	-F(0.7071067812), F(0.1950903220), F(0.9238795325), F(0.8314696123),
+	 F(0.0000000000), F(0.8314696123), F(0.9238795325), F(0.1950903220),
+
+	 F(0.7071067812),-F(0.5555702330),-F(0.9238795325), F(0.1950903220),
+	-F(1.0000000000), F(0.1950903220),-F(0.9238795325),-F(0.5555702330),
+	 F(0.7071067812), F(0.8314696123),-F(0.3826834324),-F(0.9807852804),
+	-F(0.0000000000),-F(0.9807852804),-F(0.3826834324), F(0.8314696123),
+
+	 F(0.7071067812), F(0.5555702330),-F(0.9238795325),-F(0.1950903220),
+	-F(1.0000000000),-F(0.1950903220),-F(0.9238795325), F(0.5555702330),
+	 F(0.7071067812),-F(0.8314696123),-F(0.3826834324), F(0.9807852804),
+	 F(0.0000000000), F(0.9807852804),-F(0.3826834324),-F(0.8314696123),
+
+	-F(0.7071067812), F(0.9807852804),-F(0.3826834324),-F(0.5555702330),
+	-F(1.0000000000),-F(0.5555702330),-F(0.3826834324), F(0.9807852804),
+	-F(0.7071067812),-F(0.1950903220), F(0.9238795325),-F(0.8314696123),
+	-F(0.0000000000),-F(0.8314696123), F(0.9238795325),-F(0.1950903220),
+
+	-F(0.7071067812), F(0.1950903220), F(0.3826834324),-F(0.8314696123),
+	-F(1.0000000000),-F(0.8314696123), F(0.3826834324), F(0.1950903220),
+	-F(0.7071067812), F(0.9807852804),-F(0.9238795325), F(0.5555702330),
+	-F(0.0000000000), F(0.5555702330),-F(0.9238795325), F(0.9807852804),
+
+	 F(0.7071067812),-F(0.8314696123), F(0.9238795325),-F(0.9807852804),
+	-F(1.0000000000),-F(0.9807852804), F(0.9238795325),-F(0.8314696123),
+	 F(0.7071067812),-F(0.5555702330), F(0.3826834324),-F(0.1950903220),
+	-F(0.0000000000),-F(0.1950903220), F(0.3826834324),-F(0.5555702330),
+};
+#undef F
+
+/* A2DP specification: Section 12.8 Tables
+ *
+ * Original values are premultiplied by 2 for better precision (that is the
+ * maximum which is possible without overflows)
+ *
+ * Note: in each block of 8 numbers sign was changed for elements 2 and 7
+ * in order to compensate the same change applied to cos_table_fixed_4
+ */
+#define SBC_PROTO_FIXED4_SCALE (sizeof(FIXED_T)*CHAR_BIT-1-SBC_FIXED8_EXTRA_BITS+1)
+#define F(x) (FIXED_A)((x*2)*((FIXED_A)1<<(sizeof(FIXED_T)*CHAR_BIT-1))+0.5)
+static const FIXED_T _sbc_proto_fixed4[40] = {
+	 F(0.00000000E+00), F(5.36548976E-04),-F(1.49188357E-03), F(2.73370904E-03),
+	 F(3.83720193E-03), F(3.89205149E-03), F(1.86581691E-03), F(3.06012286E-03),
+
+	 F(1.09137620E-02), F(2.04385087E-02),-F(2.88757392E-02), F(3.21939290E-02),
+	 F(2.58767811E-02), F(6.13245186E-03),-F(2.88217274E-02), F(7.76463494E-02),
+
+	 F(1.35593274E-01), F(1.94987841E-01),-F(2.46636662E-01), F(2.81828203E-01),
+	 F(2.94315332E-01), F(2.81828203E-01), F(2.46636662E-01),-F(1.94987841E-01),
+
+	-F(1.35593274E-01),-F(7.76463494E-02), F(2.88217274E-02), F(6.13245186E-03),
+	 F(2.58767811E-02), F(3.21939290E-02), F(2.88757392E-02),-F(2.04385087E-02),
+
+	-F(1.09137620E-02),-F(3.06012286E-03),-F(1.86581691E-03), F(3.89205149E-03),
+	 F(3.83720193E-03), F(2.73370904E-03), F(1.49188357E-03),-F(5.36548976E-04),
+};
+#undef F
+
+/*
+ * To produce this cosine matrix in Octave:
+ *
+ * b = zeros(4, 8);
+ * for i = 0:3 for j = 0:7 b(i+1, j+1) = cos( (i + 0.5) * (j - 2) * (pi/4) ) endfor endfor;
+ * printf("F(%.10f), ", b');
+ *
+ * Note: in each block of 8 numbers sign was changed for elements 2 and 7.
+ * Change of sign for element 2 allows to replace constant 1.0 (not representable
+ * in Q15 format) with -1.0 (fine with Q15).
+ * Changed sign for element 7 allows to have more similar constants
+ * and simplify subband filter function code.
+ */
+#define SBC_COS_TABLE_FIXED4_SCALE (sizeof(FIXED_T)*CHAR_BIT-1+SBC_FIXED8_EXTRA_BITS)
+#define F(x) (FIXED_A)((x)*((FIXED_A)1<<(sizeof(FIXED_T)*CHAR_BIT-1))+0.5)
+static const FIXED_T cos_table_fixed_4[32] = {
+	 F(0.7071067812), F(0.9238795325),-F(1.0000000000), F(0.9238795325),
+	 F(0.7071067812), F(0.3826834324), F(0.0000000000), F(0.3826834324),
+
+	-F(0.7071067812), F(0.3826834324),-F(1.0000000000), F(0.3826834324),
+	-F(0.7071067812),-F(0.9238795325),-F(0.0000000000),-F(0.9238795325),
+
+	-F(0.7071067812),-F(0.3826834324),-F(1.0000000000),-F(0.3826834324),
+	-F(0.7071067812), F(0.9238795325), F(0.0000000000), F(0.9238795325),
+
+	 F(0.7071067812),-F(0.9238795325),-F(1.0000000000),-F(0.9238795325),
+	 F(0.7071067812),-F(0.3826834324),-F(0.0000000000),-F(0.3826834324),
+};
+#undef F

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-22 23:30               ` Siarhei Siamashka
@ 2008-12-23  1:00                 ` Marcel Holtmann
  2008-12-23  8:20                   ` Jaska.Uimonen
  2008-12-23 10:45                   ` Siarhei Siamashka
  0 siblings, 2 replies; 35+ messages in thread
From: Marcel Holtmann @ 2008-12-23  1:00 UTC (permalink / raw)
  To: Siarhei Siamashka; +Cc: ext Brad Midgley, Jaska Uimonen, linux-bluetooth

Hi Siarhei,

> > We had a talk with Jaska Uimonen here, and now I'm kind of delegated to
> > finish the work on this filtering function for SBC encoder (including the
> > final addition of ARM assembly optimizations).  He provided me with his
> > last variant of code, which contains some more optimizations to reduce the
> > number of operations and also loops unrolling. I will add his changes to
> > the patch on next iteration.
> >
> > Now the question is how to best integrate a fixed filtering function to git
> > repository? If I just continue adding changes to the patch in order to make
> > it a faster, it will be also not so obvious to see how we got to these code
> > transformations just from the commit log.
> 
> Next iteration done.  Added support for 4 subbands, number of arithmetic
> operations reduced (but without loop unrolling for better code readability),
> precision improved for both 16-bit and 32-bit fixed point, 'neginv' macro is
> now more portable and faster. The rest is in the code comments.

I don't mind having patches as attachment, but this makes it hard to
review and comment on them. Especially when it comes to stuff like
coding style (since I have no ideas about the rest).

+       t1[0] = t1[1] = t1[2] = t1[3] =
+               (FIXED_A)1 << (SBC_PROTO_FIXED4_SCALE-1);

Should be like this: (FIXED_A) 1 << (SBC_PROTO_FIXED4_SCALE - 1);

Also do you need the (FIXED_A) cast?

+       for (hop = 0; hop < 40; hop += 8) {
+               t1[0] +=  (FIXED_A)in[hop] * _sbc_proto_fixed4[hop];

Same here. There has to be a space after every case.

+               t1[i]  = (FIXED_A)1 << (SBC_COS_TABLE_FIXED4_SCALE-1-SCALE_OUT_BITS);

And between every operation there has to be a space: SCALE - 1 - SCALE.
In this case I would actually put the - 1 at the end, but that is pure
cosmetics and not a coding style violation.

Please fix all of these. There at least 8 or so.

+#define neginv(x) ((-2 >> 1 == -1) ? \
+                ((((int32_t)(x)) >> 31) ^ (int32_t)(x)) : \
+                ((x >= 0) ? (x) : -(x)-1))

Space after cast. Space before and after operator.

+#ifdef SBC_HIGH_PRECISION
+# define FIXED_A int64_t /* data type for fixed point accumulator */
+# define FIXED_T int32_t /* data type for fixed point constants */
+# define SBC_FIXED8_EXTRA_BITS 16
+#else
+# define FIXED_A int32_t /* data type for fixed point accumulator */
+# define FIXED_T int16_t /* data type for fixed point constants */
+# define SBC_FIXED8_EXTRA_BITS 0
+#endif

No space between # and define. I know that this is meant to improve
readability, but I don't see it.

Regards

Marcel



^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-23  1:00                 ` Marcel Holtmann
@ 2008-12-23  8:20                   ` Jaska.Uimonen
  2008-12-23 11:14                     ` Siarhei Siamashka
  2008-12-23 10:45                   ` Siarhei Siamashka
  1 sibling, 1 reply; 35+ messages in thread
From: Jaska.Uimonen @ 2008-12-23  8:20 UTC (permalink / raw)
  To: marcel, siarhei.siamashka; +Cc: bmidgley, linux-bluetooth

Hi Guys,

As Siarhei said, we had a talk last week about=20
The optimizations to the code.=20

I added the modifications to the code, which reduce=20
The amount of operations quite a lot. The problem=20
With this modification is that it also reduces the=20
Readability of the code.=20

So because of the redundancy in the cosine transform=20
it is possible to reduce the number of variables=20
and operations in the preceding filtering.=20
This is very hard to explain with comments in the code=20
(so you would write something like "here should be t[12], but=20
The table goes only to 8 because of redundancy in the following=20
Cosine transform). To really make the stuff=20
Understandable one should write some small=20
Wikipage or extensive comments to the code.=20

So this is the path the previous code also took, but=20
At some point there was a mistake. I really don't still=20
Get how the anamatrix stuff was calculated, but as=20
You see it takes time to reverse engineer :)=20

But I suggest me and Siarhei clean the code internally=20
And try really hard to follow the conventions. It starts=20
To get little bit messy, because we both have many=20
Versions of the code.

Br,
Jaska



>-----Original Message-----
>From: ext Marcel Holtmann [mailto:marcel@holtmann.org]=20
>Sent: 23 December, 2008 03:00
>To: Siamashka Siarhei (Nokia-D/Helsinki)
>Cc: ext Brad Midgley; Uimonen Jaska (Nokia-D/Helsinki);=20
>linux-bluetooth@vger.kernel.org
>Subject: Re: [RFC/PATCH] sbc: new filtering function for 8=20
>band fixed point encoding
>
>Hi Siarhei,
>
>> > We had a talk with Jaska Uimonen here, and now I'm kind of=20
>delegated=20
>> > to finish the work on this filtering function for SBC encoder=20
>> > (including the final addition of ARM assembly optimizations).  He=20
>> > provided me with his last variant of code, which contains=20
>some more=20
>> > optimizations to reduce the number of operations and also loops=20
>> > unrolling. I will add his changes to the patch on next iteration.
>> >
>> > Now the question is how to best integrate a fixed=20
>filtering function=20
>> > to git repository? If I just continue adding changes to=20
>the patch in=20
>> > order to make it a faster, it will be also not so obvious=20
>to see how=20
>> > we got to these code transformations just from the commit log.
>>=20
>> Next iteration done.  Added support for 4 subbands, number of=20
>> arithmetic operations reduced (but without loop unrolling for better=20
>> code readability), precision improved for both 16-bit and=20
>32-bit fixed=20
>> point, 'neginv' macro is now more portable and faster. The=20
>rest is in the code comments.
>
>I don't mind having patches as attachment, but this makes it=20
>hard to review and comment on them. Especially when it comes=20
>to stuff like coding style (since I have no ideas about the rest).
>
>+       t1[0] =3D t1[1] =3D t1[2] =3D t1[3] =3D
>+               (FIXED_A)1 << (SBC_PROTO_FIXED4_SCALE-1);
>
>Should be like this: (FIXED_A) 1 << (SBC_PROTO_FIXED4_SCALE - 1);
>
>Also do you need the (FIXED_A) cast?
>
>+       for (hop =3D 0; hop < 40; hop +=3D 8) {
>+               t1[0] +=3D  (FIXED_A)in[hop] * _sbc_proto_fixed4[hop];
>
>Same here. There has to be a space after every case.
>
>+               t1[i]  =3D (FIXED_A)1 <<=20
>+ (SBC_COS_TABLE_FIXED4_SCALE-1-SCALE_OUT_BITS);
>
>And between every operation there has to be a space: SCALE - 1 - SCALE.
>In this case I would actually put the - 1 at the end, but that=20
>is pure cosmetics and not a coding style violation.
>
>Please fix all of these. There at least 8 or so.
>
>+#define neginv(x) ((-2 >> 1 =3D=3D -1) ? \
>+                ((((int32_t)(x)) >> 31) ^ (int32_t)(x)) : \
>+                ((x >=3D 0) ? (x) : -(x)-1))
>
>Space after cast. Space before and after operator.
>
>+#ifdef SBC_HIGH_PRECISION
>+# define FIXED_A int64_t /* data type for fixed point=20
>accumulator */ #=20
>+define FIXED_T int32_t /* data type for fixed point constants */ #=20
>+define SBC_FIXED8_EXTRA_BITS 16 #else # define FIXED_A=20
>int32_t /* data=20
>+type for fixed point accumulator */ # define FIXED_T int16_t /* data=20
>+type for fixed point constants */ # define SBC_FIXED8_EXTRA_BITS 0=20
>+#endif
>
>No space between # and define. I know that this is meant to=20
>improve readability, but I don't see it.
>
>Regards
>
>Marcel
>
>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-23  8:20                   ` Jaska.Uimonen
@ 2008-12-23 11:14                     ` Siarhei Siamashka
  0 siblings, 0 replies; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-23 11:14 UTC (permalink / raw)
  To: Uimonen Jaska (Nokia-D/Helsinki)
  Cc: ext Marcel Holtmann, ext Brad Midgley, linux-bluetooth

On Tuesday 23 December 2008 10:20:25 Uimonen Jaska (Nokia-D/Helsinki) wrote:
> Hi Guys,
>
> As Siarhei said, we had a talk last week about
> The optimizations to the code.
>
> I added the modifications to the code, which reduce
> The amount of operations quite a lot.

Yes, thank you very much for doing this part of work, I included these
modifications into the last revision of my patch. And of course, your
initial patch to fix the SBC quality issues was invaluable. I hope that you
will not be forgotten to be credited properly when the filtering functions
are complete and committed.

> The problem 
> With this modification is that it also reduces the 
> Readability of the code.

Yes, that's one of the reasons why I'm trying to post different revisions
here, so all the history of modifications can be tracked if somebody is
interested.

> So because of the redundancy in the cosine transform
> it is possible to reduce the number of variables
> and operations in the preceding filtering.
> This is very hard to explain with comments in the code
> (so you would write something like "here should be t[12], but
> The table goes only to 8 because of redundancy in the following
> Cosine transform). To really make the stuff
> Understandable one should write some small
> Wikipage or extensive comments to the code.
>
> So this is the path the previous code also took, but
> At some point there was a mistake. I really don't still
> Get how the anamatrix stuff was calculated, but as
> You see it takes time to reverse engineer :)
>
> But I suggest me and Siarhei clean the code internally
> And try really hard to follow the conventions. It starts
> To get little bit messy, because we both have many
> Versions of the code.

I think that my last revision of patch is more or less complete and needs
only minor tweaks and cosmetic changes. It's not too obfuscated yet, and
the logic still can be seen (hopefully).

I intentionally decided not to include loops unrolling part as it actually
makes code harder to optimize further (for example add SIMD optimizations).
The idea is to have some kind of "reference" implementation, which is
reasonably compact and simple and also configurable for having high
precision mode (very useful for testing purposes).

Implementations, optimized for various platforms can be derived from it:
1. Platforms where multiplications are expensive and you want to avoid as many
of them as possible (reintroduce 'anamatrix' table, have shifts and additions
instead of multiplications, etc)
2. Platforms where the total number of operations is desired to be kept at
minimum (tables need to have unused and duplicate elements removed in
order to reduce the number of operations for loading constants)
3. Platforms where SIMD optimizations are possible (the code and tables should
have a regular structure, suitable for vectorizing, some table elements may
have to be be reordered)

As one can see, these platform specific optimizations have somewhat
contradictory requirements. I tried to arrange the code in such a way, that
any of such further optimizations can be easily introduced based on it.

-- 
Best regards,
Siarhei Siamashka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-23  1:00                 ` Marcel Holtmann
  2008-12-23  8:20                   ` Jaska.Uimonen
@ 2008-12-23 10:45                   ` Siarhei Siamashka
  2008-12-23 11:48                     ` Marcel Holtmann
  2008-12-29 10:46                     ` [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding Siarhei Siamashka
  1 sibling, 2 replies; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-23 10:45 UTC (permalink / raw)
  To: ext Marcel Holtmann; +Cc: ext Brad Midgley, Jaska Uimonen, linux-bluetooth

On Tuesday 23 December 2008 03:00:15 ext Marcel Holtmann wrote:
> Hi Siarhei,
>
> > > We had a talk with Jaska Uimonen here, and now I'm kind of delegated to
> > > finish the work on this filtering function for SBC encoder (including
> > > the final addition of ARM assembly optimizations).  He provided me with
> > > his last variant of code, which contains some more optimizations to
> > > reduce the number of operations and also loops unrolling. I will add
> > > his changes to the patch on next iteration.
> > >
> > > Now the question is how to best integrate a fixed filtering function to
> > > git repository? If I just continue adding changes to the patch in order
> > > to make it a faster, it will be also not so obvious to see how we got
> > > to these code transformations just from the commit log.
> >
> > Next iteration done.  Added support for 4 subbands, number of arithmetic
> > operations reduced (but without loop unrolling for better code
> > readability), precision improved for both 16-bit and 32-bit fixed point,
> > 'neginv' macro is now more portable and faster. The rest is in the code
> > comments.
>
> I don't mind having patches as attachment, but this makes it hard to
> review and comment on them.

I thought that as long as the attachments are plain text and 'suggest
automatic display' property is set, there should not be much problems
reviewing them.

I'm sorry for my incompetence in this part, but what do you recommend for
posting patches? A link to some guide is sufficient.

Well, it is good to always learn some new things.

> Especially when it comes to stuff like 
> coding style (since I have no ideas about the rest).
>
> +       t1[0] = t1[1] = t1[2] = t1[3] =
> +               (FIXED_A)1 << (SBC_PROTO_FIXED4_SCALE-1);
>
> Should be like this: (FIXED_A) 1 << (SBC_PROTO_FIXED4_SCALE - 1);

Do you have a reference to the coding style standard guide? This particular
requirement for casts and a space seems quite unusual and I have never seen
it before.

> Also do you need the (FIXED_A) cast?

Yes, it's good to have it for better portability. We need to make sure that
this bit is never shifted outside of the range of int type (constants have int
type by default). FIXED_A type can be 64-bit for high precision enabled build.
Even if SBC_FIXED8_EXTRA_BITS constant is currently set so that there is no
overflow, this kind of precaution is better to have in order not to have any
kind of unexpected nasty effect when experimenting with tuning various shift
values.

>
> +       for (hop = 0; hop < 40; hop += 8) {
> +               t1[0] +=  (FIXED_A)in[hop] * _sbc_proto_fixed4[hop];
>
> Same here. There has to be a space after every case.
>
> +               t1[i]  = (FIXED_A)1 <<
> (SBC_COS_TABLE_FIXED4_SCALE-1-SCALE_OUT_BITS);
>
> And between every operation there has to be a space: SCALE - 1 - SCALE.
> In this case I would actually put the - 1 at the end, but that is pure
> cosmetics and not a coding style violation.

This (SBC_COS_TABLE_FIXED4_SCALE-1-SCALE_OUT_BITS) expression can be
interpreted as ((SBC_COS_TABLE_FIXED4_SCALE - 1) - SCALE_OUT_BITS).
Putting -1 at the end is kind of obfuscation.

> Please fix all of these. There at least 8 or so.
>
> +#define neginv(x) ((-2 >> 1 == -1) ? \
> +                ((((int32_t)(x)) >> 31) ^ (int32_t)(x)) : \
> +                ((x >= 0) ? (x) : -(x)-1))
>
> Space after cast. Space before and after operator.
>
> +#ifdef SBC_HIGH_PRECISION
> +# define FIXED_A int64_t /* data type for fixed point accumulator */
> +# define FIXED_T int32_t /* data type for fixed point constants */
> +# define SBC_FIXED8_EXTRA_BITS 16
> +#else
> +# define FIXED_A int32_t /* data type for fixed point accumulator */
> +# define FIXED_T int16_t /* data type for fixed point constants */
> +# define SBC_FIXED8_EXTRA_BITS 0
> +#endif
>
> No space between # and define. I know that this is meant to improve
> readability, but I don't see it.

OK, I'll try to fix these. Getting rid of some spaces was done to try fitting 
some lines into 80 characters (that's also not always achieved yet).

Right now I'm also doublechecking correctness of the code to ensure that there
are no overflows of other problems related to audio quality.

-- 
Best regards,
Siarhei Siamashka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-23 10:45                   ` Siarhei Siamashka
@ 2008-12-23 11:48                     ` Marcel Holtmann
  2008-12-29  9:16                       ` Testing SBC filtering functions Christian Hoene
  2008-12-29 10:46                     ` [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding Siarhei Siamashka
  1 sibling, 1 reply; 35+ messages in thread
From: Marcel Holtmann @ 2008-12-23 11:48 UTC (permalink / raw)
  To: Siarhei Siamashka; +Cc: ext Brad Midgley, Jaska Uimonen, linux-bluetooth

Hi Siarhei,

> > > > We had a talk with Jaska Uimonen here, and now I'm kind of delegated to
> > > > finish the work on this filtering function for SBC encoder (including
> > > > the final addition of ARM assembly optimizations).  He provided me with
> > > > his last variant of code, which contains some more optimizations to
> > > > reduce the number of operations and also loops unrolling. I will add
> > > > his changes to the patch on next iteration.
> > > >
> > > > Now the question is how to best integrate a fixed filtering function to
> > > > git repository? If I just continue adding changes to the patch in order
> > > > to make it a faster, it will be also not so obvious to see how we got
> > > > to these code transformations just from the commit log.
> > >
> > > Next iteration done.  Added support for 4 subbands, number of arithmetic
> > > operations reduced (but without loop unrolling for better code
> > > readability), precision improved for both 16-bit and 32-bit fixed point,
> > > 'neginv' macro is now more portable and faster. The rest is in the code
> > > comments.
> >
> > I don't mind having patches as attachment, but this makes it hard to
> > review and comment on them.
> 
> I thought that as long as the attachments are plain text and 'suggest
> automatic display' property is set, there should not be much problems
> reviewing them.
> 
> I'm sorry for my incompetence in this part, but what do you recommend for
> posting patches? A link to some guide is sufficient.
> 
> Well, it is good to always learn some new things.

the kernel contains good documentation about how to send inline patches.
However as I said, I don't mind that much since I can easily apply them,
but for commenting on patches inline is easier.

> > Especially when it comes to stuff like 
> > coding style (since I have no ideas about the rest).
> >
> > +       t1[0] = t1[1] = t1[2] = t1[3] =
> > +               (FIXED_A)1 << (SBC_PROTO_FIXED4_SCALE-1);
> >
> > Should be like this: (FIXED_A) 1 << (SBC_PROTO_FIXED4_SCALE - 1);
> 
> Do you have a reference to the coding style standard guide? This
> particular
> requirement for casts and a space seems quite unusual and I have never
> seen
> it before.

It is the kernel coding style. Check Greg KH's paper from OLS some years
ago.

> OK, I'll try to fix these. Getting rid of some spaces was done to try fitting 
> some lines into 80 characters (that's also not always achieved yet).

This depends on the code. In normal code you could use continue and
break a lot, but within SBC this might not generate good binaries.

Don't try to omit whitespaces. Just don't.

Regards

Marcel



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Testing SBC filtering functions
  2008-12-23 11:48                     ` Marcel Holtmann
@ 2008-12-29  9:16                       ` Christian Hoene
  2008-12-29 10:00                         ` Marcel Holtmann
  2008-12-29 11:06                         ` Testing SBC filtering functions Siarhei Siamashka
  0 siblings, 2 replies; 35+ messages in thread
From: Christian Hoene @ 2008-12-29  9:16 UTC (permalink / raw)
  To: 'Marcel Holtmann', 'Siarhei Siamashka'
  Cc: 'ext Brad Midgley', 'Jaska Uimonen', linux-bluetooth

Hello all,

Merry Christmas! 

I wrote this script to make some tests on the SBC decoder and encoder using
the recommended testing procedure with the reference bitstreams, the
reference codec and PEAQ. 
http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
I got little bit confused with all the latest patches and whether they are
included or not. Just send me an email on which patch you like to have
tested. Running the tests just takes half an hour; me to answer my emails
maybe a bit longer.

Greetings

 Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Testing SBC filtering functions
  2008-12-29  9:16                       ` Testing SBC filtering functions Christian Hoene
@ 2008-12-29 10:00                         ` Marcel Holtmann
  2008-12-29 10:55                           ` Christian Hoene
  2008-12-29 11:06                         ` Testing SBC filtering functions Siarhei Siamashka
  1 sibling, 1 reply; 35+ messages in thread
From: Marcel Holtmann @ 2008-12-29 10:00 UTC (permalink / raw)
  To: hoene
  Cc: 'Siarhei Siamashka', 'ext Brad Midgley',
	'Jaska Uimonen',
	linux-bluetooth

Hi Christian,

> I wrote this script to make some tests on the SBC decoder and encoder using
> the recommended testing procedure with the reference bitstreams, the
> reference codec and PEAQ. 
> http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
> I got little bit confused with all the latest patches and whether they are
> included or not. Just send me an email on which patch you like to have
> tested. Running the tests just takes half an hour; me to answer my emails
> maybe a bit longer.

I leave the patches to the guys that actually know what they are doing.

So I like the test script, but I would really prefer if we can use
sbctester.c program to verify the results. What do you think?

Regards

Marcel



^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Testing SBC filtering functions
  2008-12-29 10:00                         ` Marcel Holtmann
@ 2008-12-29 10:55                           ` Christian Hoene
  2008-12-29 12:03                             ` Marcel Holtmann
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Hoene @ 2008-12-29 10:55 UTC (permalink / raw)
  To: 'Marcel Holtmann'
  Cc: 'Siarhei Siamashka', 'ext Brad Midgley',
	'Jaska Uimonen',
	linux-bluetooth

Hello Marcel,

> I leave the patches to the guys that actually know what they are doing.
> 
> So I like the test script, but I would really prefer if we can use
> sbctester.c program to verify the results. What do you think?

The sbctester.c works fine for the decoder and is used in the testing
script. The testing of the encoder might also work with test procedure
implemented in sbctester.c. However, the official Bluetooth testing spec
requires that
"9.2.2.1 SBC Encoder ... [E2] The subjective quality (measured by
standardized way or by objective testing
methods, see [6]and [7]) shall be equivalent to that of reference encoder."
with " [6] Rec. ITU-R BS.1116-1, "METHODS FOR THE SUBJECTIVE ASSESSMENT OF
SMALL IMPAIRMENTS IN AUDIO SYSTEMS INCLUDING MULTICHANNEL SOUND SYSTEMS",
1994-1997
[7] Rec. ITU-R BS.1387-1, "METHOD FOR OBJECTIVE MEASUREMENTS OF PERCEIVED
AUDIO QUALITY", 2001".
Until the encoder works fine, we should use BS.1387-1 (PEAQ). If all errors
have been fixed, then sbctester.c might can be used instead of ITU-R
BS.1387-1. However, then, we might still need the reference implementations
or some pretranslated wav and sbc files.

Until the encoder is bug-fixed, it makes sense to work with this
intermediate solution.

With best regards
 Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Testing SBC filtering functions
  2008-12-29 10:55                           ` Christian Hoene
@ 2008-12-29 12:03                             ` Marcel Holtmann
  2008-12-29 12:31                               ` Christian Hoene
  0 siblings, 1 reply; 35+ messages in thread
From: Marcel Holtmann @ 2008-12-29 12:03 UTC (permalink / raw)
  To: hoene
  Cc: 'Siarhei Siamashka', 'ext Brad Midgley',
	'Jaska Uimonen',
	linux-bluetooth

Hi Christian,

> > I leave the patches to the guys that actually know what they are doing.
> > 
> > So I like the test script, but I would really prefer if we can use
> > sbctester.c program to verify the results. What do you think?
> 
> The sbctester.c works fine for the decoder and is used in the testing
> script. The testing of the encoder might also work with test procedure
> implemented in sbctester.c. However, the official Bluetooth testing spec
> requires that
> "9.2.2.1 SBC Encoder ... [E2] The subjective quality (measured by
> standardized way or by objective testing
> methods, see [6]and [7]) shall be equivalent to that of reference encoder."
> with " [6] Rec. ITU-R BS.1116-1, "METHODS FOR THE SUBJECTIVE ASSESSMENT OF
> SMALL IMPAIRMENTS IN AUDIO SYSTEMS INCLUDING MULTICHANNEL SOUND SYSTEMS",
> 1994-1997
> [7] Rec. ITU-R BS.1387-1, "METHOD FOR OBJECTIVE MEASUREMENTS OF PERCEIVED
> AUDIO QUALITY", 2001".
> Until the encoder works fine, we should use BS.1387-1 (PEAQ). If all errors
> have been fixed, then sbctester.c might can be used instead of ITU-R
> BS.1387-1. However, then, we might still need the reference implementations
> or some pretranslated wav and sbc files.

I really prefer if we have an open source solution available to
everybody for the SBC conformance testing. If sbctester.c is not good
enough than we have to write one. I also don't mind using the reference
encoder/decoder and the reference files. Using wine is not optimal, but
I don't really mind it.

Another question that I have is how you use the sbc_enc_test_01.wav and
sbc_enc_test_02.wav files with Windows reference encoder? It turns into
a busy loop for me.

Regards

Marcel



^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Testing SBC filtering functions
  2008-12-29 12:03                             ` Marcel Holtmann
@ 2008-12-29 12:31                               ` Christian Hoene
  2008-12-29 12:41                                 ` Marcel Holtmann
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Hoene @ 2008-12-29 12:31 UTC (permalink / raw)
  To: 'Marcel Holtmann'; +Cc: linux-bluetooth

Hello Marcel,

> I really prefer if we have an open source solution available to
> everybody for the SBC conformance testing. If sbctester.c is not good
> enough than we have to write one. 

Sbctester.c is perfect for the decoder. For the encoder, we might also take
it or write something better to replace PEAQ.
BTW; then I will have spent 6000 EUR on buying one PEAQ license in vain ;-)

> I also don't mind using the reference
> encoder/decoder and the reference files. Using wine is not optimal, but
> I don't really mind it.

Fine. But can it be added to the repository? Or shall a description be given
on how to download it from the Bluetooth SIG web page.
> 
> Another question that I have is how you use the sbc_enc_test_01.wav and
> sbc_enc_test_02.wav files with Windows reference encoder? It turns into
> a busy loop for me.

Which file do are you referring too? I usually compress the original.wav
file (on the top of the second table). Which line options do you use for the
encoder?

Greetings

 Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Testing SBC filtering functions
  2008-12-29 12:31                               ` Christian Hoene
@ 2008-12-29 12:41                                 ` Marcel Holtmann
  2008-12-29 13:11                                   ` Christian Hoene
  0 siblings, 1 reply; 35+ messages in thread
From: Marcel Holtmann @ 2008-12-29 12:41 UTC (permalink / raw)
  To: hoene; +Cc: linux-bluetooth

Hi Christian,

> > I really prefer if we have an open source solution available to
> > everybody for the SBC conformance testing. If sbctester.c is not good
> > enough than we have to write one. 
> 
> Sbctester.c is perfect for the decoder. For the encoder, we might also take
> it or write something better to replace PEAQ.

I am all for that since we need to be able to pass SBC conformance with
pure open source tools. Otherwise it becomes painful for some people.

> BTW; then I will have spent 6000 EUR on buying one PEAQ license in vain ;-)

I know multiple better ways of spending that money ;)

> > I also don't mind using the reference
> > encoder/decoder and the reference files. Using wine is not optimal, but
> > I don't really mind it.
> 
> Fine. But can it be added to the repository? Or shall a description be given
> on how to download it from the Bluetooth SIG web page.

No we can't add it. I would have to write something up on how to
download it and how testing is suppose to work.
 
> > Another question that I have is how you use the sbc_enc_test_01.wav and
> > sbc_enc_test_02.wav files with Windows reference encoder? It turns into
> > a busy loop for me.
> 
> Which file do are you referring too? I usually compress the original.wav
> file (on the top of the second table). Which line options do you use for the
> encoder?

The official files from the Bluetooth SIG and I don't know which options
would make them encode correctly. Just calling the reference encoder
with the filename makes it run into a busy loop. With wine and natively
running on Windows XP.

Regards

Marcel



^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Testing SBC filtering functions
  2008-12-29 12:41                                 ` Marcel Holtmann
@ 2008-12-29 13:11                                   ` Christian Hoene
  2008-12-29 13:17                                     ` Marcel Holtmann
  0 siblings, 1 reply; 35+ messages in thread
From: Christian Hoene @ 2008-12-29 13:11 UTC (permalink / raw)
  To: 'Marcel Holtmann'; +Cc: linux-bluetooth

Hi Marcel,

> The official files from the Bluetooth SIG and I don't know which options
> would make them encode correctly. Just calling the reference encoder
> with the filename makes it run into a busy loop. With wine and natively
> running on Windows XP.

No, the 28 official files from the Bluetooth SIG are already compressed. Use
the decoder to get the WAV files (or click on my web page on the links in
the first table). 

For testing the encoder, any kind of wav file can be considered. Usually, I
run my tests with about 20 test wav files used by the ITU-R. For my web
page, I only considered one stereo sound file (any other will do, too).

Greetings

 Christian

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: Testing SBC filtering functions
  2008-12-29 13:11                                   ` Christian Hoene
@ 2008-12-29 13:17                                     ` Marcel Holtmann
  2009-01-01 14:29                                       ` Testing SBC encoder correctness with sbctester works Christian Hoene
  0 siblings, 1 reply; 35+ messages in thread
From: Marcel Holtmann @ 2008-12-29 13:17 UTC (permalink / raw)
  To: hoene; +Cc: linux-bluetooth

Hi Christian,

> > The official files from the Bluetooth SIG and I don't know which options
> > would make them encode correctly. Just calling the reference encoder
> > with the filename makes it run into a busy loop. With wine and natively
> > running on Windows XP.
> 
> No, the 28 official files from the Bluetooth SIG are already compressed. Use
> the decoder to get the WAV files (or click on my web page on the links in
> the first table). 

there are also *.wav files in one of the test specification zip files.

Regards

Marcel



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Testing SBC encoder correctness with sbctester works...
  2008-12-29 13:17                                     ` Marcel Holtmann
@ 2009-01-01 14:29                                       ` Christian Hoene
  0 siblings, 0 replies; 35+ messages in thread
From: Christian Hoene @ 2009-01-01 14:29 UTC (permalink / raw)
  To: 'Marcel Holtmann'; +Cc: linux-bluetooth

Hallo Marcel,

Good news. We can use the sbctester for conformance testing of the encoder
using the procedure in my testing.sh script.
http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/testing.sh
Only for minor optimizations PEAQ might be a benefit.

Greetings

 Christian


> -----Original Message-----
> From: Marcel Holtmann [mailto:marcel@holtmann.org]
> Sent: Monday, December 29, 2008 2:18 PM
> To: hoene@uni-tuebingen.de
> Cc: linux-bluetooth@vger.kernel.org
> Subject: RE: Testing SBC filtering functions
> 
> Hi Christian,
> 
> > > The official files from the Bluetooth SIG and I don't know which
options
> > > would make them encode correctly. Just calling the reference encoder
> > > with the filename makes it run into a busy loop. With wine and
natively
> > > running on Windows XP.
> >
> > No, the 28 official files from the Bluetooth SIG are already compressed.
Use
> > the decoder to get the WAV files (or click on my web page on the links
in
> > the first table).
> 
> there are also *.wav files in one of the test specification zip files.
> 
> Regards
> 
> Marcel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Testing SBC filtering functions
  2008-12-29  9:16                       ` Testing SBC filtering functions Christian Hoene
  2008-12-29 10:00                         ` Marcel Holtmann
@ 2008-12-29 11:06                         ` Siarhei Siamashka
  2008-12-29 12:04                           ` Marcel Holtmann
  1 sibling, 1 reply; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-29 11:06 UTC (permalink / raw)
  To: hoene
  Cc: 'Marcel Holtmann', 'ext Brad Midgley',
	'Jaska Uimonen',
	linux-bluetooth

On Monday 29 December 2008 11:16:23 ext Christian Hoene wrote:
> Hello all,
> 
> Merry Christmas! 
> 
> I wrote this script to make some tests on the SBC decoder and encoder using
> the recommended testing procedure with the reference bitstreams, the
> reference codec and PEAQ. 
> http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
> I got little bit confused with all the latest patches and whether they are
> included or not. Just send me an email on which patch you like to have
> tested. Running the tests just takes half an hour; me to answer my emails
> maybe a bit longer.

Please try the following patch (apply it to the latest git):
http://marc.info/?l=linux-bluetooth&m=123054787830678&w=2

You can try the patch "as is", and also with SBC_HIGH_PRECISION define
uncommented. High precision mode is naturally more likely to pass the
conformance tests.

I used my own script for testing with 'tiny_psnr' tool for comparing original file
before encoding and the final decoded result (it measures standard deviation
and PSNR). It would be interesting to see how our results correlate.

According to my tests, encoder should now be OK with this patch. Decoder
is definitely in a better shape than the current encoder from git. But the
decoder also seems to have bugs which degrade quality significantly
in some cases.

Best regards,
Siarhei Siamashka

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Testing SBC filtering functions
  2008-12-29 11:06                         ` Testing SBC filtering functions Siarhei Siamashka
@ 2008-12-29 12:04                           ` Marcel Holtmann
  2008-12-29 14:36                             ` Siarhei Siamashka
  0 siblings, 1 reply; 35+ messages in thread
From: Marcel Holtmann @ 2008-12-29 12:04 UTC (permalink / raw)
  To: Siarhei Siamashka
  Cc: hoene, 'ext Brad Midgley', 'Jaska Uimonen',
	linux-bluetooth

Hi Siarhei,

> > I wrote this script to make some tests on the SBC decoder and encoder using
> > the recommended testing procedure with the reference bitstreams, the
> > reference codec and PEAQ. 
> > http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
> > I got little bit confused with all the latest patches and whether they are
> > included or not. Just send me an email on which patch you like to have
> > tested. Running the tests just takes half an hour; me to answer my emails
> > maybe a bit longer.
> 
> Please try the following patch (apply it to the latest git):
> http://marc.info/?l=linux-bluetooth&m=123054787830678&w=2
> 
> You can try the patch "as is", and also with SBC_HIGH_PRECISION define
> uncommented. High precision mode is naturally more likely to pass the
> conformance tests.
> 
> I used my own script for testing with 'tiny_psnr' tool for comparing original file
> before encoding and the final decoded result (it measures standard deviation
> and PSNR). It would be interesting to see how our results correlate.

can you open source tiny_psnr an we merge it into the BlueZ source?

Regards

Marcel



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Testing SBC filtering functions
  2008-12-29 12:04                           ` Marcel Holtmann
@ 2008-12-29 14:36                             ` Siarhei Siamashka
  2008-12-29 15:04                               ` Siarhei Siamashka
  0 siblings, 1 reply; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-29 14:36 UTC (permalink / raw)
  To: ext Marcel Holtmann
  Cc: hoene, 'ext Brad Midgley', 'Jaska Uimonen',
	linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 4053 bytes --]

On Monday 29 December 2008 14:04:29 ext Marcel Holtmann wrote:
> Hi Siarhei,
> 
> > > I wrote this script to make some tests on the SBC decoder and encoder using
> > > the recommended testing procedure with the reference bitstreams, the
> > > reference codec and PEAQ. 
> > > http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
> > > I got little bit confused with all the latest patches and whether they are
> > > included or not. Just send me an email on which patch you like to have
> > > tested. Running the tests just takes half an hour; me to answer my emails
> > > maybe a bit longer.
> > 
> > Please try the following patch (apply it to the latest git):
> > http://marc.info/?l=linux-bluetooth&m=123054787830678&w=2
> > 
> > You can try the patch "as is", and also with SBC_HIGH_PRECISION define
> > uncommented. High precision mode is naturally more likely to pass the
> > conformance tests.
> > 
> > I used my own script for testing with 'tiny_psnr' tool for comparing original file
> > before encoding and the final decoded result (it measures standard deviation
> > and PSNR). It would be interesting to see how our results correlate.
> 
> can you open source tiny_psnr an we merge it into the BlueZ source?

It is already open source. This very simple tool is used in ffmpeg project
regression tests (and is part of ffmpeg distribution). It does not do
anything extraordinary, but just analyzes the difference between several
audio files and gets standard deviation and PSNR statistics. I just did not
feel like reinventing the wheel and used it for estimating quality :)

I only use the following patch on top of it (it can do automatic shift detection):
http://article.gmane.org/gmane.comp.video.ffmpeg.devel/74597
I find this patch quite useful (especially as it turns out for SBC), but ffmpeg
developers do not think so...

A crude ruby script (sorry, I don't speak shell scripting language) is attached.
I have been using it for testing audio quality.

For example, here is the test of a high precision sbcenc build (bitpool=255):

./sbc_encode_test.rb BigBuckBunny-stereo.flac
[2, 48000]
["-j -S -s8 -B16 -b255", "-j -l16 -n8 -r1569000"]
--- comparing original / sbcenc + sbcdec ---
stddev:  108.67 PSNR: 55.60 bytes:114519261/114520000

--- comparing original / sbcenc + sbc_decoder.exe ---
stddev:    1.09 PSNR: 95.56 bytes:114519260/114520000

--- comparing original / sbc_encoder.exe + sbc_decoder.exe ---
stddev:    1.09 PSNR: 95.56 bytes:114519260/114520000

--- comparing sbcenc + sbc_decoder.exe / sbc_encoder.exe + sbc_decoder.exe
stddev:    0.01 PSNR:130.99 bytes:114519552/114519552

Test of a standard sbcenc build (bitpool=255):

./sbc_encode_test.rb BigBuckBunny-stereo.flac
[2, 48000]
["-j -S -s8 -B16 -b255", "-j -l16 -n8 -r1569000"]
--- comparing original / sbcenc + sbcdec ---
stddev:  108.71 PSNR: 55.59 bytes:114519261/114520000

--- comparing original / sbcenc + sbc_decoder.exe ---
stddev:    2.07 PSNR: 89.98 bytes:114519260/114520000

--- comparing original / sbc_encoder.exe + sbc_decoder.exe ---
stddev:    1.09 PSNR: 95.56 bytes:114519260/114520000

--- comparing sbcenc + sbc_decoder.exe / sbc_encoder.exe + sbc_decoder.exe
stddev:    1.77 PSNR: 91.34 bytes:114519552/114519552

Having high bitrate is useful for detecting bugs in analysis filter because
its contribution to quality loss is more visible in this configuration and
other factors play lesser role.

I tested a lot of various files and settings configurations. According to this
script, the quality of high precision sbcenc build matches the quality of
reference encoder pretty well. Standard 16-bit fixed point sbcenc build
introduces a very minor quality loss, noticeable only at extremely high
bitrates.

By the way, 'sbcdec' seems to introduce quite a noticeable distortion and
is orders of magnitude less precise than the encoder.

Of course it would be intertesting to see if the quality estimation done by
using tiny_psnr is adequate and if it can replace a 6000 EUR tool for this
particular purpose.

Best regards,
Siarhei Siamashka

[-- Attachment #2: sbc_encode_test.rb --]
[-- Type: application/x-ruby, Size: 2748 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: Testing SBC filtering functions
  2008-12-29 14:36                             ` Siarhei Siamashka
@ 2008-12-29 15:04                               ` Siarhei Siamashka
  0 siblings, 0 replies; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-29 15:04 UTC (permalink / raw)
  To: ext Marcel Holtmann
  Cc: hoene, 'ext Brad Midgley', 'Jaska Uimonen',
	linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 5248 bytes --]

On Monday 29 December 2008 16:36:15 ext Siarhei Siamashka wrote:
> On Monday 29 December 2008 14:04:29 ext Marcel Holtmann wrote:
> > Hi Siarhei,
> > 
> > > > I wrote this script to make some tests on the SBC decoder and encoder using
> > > > the recommended testing procedure with the reference bitstreams, the
> > > > reference codec and PEAQ. 
> > > > http://net.cs.uni-tuebingen.de/html/nexgenvoip/html/
> > > > I got little bit confused with all the latest patches and whether they are
> > > > included or not. Just send me an email on which patch you like to have
> > > > tested. Running the tests just takes half an hour; me to answer my emails
> > > > maybe a bit longer.
> > > 
> > > Please try the following patch (apply it to the latest git):
> > > http://marc.info/?l=linux-bluetooth&m=123054787830678&w=2
> > > 
> > > You can try the patch "as is", and also with SBC_HIGH_PRECISION define
> > > uncommented. High precision mode is naturally more likely to pass the
> > > conformance tests.
> > > 
> > > I used my own script for testing with 'tiny_psnr' tool for comparing original file
> > > before encoding and the final decoded result (it measures standard deviation
> > > and PSNR). It would be interesting to see how our results correlate.
> > 
> > can you open source tiny_psnr an we merge it into the BlueZ source?
> 
> It is already open source. This very simple tool is used in ffmpeg project
> regression tests (and is part of ffmpeg distribution). It does not do
> anything extraordinary, but just analyzes the difference between several
> audio files and gets standard deviation and PSNR statistics. I just did not
> feel like reinventing the wheel and used it for estimating quality :)
> 
> I only use the following patch on top of it (it can do automatic shift detection):
> http://article.gmane.org/gmane.comp.video.ffmpeg.devel/74597
> I find this patch quite useful (especially as it turns out for SBC), but ffmpeg
> developers do not think so...
> 
> A crude ruby script (sorry, I don't speak shell scripting language) is attached.
> I have been using it for testing audio quality.
> 
> For example, here is the test of a high precision sbcenc build (bitpool=255):
> 
> ./sbc_encode_test.rb BigBuckBunny-stereo.flac
> [2, 48000]
> ["-j -S -s8 -B16 -b255", "-j -l16 -n8 -r1569000"]
> --- comparing original / sbcenc + sbcdec ---
> stddev:  108.67 PSNR: 55.60 bytes:114519261/114520000
> 
> --- comparing original / sbcenc + sbc_decoder.exe ---
> stddev:    1.09 PSNR: 95.56 bytes:114519260/114520000
> 
> --- comparing original / sbc_encoder.exe + sbc_decoder.exe ---
> stddev:    1.09 PSNR: 95.56 bytes:114519260/114520000
> 
> --- comparing sbcenc + sbc_decoder.exe / sbc_encoder.exe + sbc_decoder.exe
> stddev:    0.01 PSNR:130.99 bytes:114519552/114519552
> 
> Test of a standard sbcenc build (bitpool=255):
> 
> ./sbc_encode_test.rb BigBuckBunny-stereo.flac
> [2, 48000]
> ["-j -S -s8 -B16 -b255", "-j -l16 -n8 -r1569000"]
> --- comparing original / sbcenc + sbcdec ---
> stddev:  108.71 PSNR: 55.59 bytes:114519261/114520000
> 
> --- comparing original / sbcenc + sbc_decoder.exe ---
> stddev:    2.07 PSNR: 89.98 bytes:114519260/114520000
> 
> --- comparing original / sbc_encoder.exe + sbc_decoder.exe ---
> stddev:    1.09 PSNR: 95.56 bytes:114519260/114520000
> 
> --- comparing sbcenc + sbc_decoder.exe / sbc_encoder.exe + sbc_decoder.exe
> stddev:    1.77 PSNR: 91.34 bytes:114519552/114519552

[...] 

> By the way, 'sbcdec' seems to introduce quite a noticeable distortion and
> is orders of magnitude less precise than the encoder.

Please disregard this statement. I only tried to add support for sbcdec as the
last minute change (I did not pay any attention to the decoder audio quality
before today) and forgot to change its output to a normal WAV format
before comparing files. Sorry about that. A fixed script is attached and
updated results are below.

Test of a high precision sbcenc build (bitpool=255):

./sbc_encode_test.rb BigBuckBunny-stereo.flac
[2, 48000]
["-j -S -s8 -B16 -b255", "-j -l16 -n8 -r1569000"]
--- comparing original / sbcenc + sbcdec ---
stddev:    4.49 PSNR: 83.26 bytes:114519260/114520000

--- comparing original / sbcenc + sbc_decoder.exe ---
stddev:    1.09 PSNR: 95.56 bytes:114519260/114520000

--- comparing original / sbc_encoder.exe + sbc_decoder.exe ---
stddev:    1.09 PSNR: 95.56 bytes:114519260/114520000

--- comparing sbcenc + sbc_decoder.exe / sbc_encoder.exe + sbc_decoder.exe
stddev:    0.01 PSNR:130.99 bytes:114519552/114519552

Test of a standard sbcenc build (bitpool=255):

./sbc_encode_test.rb BigBuckBunny-stereo.flac
[2, 48000]
["-j -S -s8 -B16 -b255", "-j -l16 -n8 -r1569000"]
--- comparing original / sbcenc + sbcdec ---
stddev:    4.85 PSNR: 82.60 bytes:114519260/114520000

--- comparing original / sbcenc + sbc_decoder.exe ---
stddev:    2.07 PSNR: 89.98 bytes:114519260/114520000

--- comparing original / sbc_encoder.exe + sbc_decoder.exe ---
stddev:    1.09 PSNR: 95.56 bytes:114519260/114520000

--- comparing sbcenc + sbc_decoder.exe / sbc_encoder.exe + sbc_decoder.exe
stddev:    1.77 PSNR: 91.34 bytes:114519552/114519552

Still the sbcdec seems to be the major contributor to quality loss in all cases.


Best regards,
Siarhei Siamashka

[-- Attachment #2: sbc_encode_test.rb --]
[-- Type: application/x-ruby, Size: 2811 bytes --]

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-23 10:45                   ` Siarhei Siamashka
  2008-12-23 11:48                     ` Marcel Holtmann
@ 2008-12-29 10:46                     ` Siarhei Siamashka
  2008-12-29 11:56                       ` Marcel Holtmann
  1 sibling, 1 reply; 35+ messages in thread
From: Siarhei Siamashka @ 2008-12-29 10:46 UTC (permalink / raw)
  To: ext Marcel Holtmann; +Cc: ext Brad Midgley, Jaska Uimonen, linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 1462 bytes --]

On Tuesday 23 December 2008 12:45:14 ext Siarhei Siamashka wrote:

[coding style problems description skipped]
 
> OK, I'll try to fix these. 

> Right now I'm also doublechecking correctness of the code to ensure that
> there are no overflows of other problems related to audio quality.

Here is (hopefully) the final revision of patch. I tested it with all the possible files
that I could find or generate myself and it seems to work fine in all cases.

The code was reverted to use fabs macro. After all, looks like it is the right
way to handle scale factors and adheres to the specification. The weird
effects on sound quality related to the use of fabs and the modified macro
which was tried as a replacement have been apparently the side effects of
other bugs.

Also I have to admit that the change from
http://marc.info/?l=linux-bluetooth&m=122946440507726&w=2
is not needed and the original patch was in fact correct. The output of
quantizer is strictly a positive number (unless there are bugs or overflows).
This was also changed back.

Rounding for the values in the final analysis function output was removed
(we already keep a lot of extra bits in output, so it does not matter for
precision). Also the change of SCALE_OUT_BITS to 15 has a nice feature
that the shifting of output values is not needed at all for 16-bit version
(and naturally rounding is not needed here too), this should be good for
performance.


Best regards,
Siarhei Siamashka

[-- Attachment #2: 0002-New-SBC-analysis-filter-function-to-replace-current.patch --]
[-- Type: text/x-diff, Size: 27301 bytes --]

>From a148f08f98a357c37f2f02efe477948e0723345e Mon Sep 17 00:00:00 2001
From: Siarhei Siamashka <siarhei.siamashka@nokia.com>
Date: Sun, 28 Dec 2008 03:22:59 +0200
Subject: [PATCH] New SBC analysis filter function to replace current broken code

This code is heavily based on the patch submitted by Jaska Uimonen.
Additional changes include preserving extra bits in the output of
filter function for better precision, support for both 16-bit and
32-bit fixed point implementation. Sign of some table values was
changed in order to preserve a regular code structure and have
multiply-accumulate oparations only. No additional optimizations
were applied as this code is intended to be some kind of "reference"
implementation. Platform specific optimizations may require
different tricks and can be branched off from this implementation.
Some extra information about this code can be found in linux-bluetooth
mailing list archive for December 2008.

---
 sbc/sbc.c        |  306 +++++++++++++++++++-----------------------------------
 sbc/sbc_math.h   |   14 +--
 sbc/sbc_tables.h |  247 +++++++++++++++++++++++++++++++++++++------
 3 files changed, 323 insertions(+), 244 deletions(-)

diff --git a/sbc/sbc.c b/sbc/sbc.c
index d3dcd9a..ce52e1e 100644
--- a/sbc/sbc.c
+++ b/sbc/sbc.c
@@ -40,6 +40,7 @@
 #include <string.h>
 #include <stdlib.h>
 #include <sys/types.h>
+#include <limits.h>
 
 #include "sbc_math.h"
 #include "sbc_tables.h"
@@ -93,7 +94,7 @@ struct sbc_decoder_state {
 struct sbc_encoder_state {
 	int subbands;
 	int position[2];
-	int32_t X[2][160];
+	int16_t X[2][160];
 };
 
 /*
@@ -656,75 +657,47 @@ static void sbc_encoder_init(struct sbc_encoder_state *state,
 	state->position[0] = state->position[1] = 9 * frame->subbands;
 }
 
-static inline void _sbc_analyze_four(const int32_t *in, int32_t *out)
+static inline void _sbc_analyze_four(const int16_t *in, int32_t *out)
 {
-	sbc_fixed_t t[8], s[5];
-
-	t[0] = SCALE4_STAGE1( /* Q8 */
-		MULA(_sbc_proto_4[0], in[8] - in[32], /* Q18 */
-		MUL( _sbc_proto_4[1], in[16] - in[24])));
-
-	t[1] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[2], in[1],
-		MULA(_sbc_proto_4[3], in[9],
-		MULA(_sbc_proto_4[4], in[17],
-		MULA(_sbc_proto_4[5], in[25],
-		MUL( _sbc_proto_4[6], in[33]))))));
-
-	t[2] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[7], in[2],
-		MULA(_sbc_proto_4[8], in[10],
-		MULA(_sbc_proto_4[9], in[18],
-		MULA(_sbc_proto_4[10], in[26],
-		MUL( _sbc_proto_4[11], in[34]))))));
-
-	t[3] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[12], in[3],
-		MULA(_sbc_proto_4[13], in[11],
-		MULA(_sbc_proto_4[14], in[19],
-		MULA(_sbc_proto_4[15], in[27],
-		MUL( _sbc_proto_4[16], in[35]))))));
-
-	t[4] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[17], in[4] + in[36],
-		MULA(_sbc_proto_4[18], in[12] + in[28],
-		MUL( _sbc_proto_4[19], in[20]))));
-
-	t[5] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[16], in[5],
-		MULA(_sbc_proto_4[15], in[13],
-		MULA(_sbc_proto_4[14], in[21],
-		MULA(_sbc_proto_4[13], in[29],
-		MUL( _sbc_proto_4[12], in[37]))))));
-
-	/* don't compute t[6]... this term always multiplies
-	 * with cos(pi/2) = 0 */
-
-	t[7] = SCALE4_STAGE1(
-		MULA(_sbc_proto_4[6], in[7],
-		MULA(_sbc_proto_4[5], in[15],
-		MULA(_sbc_proto_4[4], in[23],
-		MULA(_sbc_proto_4[3], in[31],
-		MUL( _sbc_proto_4[2], in[39]))))));
-
-	s[0] = MUL( _anamatrix4[0], t[0] + t[4]);
-	s[1] = MUL( _anamatrix4[2], t[2]);
-	s[2] = MULA(_anamatrix4[1], t[1] + t[3],
-		MUL(_anamatrix4[3], t[5]));
-	s[3] = MULA(_anamatrix4[3], t[1] + t[3],
-		MUL(_anamatrix4[1], -t[5] + t[7]));
-	s[4] = MUL( _anamatrix4[3], t[7]);
-
-	out[0] = SCALE4_STAGE2( s[0] + s[1] + s[2] + s[4]); /* Q0 */
-	out[1] = SCALE4_STAGE2(-s[0] + s[1] + s[3]);
-	out[2] = SCALE4_STAGE2(-s[0] + s[1] - s[3]);
-	out[3] = SCALE4_STAGE2( s[0] + s[1] - s[2] - s[4]);
+	FIXED_A t1[4];
+	FIXED_T t2[4];
+	int i = 0, hop = 0;
+
+	/* rounding coefficient */
+	t1[0] = t1[1] = t1[2] = t1[3] =
+		(FIXED_A) 1 << (SBC_PROTO_FIXED4_SCALE - 1);
+
+	/* low pass polyphase filter */
+	for (hop = 0; hop < 40; hop += 8) {
+		t1[0] += (FIXED_A) in[hop] * _sbc_proto_fixed4[hop];
+		t1[1] += (FIXED_A) in[hop + 1] * _sbc_proto_fixed4[hop + 1];
+		t1[2] += (FIXED_A) in[hop + 2] * _sbc_proto_fixed4[hop + 2];
+		t1[1] += (FIXED_A) in[hop + 3] * _sbc_proto_fixed4[hop + 3];
+		t1[0] += (FIXED_A) in[hop + 4] * _sbc_proto_fixed4[hop + 4];
+		t1[3] += (FIXED_A) in[hop + 5] * _sbc_proto_fixed4[hop + 5];
+		t1[3] += (FIXED_A) in[hop + 7] * _sbc_proto_fixed4[hop + 7];
+	}
+
+	/* scaling */
+	t2[0] = t1[0] >> SBC_PROTO_FIXED4_SCALE;
+	t2[1] = t1[1] >> SBC_PROTO_FIXED4_SCALE;
+	t2[2] = t1[2] >> SBC_PROTO_FIXED4_SCALE;
+	t2[3] = t1[3] >> SBC_PROTO_FIXED4_SCALE;
+
+	/* do the cos transform */
+	for (i = 0, hop = 0; i < 4; hop += 8, i++) {
+		out[i] = ((FIXED_A) t2[0] * cos_table_fixed_4[0 + hop] +
+			  (FIXED_A) t2[1] * cos_table_fixed_4[1 + hop] +
+			  (FIXED_A) t2[2] * cos_table_fixed_4[2 + hop] +
+			  (FIXED_A) t2[3] * cos_table_fixed_4[5 + hop]) >>
+			(SBC_COS_TABLE_FIXED4_SCALE - SCALE_OUT_BITS);
+	}
 }
 
 static inline void sbc_analyze_four(struct sbc_encoder_state *state,
 				struct sbc_frame *frame, int ch, int blk)
 {
-	int32_t *x = &state->X[ch][state->position[ch]];
+	int16_t *x = &state->X[ch][state->position[ch]];
 	int16_t *pcm = &frame->pcm_sample[ch][blk * 4];
 
 	/* Input 4 Audio Samples */
@@ -740,133 +713,64 @@ static inline void sbc_analyze_four(struct sbc_encoder_state *state,
 		state->position[ch] = 36;
 }
 
-static inline void _sbc_analyze_eight(const int32_t *in, int32_t *out)
+static inline void _sbc_analyze_eight(const int16_t *in, int32_t *out)
 {
-	sbc_fixed_t t[8], s[8];
-
-	t[0] = SCALE8_STAGE1( /* Q10 */
-		MULA(_sbc_proto_8[0], (in[16] - in[64]), /* Q18 = Q18 * Q0 */
-		MULA(_sbc_proto_8[1], (in[32] - in[48]),
-		MULA(_sbc_proto_8[2], in[4],
-		MULA(_sbc_proto_8[3], in[20],
-		MULA(_sbc_proto_8[4], in[36],
-		MUL( _sbc_proto_8[5], in[52])))))));
-
-	t[1] = SCALE8_STAGE1(
-		MULA(_sbc_proto_8[6], in[2],
-		MULA(_sbc_proto_8[7], in[18],
-		MULA(_sbc_proto_8[8], in[34],
-		MULA(_sbc_proto_8[9], in[50],
-		MUL(_sbc_proto_8[10], in[66]))))));
-
-	t[2] = SCALE8_STAGE1(
-		MULA(_sbc_proto_8[11], in[1],
-		MULA(_sbc_proto_8[12], in[17],
-		MULA(_sbc_proto_8[13], in[33],
-		MULA(_sbc_proto_8[14], in[49],
-		MULA(_sbc_proto_8[15], in[65],
-		MULA(_sbc_proto_8[16], in[3],
-		MULA(_sbc_proto_8[17], in[19],
-		MULA(_sbc_proto_8[18], in[35],
-		MULA(_sbc_proto_8[19], in[51],
-		MUL( _sbc_proto_8[20], in[67])))))))))));
-
-	t[3] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[21], in[5],
-		MULA( _sbc_proto_8[22], in[21],
-		MULA( _sbc_proto_8[23], in[37],
-		MULA( _sbc_proto_8[24], in[53],
-		MULA( _sbc_proto_8[25], in[69],
-		MULA(-_sbc_proto_8[15], in[15],
-		MULA(-_sbc_proto_8[14], in[31],
-		MULA(-_sbc_proto_8[13], in[47],
-		MULA(-_sbc_proto_8[12], in[63],
-		MUL( -_sbc_proto_8[11], in[79])))))))))));
-
-	t[4] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[26], in[6],
-		MULA( _sbc_proto_8[27], in[22],
-		MULA( _sbc_proto_8[28], in[38],
-		MULA( _sbc_proto_8[29], in[54],
-		MULA( _sbc_proto_8[30], in[70],
-		MULA(-_sbc_proto_8[10], in[14],
-		MULA(-_sbc_proto_8[9], in[30],
-		MULA(-_sbc_proto_8[8], in[46],
-		MULA(-_sbc_proto_8[7], in[62],
-		MUL( -_sbc_proto_8[6], in[78])))))))))));
-
-	t[5] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[31], in[7],
-		MULA( _sbc_proto_8[32], in[23],
-		MULA( _sbc_proto_8[33], in[39],
-		MULA( _sbc_proto_8[34], in[55],
-		MULA( _sbc_proto_8[35], in[71],
-		MULA(-_sbc_proto_8[20], in[13],
-		MULA(-_sbc_proto_8[19], in[29],
-		MULA(-_sbc_proto_8[18], in[45],
-		MULA(-_sbc_proto_8[17], in[61],
-		MUL( -_sbc_proto_8[16], in[77])))))))))));
-
-	t[6] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[36], (in[8] + in[72]),
-		MULA( _sbc_proto_8[37], (in[24] + in[56]),
-		MULA( _sbc_proto_8[38], in[40],
-		MULA(-_sbc_proto_8[39], in[12],
-		MULA(-_sbc_proto_8[5], in[28],
-		MULA(-_sbc_proto_8[4], in[44],
-		MULA(-_sbc_proto_8[3], in[60],
-		MUL( -_sbc_proto_8[2], in[76])))))))));
-
-	t[7] = SCALE8_STAGE1(
-		MULA( _sbc_proto_8[35], in[9],
-		MULA( _sbc_proto_8[34], in[25],
-		MULA( _sbc_proto_8[33], in[41],
-		MULA( _sbc_proto_8[32], in[57],
-		MULA( _sbc_proto_8[31], in[73],
-		MULA(-_sbc_proto_8[25], in[11],
-		MULA(-_sbc_proto_8[24], in[27],
-		MULA(-_sbc_proto_8[23], in[43],
-		MULA(-_sbc_proto_8[22], in[59],
-		MUL( -_sbc_proto_8[21], in[75])))))))))));
-
-	s[0] = MULA(  _anamatrix8[0], t[0],
-		MUL(  _anamatrix8[1], t[6]));
-	s[1] = MUL(   _anamatrix8[7], t[1]);
-	s[2] = MULA(  _anamatrix8[2], t[2],
-		MULA( _anamatrix8[3], t[3],
-		MULA( _anamatrix8[4], t[5],
-		MUL(  _anamatrix8[5], t[7]))));
-	s[3] = MUL(   _anamatrix8[6], t[4]);
-	s[4] = MULA(  _anamatrix8[3], t[2],
-		MULA(-_anamatrix8[5], t[3],
-		MULA(-_anamatrix8[2], t[5],
-		MUL( -_anamatrix8[4], t[7]))));
-	s[5] = MULA(  _anamatrix8[4], t[2],
-		MULA(-_anamatrix8[2], t[3],
-		MULA( _anamatrix8[5], t[5],
-		MUL(  _anamatrix8[3], t[7]))));
-	s[6] = MULA(  _anamatrix8[1], t[0],
-		MUL( -_anamatrix8[0], t[6]));
-	s[7] = MULA(  _anamatrix8[5], t[2],
-		MULA(-_anamatrix8[4], t[3],
-		MULA( _anamatrix8[3], t[5],
-		MUL( -_anamatrix8[2], t[7]))));
-
-	out[0] = SCALE8_STAGE2( s[0] + s[1] + s[2] + s[3]);
-	out[1] = SCALE8_STAGE2( s[1] - s[3] + s[4] + s[6]);
-	out[2] = SCALE8_STAGE2( s[1] - s[3] + s[5] - s[6]);
-	out[3] = SCALE8_STAGE2(-s[0] + s[1] + s[3] + s[7]);
-	out[4] = SCALE8_STAGE2(-s[0] + s[1] + s[3] - s[7]);
-	out[5] = SCALE8_STAGE2( s[1] - s[3] - s[5] - s[6]);
-	out[6] = SCALE8_STAGE2( s[1] - s[3] - s[4] + s[6]);
-	out[7] = SCALE8_STAGE2( s[0] + s[1] - s[2] + s[3]);
+	FIXED_A t1[8];
+	FIXED_T t2[8];
+	int i, hop;
+
+	/* rounding coefficient */
+	t1[0] = t1[1] = t1[2] = t1[3] = t1[4] = t1[5] = t1[6] = t1[7] =
+		(FIXED_A) 1 << (SBC_PROTO_FIXED8_SCALE-1);
+
+	/* low pass polyphase filter */
+	for (hop = 0; hop < 80; hop += 16) {
+		t1[0] += (FIXED_A) in[hop] * _sbc_proto_fixed8[hop];
+		t1[1] += (FIXED_A) in[hop + 1] * _sbc_proto_fixed8[hop + 1];
+		t1[2] += (FIXED_A) in[hop + 2] * _sbc_proto_fixed8[hop + 2];
+		t1[3] += (FIXED_A) in[hop + 3] * _sbc_proto_fixed8[hop + 3];
+		t1[4] += (FIXED_A) in[hop + 4] * _sbc_proto_fixed8[hop + 4];
+		t1[3] += (FIXED_A) in[hop + 5] * _sbc_proto_fixed8[hop + 5];
+		t1[2] += (FIXED_A) in[hop + 6] * _sbc_proto_fixed8[hop + 6];
+		t1[1] += (FIXED_A) in[hop + 7] * _sbc_proto_fixed8[hop + 7];
+		t1[0] += (FIXED_A) in[hop + 8] * _sbc_proto_fixed8[hop + 8];
+		t1[5] += (FIXED_A) in[hop + 9] * _sbc_proto_fixed8[hop + 9];
+		t1[6] += (FIXED_A) in[hop + 10] * _sbc_proto_fixed8[hop + 10];
+		t1[7] += (FIXED_A) in[hop + 11] * _sbc_proto_fixed8[hop + 11];
+		t1[7] += (FIXED_A) in[hop + 13] * _sbc_proto_fixed8[hop + 13];
+		t1[6] += (FIXED_A) in[hop + 14] * _sbc_proto_fixed8[hop + 14];
+		t1[5] += (FIXED_A) in[hop + 15] * _sbc_proto_fixed8[hop + 15];
+	}
+
+	/* scaling */
+	t2[0] = t1[0] >> SBC_PROTO_FIXED8_SCALE;
+	t2[1] = t1[1] >> SBC_PROTO_FIXED8_SCALE;
+	t2[2] = t1[2] >> SBC_PROTO_FIXED8_SCALE;
+	t2[3] = t1[3] >> SBC_PROTO_FIXED8_SCALE;
+	t2[4] = t1[4] >> SBC_PROTO_FIXED8_SCALE;
+	t2[5] = t1[5] >> SBC_PROTO_FIXED8_SCALE;
+	t2[6] = t1[6] >> SBC_PROTO_FIXED8_SCALE;
+	t2[7] = t1[7] >> SBC_PROTO_FIXED8_SCALE;
+
+	/* do the cos transform */
+	for (i = 0, hop = 0; i < 8; hop += 16, i++) {
+		out[i] = ((FIXED_A) t2[0] * cos_table_fixed_8[0 + hop] +
+			  (FIXED_A) t2[1] * cos_table_fixed_8[1 + hop] +
+			  (FIXED_A) t2[2] * cos_table_fixed_8[2 + hop] +
+			  (FIXED_A) t2[3] * cos_table_fixed_8[3 + hop] +
+			  (FIXED_A) t2[4] * cos_table_fixed_8[4 + hop] +
+			  (FIXED_A) t2[5] * cos_table_fixed_8[9 + hop] +
+			  (FIXED_A) t2[6] * cos_table_fixed_8[10 + hop] +
+			  (FIXED_A) t2[7] * cos_table_fixed_8[11 + hop]) >>
+			(SBC_COS_TABLE_FIXED8_SCALE - SCALE_OUT_BITS);
+	}
 }
 
 static inline void sbc_analyze_eight(struct sbc_encoder_state *state,
 					struct sbc_frame *frame, int ch,
 					int blk)
 {
-	int32_t *x = &state->X[ch][state->position[ch]];
+	int16_t *x = &state->X[ch][state->position[ch]];
 	int16_t *pcm = &frame->pcm_sample[ch][blk * 8];
 
 	/* Input 8 Audio Samples */
@@ -1004,7 +908,7 @@ static int sbc_pack_frame(uint8_t *data, struct sbc_frame *frame, size_t len)
 	for (ch = 0; ch < frame->channels; ch++) {
 		for (sb = 0; sb < frame->subbands; sb++) {
 			frame->scale_factor[ch][sb] = 0;
-			scalefactor[ch][sb] = 2;
+			scalefactor[ch][sb] = 2 << SCALE_OUT_BITS;
 			for (blk = 0; blk < frame->blocks; blk++) {
 				while (scalefactor[ch][sb] < fabs(frame->sb_sample_f[blk][ch][sb])) {
 					frame->scale_factor[ch][sb]++;
@@ -1026,18 +930,18 @@ static int sbc_pack_frame(uint8_t *data, struct sbc_frame *frame, size_t len)
 
 		for (sb = 0; sb < frame->subbands - 1; sb++) {
 			scale_factor_j[0] = 0;
-			scalefactor_j[0] = 2;
+			scalefactor_j[0] = 2 << SCALE_OUT_BITS;
 			scale_factor_j[1] = 0;
-			scalefactor_j[1] = 2;
+			scalefactor_j[1] = 2 << SCALE_OUT_BITS;
 
 			for (blk = 0; blk < frame->blocks; blk++) {
 				/* Calculate joint stereo signal */
 				sb_sample_j[blk][0] =
-					(frame->sb_sample_f[blk][0][sb] +
-						frame->sb_sample_f[blk][1][sb]) >> 1;
+					ASR(frame->sb_sample_f[blk][0][sb], 1) +
+					ASR(frame->sb_sample_f[blk][1][sb], 1);
 				sb_sample_j[blk][1] =
-					(frame->sb_sample_f[blk][0][sb] -
-						frame->sb_sample_f[blk][1][sb]) >> 1;
+					ASR(frame->sb_sample_f[blk][0][sb], 1) -
+					ASR(frame->sb_sample_f[blk][1][sb], 1);
 
 				/* calculate scale_factor_j and scalefactor_j for joint case */
 				while (scalefactor_j[0] < fabs(sb_sample_j[blk][0])) {
@@ -1099,13 +1003,19 @@ static int sbc_pack_frame(uint8_t *data, struct sbc_frame *frame, size_t len)
 	for (blk = 0; blk < frame->blocks; blk++) {
 		for (ch = 0; ch < frame->channels; ch++) {
 			for (sb = 0; sb < frame->subbands; sb++) {
-				if (levels[ch][sb] > 0) {
-					audio_sample =
-						(uint16_t) (((((int64_t)frame->sb_sample_f[blk][ch][sb]*levels[ch][sb]) >>
-									(frame->scale_factor[ch][sb] + 1)) +
-								levels[ch][sb]) >> 1);
-					PUT_BITS(audio_sample & levels[ch][sb], bits[ch][sb]);
-				}
+
+				if (bits[ch][sb] == 0)
+					continue;
+
+				audio_sample = ((uint64_t) levels[ch][sb] *
+					(((uint32_t) 1 <<
+					(frame->scale_factor[ch][sb] +
+					SCALE_OUT_BITS + 1)) +
+					frame->sb_sample_f[blk][ch][sb])) >>
+						(frame->scale_factor[ch][sb] +
+						SCALE_OUT_BITS + 2);
+
+				PUT_BITS(audio_sample, bits[ch][sb]);
 			}
 		}
 	}
diff --git a/sbc/sbc_math.h b/sbc/sbc_math.h
index b3d87a6..1503d75 100644
--- a/sbc/sbc_math.h
+++ b/sbc/sbc_math.h
@@ -29,31 +29,21 @@
 #define ASR(val, bits) ((-2 >> 1 == -1) ? \
 		 ((int32_t)(val)) >> (bits) : ((int32_t) (val)) / (1 << (bits)))
 
-#define SCALE_PROTO4_TBL	15
-#define SCALE_ANA4_TBL		17
-#define SCALE_PROTO8_TBL	16
-#define SCALE_ANA8_TBL		17
+#define SCALE_OUT_BITS 15
+
 #define SCALE_SPROTO4_TBL	12
 #define SCALE_SPROTO8_TBL	14
 #define SCALE_NPROTO4_TBL	11
 #define SCALE_NPROTO8_TBL	11
-#define SCALE4_STAGE1_BITS	15
-#define SCALE4_STAGE2_BITS	16
 #define SCALE4_STAGED1_BITS	15
 #define SCALE4_STAGED2_BITS	16
-#define SCALE8_STAGE1_BITS	15
-#define SCALE8_STAGE2_BITS	15
 #define SCALE8_STAGED1_BITS	15
 #define SCALE8_STAGED2_BITS	16
 
 typedef int32_t sbc_fixed_t;
 
-#define SCALE4_STAGE1(src)  ASR(src, SCALE4_STAGE1_BITS)
-#define SCALE4_STAGE2(src)  ASR(src, SCALE4_STAGE2_BITS)
 #define SCALE4_STAGED1(src) ASR(src, SCALE4_STAGED1_BITS)
 #define SCALE4_STAGED2(src) ASR(src, SCALE4_STAGED2_BITS)
-#define SCALE8_STAGE1(src)  ASR(src, SCALE8_STAGE1_BITS)
-#define SCALE8_STAGE2(src)  ASR(src, SCALE8_STAGE2_BITS)
 #define SCALE8_STAGED1(src) ASR(src, SCALE8_STAGED1_BITS)
 #define SCALE8_STAGED2(src) ASR(src, SCALE8_STAGED2_BITS)
 
diff --git a/sbc/sbc_tables.h b/sbc/sbc_tables.h
index f5daaa7..8df8c1f 100644
--- a/sbc/sbc_tables.h
+++ b/sbc/sbc_tables.h
@@ -40,40 +40,11 @@ static const int sbc_offset8[4][8] = {
 };
 
 
-#define SP4(val) (((int32_t)(val))/17658) /* Used to be #define SP4(val) ASR(val, SCALE_PROTO4_TBL) but causes wrong gain */
-#define SA4(val) ASR(val, SCALE_ANA4_TBL)
-#define SP8(val) (((int32_t)(val))/57740) /* Used to be #define SP8(val) ASR(val, SCALE_PROTO8_TBL) but causes wrong gain */
-#define SA8(val) ASR(val, SCALE_ANA8_TBL)
 #define SS4(val) ASR(val, SCALE_SPROTO4_TBL)
 #define SS8(val) ASR(val, SCALE_SPROTO8_TBL)
 #define SN4(val) ASR(val, SCALE_NPROTO4_TBL)
 #define SN8(val) ASR(val, SCALE_NPROTO8_TBL)
 
-static const int32_t _sbc_proto_4[20] = {
-	SP4(0x02cb3e8c), SP4(0x22b63dc0), SP4(0x002329cc), SP4(0x053b7548),
-	SP4(0x31eab940), SP4(0xec1f5e60), SP4(0xff3773a8), SP4(0x0061c5a7),
-	SP4(0x07646680), SP4(0x3f239480), SP4(0xf89f23a8), SP4(0x007a4737),
-	SP4(0x00b32807), SP4(0x083ddc80), SP4(0x4825e480), SP4(0x0191e578),
-	SP4(0x00ff11ca), SP4(0x00fb7991), SP4(0x069fdc58), SP4(0x4b584000)
-};
-
-static const int32_t _anamatrix4[4] = {
-	SA4(0x2d413cc0), SA4(0x3b20d780), SA4(0x40000000), SA4(0x187de2a0)
-};
-
-static const int32_t _sbc_proto_8[40] = {
-	SP8(0x02e5cd20), SP8(0x22d0c200), SP8(0x006bfe27), SP8(0x07808930),
-	SP8(0x3f1c8800), SP8(0xf8810d70), SP8(0x002cfdc6), SP8(0x055acf28),
-	SP8(0x31f566c0), SP8(0xebfe57e0), SP8(0xff27c437), SP8(0x001485cc),
-	SP8(0x041c6e58), SP8(0x2a7cfa80), SP8(0xe4c4a240), SP8(0xfe359e4c),
-	SP8(0x0048b1f8), SP8(0x0686ce30), SP8(0x38eec5c0), SP8(0xf2a1b9f0),
-	SP8(0xffe8904a), SP8(0x0095698a), SP8(0x0824a480), SP8(0x443b3c00),
-	SP8(0xfd7badc8), SP8(0x00d3e2d9), SP8(0x00c183d2), SP8(0x084e1950),
-	SP8(0x4810d800), SP8(0x017f43fe), SP8(0x01056dd8), SP8(0x00e9cb9f),
-	SP8(0x07d7d090), SP8(0x4a708980), SP8(0x0488fae8), SP8(0x0113bd20),
-	SP8(0x0107b1a8), SP8(0x069fb3c0), SP8(0x4b3db200), SP8(0x00763f48)
-};
-
 static const int32_t sbc_proto_4_40m0[] = {
 	SS4(0x00000000), SS4(0xffa6982f), SS4(0xfba93848), SS4(0x0456c7b8),
 	SS4(0x005967d1), SS4(0xfffb9ac7), SS4(0xff589157), SS4(0xf9c2a8d8),
@@ -116,11 +87,6 @@ static const int32_t sbc_proto_8_80m1[] = {
 	SS8(0x0d9daee0), SS8(0xeac182c0), SS8(0xfdf1c8d4), SS8(0xfff5bd1a)
 };
 
-static const int32_t _anamatrix8[8] = {
-	SA8(0x3b20d780), SA8(0x187de2a0), SA8(0x3ec52f80), SA8(0x3536cc40),
-	SA8(0x238e7680), SA8(0x0c7c5c20), SA8(0x2d413cc0), SA8(0x40000000)
-};
-
 static const int32_t synmatrix4[8][4] = {
 	{ SN4(0x05a82798), SN4(0xfa57d868), SN4(0xfa57d868), SN4(0x05a82798) },
 	{ SN4(0x030fbc54), SN4(0xf89be510), SN4(0x07641af0), SN4(0xfcf043ac) },
@@ -166,3 +132,216 @@ static const int32_t synmatrix8[16][8] = {
 	{ SN8(0xf9592678), SN8(0x018f8b84), SN8(0x07d8a5f0), SN8(0x0471ced0),
 	  SN8(0xfb8e3130), SN8(0xf8275a10), SN8(0xfe70747c), SN8(0x06a6d988) }
 };
+
+/* Uncomment the following line to enable high precision build of SBC encoder */
+
+/* #define SBC_HIGH_PRECISION */
+
+#ifdef SBC_HIGH_PRECISION
+#define FIXED_A int64_t /* data type for fixed point accumulator */
+#define FIXED_T int32_t /* data type for fixed point constants */
+#define SBC_FIXED_EXTRA_BITS 16
+#else
+#define FIXED_A int32_t /* data type for fixed point accumulator */
+#define FIXED_T int16_t /* data type for fixed point constants */
+#define SBC_FIXED_EXTRA_BITS 0
+#endif
+
+/* A2DP specification: Section 12.8 Tables
+ *
+ * Original values are premultiplied by 2 for better precision (that is the
+ * maximum which is possible without overflows)
+ *
+ * Note: in each block of 8 numbers sign was changed for elements 2 and 7
+ * in order to compensate the same change applied to cos_table_fixed_4
+ */
+#define SBC_PROTO_FIXED4_SCALE \
+	((sizeof(FIXED_T) * CHAR_BIT - 1) - SBC_FIXED_EXTRA_BITS + 1)
+#define F(x) (FIXED_A) ((x * 2) * \
+	((FIXED_A) 1 << (sizeof(FIXED_T) * CHAR_BIT - 1)) + 0.5)
+static const FIXED_T _sbc_proto_fixed4[40] = {
+	 F(0.00000000E+00),  F(5.36548976E-04),
+	-F(1.49188357E-03),  F(2.73370904E-03),
+	 F(3.83720193E-03),  F(3.89205149E-03),
+	 F(1.86581691E-03),  F(3.06012286E-03),
+
+	 F(1.09137620E-02),  F(2.04385087E-02),
+	-F(2.88757392E-02),  F(3.21939290E-02),
+	 F(2.58767811E-02),  F(6.13245186E-03),
+	-F(2.88217274E-02),  F(7.76463494E-02),
+
+	 F(1.35593274E-01),  F(1.94987841E-01),
+	-F(2.46636662E-01),  F(2.81828203E-01),
+	 F(2.94315332E-01),  F(2.81828203E-01),
+	 F(2.46636662E-01), -F(1.94987841E-01),
+
+	-F(1.35593274E-01), -F(7.76463494E-02),
+	 F(2.88217274E-02),  F(6.13245186E-03),
+	 F(2.58767811E-02),  F(3.21939290E-02),
+	 F(2.88757392E-02), -F(2.04385087E-02),
+
+	-F(1.09137620E-02), -F(3.06012286E-03),
+	-F(1.86581691E-03),  F(3.89205149E-03),
+	 F(3.83720193E-03),  F(2.73370904E-03),
+	 F(1.49188357E-03), -F(5.36548976E-04),
+};
+#undef F
+
+/*
+ * To produce this cosine matrix in Octave:
+ *
+ * b = zeros(4, 8);
+ * for i = 0:3
+ * for j = 0:7 b(i+1, j+1) = cos((i + 0.5) * (j - 2) * (pi/4))
+ * endfor
+ * endfor;
+ * printf("%.10f, ", b');
+ *
+ * Note: in each block of 8 numbers sign was changed for elements 2 and 7
+ *
+ * Change of sign for element 2 allows to replace constant 1.0 (not
+ * representable in Q15 format) with -1.0 (fine with Q15).
+ * Changed sign for element 7 allows to have more similar constants
+ * and simplify subband filter function code.
+ */
+#define SBC_COS_TABLE_FIXED4_SCALE \
+	((sizeof(FIXED_T) * CHAR_BIT - 1) + SBC_FIXED_EXTRA_BITS)
+#define F(x) (FIXED_A) ((x) * \
+	((FIXED_A) 1 << (sizeof(FIXED_T) * CHAR_BIT - 1)) + 0.5)
+static const FIXED_T cos_table_fixed_4[32] = {
+	 F(0.7071067812),  F(0.9238795325), -F(1.0000000000),  F(0.9238795325),
+	 F(0.7071067812),  F(0.3826834324),  F(0.0000000000),  F(0.3826834324),
+
+	-F(0.7071067812),  F(0.3826834324), -F(1.0000000000),  F(0.3826834324),
+	-F(0.7071067812), -F(0.9238795325), -F(0.0000000000), -F(0.9238795325),
+
+	-F(0.7071067812), -F(0.3826834324), -F(1.0000000000), -F(0.3826834324),
+	-F(0.7071067812),  F(0.9238795325),  F(0.0000000000),  F(0.9238795325),
+
+	 F(0.7071067812), -F(0.9238795325), -F(1.0000000000), -F(0.9238795325),
+	 F(0.7071067812), -F(0.3826834324), -F(0.0000000000), -F(0.3826834324),
+};
+#undef F
+
+/* A2DP specification: Section 12.8 Tables
+ *
+ * Original values are premultiplied by 4 for better precision (that is the
+ * maximum which is possible without overflows)
+ *
+ * Note: in each block of 16 numbers sign was changed for elements 4, 13, 14, 15
+ * in order to compensate the same change applied to cos_table_fixed_8
+ */
+#define SBC_PROTO_FIXED8_SCALE \
+	((sizeof(FIXED_T) * CHAR_BIT - 1) - SBC_FIXED_EXTRA_BITS + 2)
+#define F(x) (FIXED_A) ((x * 4) * \
+	((FIXED_A) 1 << (sizeof(FIXED_T) * CHAR_BIT - 1)) + 0.5)
+static const FIXED_T _sbc_proto_fixed8[80] = {
+	 F(0.00000000E+00),  F(1.56575398E-04),
+	 F(3.43256425E-04),  F(5.54620202E-04),
+	-F(8.23919506E-04),  F(1.13992507E-03),
+	 F(1.47640169E-03),  F(1.78371725E-03),
+	 F(2.01182542E-03),  F(2.10371989E-03),
+	 F(1.99454554E-03),  F(1.61656283E-03),
+	 F(9.02154502E-04),  F(1.78805361E-04),
+	 F(1.64973098E-03),  F(3.49717454E-03),
+
+	 F(5.65949473E-03),  F(8.02941163E-03),
+	 F(1.04584443E-02),  F(1.27472335E-02),
+	-F(1.46525263E-02),  F(1.59045603E-02),
+	 F(1.62208471E-02),  F(1.53184106E-02),
+	 F(1.29371806E-02),  F(8.85757540E-03),
+	 F(2.92408442E-03), -F(4.91578024E-03),
+	-F(1.46404076E-02),  F(2.61098752E-02),
+	 F(3.90751381E-02),  F(5.31873032E-02),
+
+	 F(6.79989431E-02),  F(8.29847578E-02),
+	 F(9.75753918E-02),  F(1.11196689E-01),
+	-F(1.23264548E-01),  F(1.33264415E-01),
+	 F(1.40753505E-01),  F(1.45389847E-01),
+	 F(1.46955068E-01),  F(1.45389847E-01),
+	 F(1.40753505E-01),  F(1.33264415E-01),
+	 F(1.23264548E-01), -F(1.11196689E-01),
+	-F(9.75753918E-02), -F(8.29847578E-02),
+
+	-F(6.79989431E-02), -F(5.31873032E-02),
+	-F(3.90751381E-02), -F(2.61098752E-02),
+	 F(1.46404076E-02), -F(4.91578024E-03),
+	 F(2.92408442E-03),  F(8.85757540E-03),
+	 F(1.29371806E-02),  F(1.53184106E-02),
+	 F(1.62208471E-02),  F(1.59045603E-02),
+	 F(1.46525263E-02), -F(1.27472335E-02),
+	-F(1.04584443E-02), -F(8.02941163E-03),
+
+	-F(5.65949473E-03), -F(3.49717454E-03),
+	-F(1.64973098E-03), -F(1.78805361E-04),
+	-F(9.02154502E-04),  F(1.61656283E-03),
+	 F(1.99454554E-03),  F(2.10371989E-03),
+	 F(2.01182542E-03),  F(1.78371725E-03),
+	 F(1.47640169E-03),  F(1.13992507E-03),
+	 F(8.23919506E-04), -F(5.54620202E-04),
+	-F(3.43256425E-04), -F(1.56575398E-04),
+};
+#undef F
+
+/*
+ * To produce this cosine matrix in Octave:
+ *
+ * b = zeros(8, 16);
+ * for i = 0:7
+ * for j = 0:15 b(i+1, j+1) = cos((i + 0.5) * (j - 4) * (pi/8))
+ * endfor endfor;
+ * printf("%.10f, ", b');
+ *
+ * Note: in each block of 16 numbers sign was changed for elements 4, 13, 14, 15
+ *
+ * Change of sign for element 4 allows to replace constant 1.0 (not
+ * representable in Q15 format) with -1.0 (fine with Q15).
+ * Changed signs for elements 13, 14, 15 allow to have more similar constants
+ * and simplify subband filter function code.
+ */
+#define SBC_COS_TABLE_FIXED8_SCALE \
+	((sizeof(FIXED_T) * CHAR_BIT - 1) + SBC_FIXED_EXTRA_BITS)
+#define F(x) (FIXED_A) ((x) * \
+	((FIXED_A) 1 << (sizeof(FIXED_T) * CHAR_BIT - 1)) + 0.5)
+static const FIXED_T cos_table_fixed_8[128] = {
+	 F(0.7071067812),  F(0.8314696123),  F(0.9238795325),  F(0.9807852804),
+	-F(1.0000000000),  F(0.9807852804),  F(0.9238795325),  F(0.8314696123),
+	 F(0.7071067812),  F(0.5555702330),  F(0.3826834324),  F(0.1950903220),
+	 F(0.0000000000),  F(0.1950903220),  F(0.3826834324),  F(0.5555702330),
+
+	-F(0.7071067812), -F(0.1950903220),  F(0.3826834324),  F(0.8314696123),
+	-F(1.0000000000),  F(0.8314696123),  F(0.3826834324), -F(0.1950903220),
+	-F(0.7071067812), -F(0.9807852804), -F(0.9238795325), -F(0.5555702330),
+	-F(0.0000000000), -F(0.5555702330), -F(0.9238795325), -F(0.9807852804),
+
+	-F(0.7071067812), -F(0.9807852804), -F(0.3826834324),  F(0.5555702330),
+	-F(1.0000000000),  F(0.5555702330), -F(0.3826834324), -F(0.9807852804),
+	-F(0.7071067812),  F(0.1950903220),  F(0.9238795325),  F(0.8314696123),
+	 F(0.0000000000),  F(0.8314696123),  F(0.9238795325),  F(0.1950903220),
+
+	 F(0.7071067812), -F(0.5555702330), -F(0.9238795325),  F(0.1950903220),
+	-F(1.0000000000),  F(0.1950903220), -F(0.9238795325), -F(0.5555702330),
+	 F(0.7071067812),  F(0.8314696123), -F(0.3826834324), -F(0.9807852804),
+	-F(0.0000000000), -F(0.9807852804), -F(0.3826834324),  F(0.8314696123),
+
+	 F(0.7071067812),  F(0.5555702330), -F(0.9238795325), -F(0.1950903220),
+	-F(1.0000000000), -F(0.1950903220), -F(0.9238795325),  F(0.5555702330),
+	 F(0.7071067812), -F(0.8314696123), -F(0.3826834324),  F(0.9807852804),
+	 F(0.0000000000),  F(0.9807852804), -F(0.3826834324), -F(0.8314696123),
+
+	-F(0.7071067812),  F(0.9807852804), -F(0.3826834324), -F(0.5555702330),
+	-F(1.0000000000), -F(0.5555702330), -F(0.3826834324),  F(0.9807852804),
+	-F(0.7071067812), -F(0.1950903220),  F(0.9238795325), -F(0.8314696123),
+	-F(0.0000000000), -F(0.8314696123),  F(0.9238795325), -F(0.1950903220),
+
+	-F(0.7071067812),  F(0.1950903220),  F(0.3826834324), -F(0.8314696123),
+	-F(1.0000000000), -F(0.8314696123),  F(0.3826834324),  F(0.1950903220),
+	-F(0.7071067812),  F(0.9807852804), -F(0.9238795325),  F(0.5555702330),
+	-F(0.0000000000),  F(0.5555702330), -F(0.9238795325),  F(0.9807852804),
+
+	 F(0.7071067812), -F(0.8314696123),  F(0.9238795325), -F(0.9807852804),
+	-F(1.0000000000), -F(0.9807852804),  F(0.9238795325), -F(0.8314696123),
+	 F(0.7071067812), -F(0.5555702330),  F(0.3826834324), -F(0.1950903220),
+	-F(0.0000000000), -F(0.1950903220),  F(0.3826834324), -F(0.5555702330),
+};
+#undef F
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding
  2008-12-29 10:46                     ` [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding Siarhei Siamashka
@ 2008-12-29 11:56                       ` Marcel Holtmann
  0 siblings, 0 replies; 35+ messages in thread
From: Marcel Holtmann @ 2008-12-29 11:56 UTC (permalink / raw)
  To: Siarhei Siamashka; +Cc: ext Brad Midgley, Jaska Uimonen, linux-bluetooth

Hi Siarhei,

> > Right now I'm also doublechecking correctness of the code to ensure that
> > there are no overflows of other problems related to audio quality.
> 
> Here is (hopefully) the final revision of patch. I tested it with all the possible files
> that I could find or generate myself and it seems to work fine in all cases.
> 
> The code was reverted to use fabs macro. After all, looks like it is the right
> way to handle scale factors and adheres to the specification. The weird
> effects on sound quality related to the use of fabs and the modified macro
> which was tried as a replacement have been apparently the side effects of
> other bugs.
> 
> Also I have to admit that the change from
> http://marc.info/?l=linux-bluetooth&m=122946440507726&w=2
> is not needed and the original patch was in fact correct. The output of
> quantizer is strictly a positive number (unless there are bugs or overflows).
> This was also changed back.
> 
> Rounding for the values in the final analysis function output was removed
> (we already keep a lot of extra bits in output, so it does not matter for
> precision). Also the change of SCALE_OUT_BITS to 15 has a nice feature
> that the shifting of output values is not needed at all for 16-bit version
> (and naturally rounding is not needed here too), this should be good for
> performance.

I applied your patch to the upstream tree. Thanks for your work. I know
like to get feedback from other people if this helps us with the SBC
conformance testing and the audio quality.

Regards

Marcel



^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2009-01-01 14:29 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-11-28 13:35 [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding Jaska Uimonen
2008-11-28 14:18 ` Marcel Holtmann
2008-11-28 14:24   ` Jelle de Jong
2008-11-28 15:20     ` Jaska Uimonen
2008-11-28 18:13       ` David Sainty
2008-11-28 15:14   ` Jaska Uimonen
2008-12-02 20:15 ` Jim Carter
2008-12-12 17:14   ` Siarhei Siamashka
2008-12-12 19:19     ` Brad Midgley
2008-12-15 12:54       ` Siarhei Siamashka
2008-12-15 15:16         ` Brad Midgley
2008-12-16 22:37           ` Siarhei Siamashka
2008-12-17  8:16             ` Jaska Uimonen
2008-12-19 22:12             ` Siarhei Siamashka
2008-12-22 23:30               ` Siarhei Siamashka
2008-12-23  1:00                 ` Marcel Holtmann
2008-12-23  8:20                   ` Jaska.Uimonen
2008-12-23 11:14                     ` Siarhei Siamashka
2008-12-23 10:45                   ` Siarhei Siamashka
2008-12-23 11:48                     ` Marcel Holtmann
2008-12-29  9:16                       ` Testing SBC filtering functions Christian Hoene
2008-12-29 10:00                         ` Marcel Holtmann
2008-12-29 10:55                           ` Christian Hoene
2008-12-29 12:03                             ` Marcel Holtmann
2008-12-29 12:31                               ` Christian Hoene
2008-12-29 12:41                                 ` Marcel Holtmann
2008-12-29 13:11                                   ` Christian Hoene
2008-12-29 13:17                                     ` Marcel Holtmann
2009-01-01 14:29                                       ` Testing SBC encoder correctness with sbctester works Christian Hoene
2008-12-29 11:06                         ` Testing SBC filtering functions Siarhei Siamashka
2008-12-29 12:04                           ` Marcel Holtmann
2008-12-29 14:36                             ` Siarhei Siamashka
2008-12-29 15:04                               ` Siarhei Siamashka
2008-12-29 10:46                     ` [RFC/PATCH] sbc: new filtering function for 8 band fixed point encoding Siarhei Siamashka
2008-12-29 11:56                       ` Marcel Holtmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.