On 25.09.20 19:13, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2020 13:24, Max Reitz wrote:
>> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>>> Performance improvements / degradations are usually discussed in
>>> percentage. Let's make the script calculate it for us.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>>> ---
>>>   scripts/simplebench/simplebench.py | 46 +++++++++++++++++++++++++++---
>>>   1 file changed, 42 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/scripts/simplebench/simplebench.py
>>> b/scripts/simplebench/simplebench.py
>>> index 56d3a91ea2..0ff05a38b8 100644
>>> --- a/scripts/simplebench/simplebench.py
>>> +++ b/scripts/simplebench/simplebench.py
>>
>> [...]
>>
>>> +            for j in range(0, i):
>>> +                env_j = results['envs'][j]
>>> +                res_j = case_results[env_j['id']]
>>> +
>>> +                if 'average' not in res_j:
>>> +                    # Failed result
>>> +                    cell += ' --'
>>> +                    continue
>>> +
>>> +                col_j = chr(ord('A') + j)
>>> +                avg_j = res_j['average']
>>> +                delta = (res['average'] - avg_j) / avg_j * 100
>>
>> I was wondering why you’d subtract, when percentage differences usually
>> mean a quotient.  Then I realized that this would usually be written as:
>>
>> (res['average'] / avg_j - 1) * 100
>>
>>> +                delta_delta = (res['delta'] + res_j['delta']) /
>>> avg_j * 100
>>
>> Why not use the new format_percent for both cases?
>
> Because I want less precision here.
>
>>
>>> +                cell += f'
>>> {col_j}{round(delta):+}±{round(delta_delta)}%'
>>
>> I don’t know what I should think about ±delta_delta.  If I saw “Compared
>> to run A, this is +42.1%±2.0%”, I would think that you calculated the
>> difference between each run result, and then based on that array
>> calculated average and standard deviation.
>>
>> Furthermore, I don’t even know what the delta_delta is supposed to tell
>> you.  It isn’t even a delta_delta, it’s an average_delta.
>
> Not average, but a sum of errors. And it shows the error for the delta.
>
>>
>> The delta_delta would be (res['delta'] / res_j['delta'] - 1) * 100.0.
>
> And this shows nothing.
>
> Assume we have A = 10±2 and B = 15±2.
>
> The difference is (15-10)±(2+2) = 5±4.
> And your formula will give (2/2 - 1) * 100 = 0, which is wrong.

Well, it’s the difference in delta (whatever “delta” means here). I
wouldn’t call it wrong. We want to compare two test runs, so if both
have the same delta, then the difference in delta is 0. That’s how I
understood it, hence my “Δ±” notation below.

(This may be useful information, because perhaps one may consider a big
delta bad, and so if one run has less delta than another one, that may
be considered a better outcome. Comparing deltas has a purpose.)

I see I understood your intentions wrong, though; you want to just give
an error estimate for the difference of the means of both runs. I have
to admit I don’t know how that works exactly, and it will probably
heavily depend on what “delta” is.

(Googling suggests that for the standard deviation, one would square
each SD to get the variance back, then divide by the respective sample
size, add, and take the square root. That is for combining two
distributions, though, whereas we want to compare them here...
http://homework.uoregon.edu/pub/class/es202/ztest.html seems to suggest
the same for such a comparison, though. I don’t know.)
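For illustration, roughly what I have in mind (just a sketch; the
function and its name are made up, and it assumes “delta” were a sample
standard deviation and that we knew the number of iterations per run,
which may not match what simplebench actually stores):

import math

def percent_diff_with_error(avg_a, sd_a, n_a, avg_b, sd_b, n_b):
    """Relative difference of run B vs. run A, in percent, with an
    error estimate based on the standard error of the difference of
    the two means: sqrt(sd_a**2 / n_a + sd_b**2 / n_b)."""
    diff_pct = (avg_b - avg_a) / avg_a * 100
    err_pct = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b) / avg_a * 100
    return diff_pct, err_pct

# Your example, A = 10±2 and B = 15±2: adding the deltas as maximum
# deviations gives 5±4; treating ±2 as an SD over (hypothetically)
# five iterations per run gives a tighter error bar:
print(percent_diff_with_error(10, 2, 5, 15, 2, 5))
# -> (50.0, 12.649110640673518)

Whether that is the right model of course depends entirely on what
“delta” really is.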
(As for your current version, after more thinking it does seem right
when delta is the maximum deviation. Or perhaps the deltas shouldn’t be
added then, but the maximum should be used? I’m just not sure.)

((Perhaps it doesn’t even matter. “Don’t believe any statistics you
haven’t forged yourself”, and so on.))

Max