On 25.09.20 19:13, Vladimir Sementsov-Ogievskiy wrote:
> 25.09.2020 13:24, Max Reitz wrote:
>> On 18.09.20 20:19, Vladimir Sementsov-Ogievskiy wrote:
>>> Performance improvements / degradations are usually discussed in
>>> percentage. Let's make the script calculate it for us.
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>>> ---
>>>   scripts/simplebench/simplebench.py | 46 +++++++++++++++++++++++++++---
>>>   1 file changed, 42 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/scripts/simplebench/simplebench.py
>>> b/scripts/simplebench/simplebench.py
>>> index 56d3a91ea2..0ff05a38b8 100644
>>> --- a/scripts/simplebench/simplebench.py
>>> +++ b/scripts/simplebench/simplebench.py
>>
>> [...]
>>
>>> +            for j in range(0, i):
>>> +                env_j = results['envs'][j]
>>> +                res_j = case_results[env_j['id']]
>>> +
>>> +                if 'average' not in res_j:
>>> +                    # Failed result
>>> +                    cell += ' --'
>>> +                    continue
>>> +
>>> +                col_j = chr(ord('A') + j)
>>> +                avg_j = res_j['average']
>>> +                delta = (res['average'] - avg_j) / avg_j * 100
>>
>> I was wondering why you’d subtract, when percentage differences usually
>> mean a quotient.  Then I realized that this would usually be written as:
>>
>> (res['average'] / avg_j - 1) * 100
>>
>>> +                delta_delta = (res['delta'] + res_j['delta']) /
>>> avg_j * 100
>>
>> Why not use the new format_percent for both cases?
>
> Because I want less precision here.
>
>>
>>> +                cell += f'
>>> {col_j}{round(delta):+}±{round(delta_delta)}%'
>>
>> I don’t know what I should think about ±delta_delta.  If I saw “Compared
>> to run A, this is +42.1%±2.0%”, I would think that you calculated the
>> difference between each run result, and then based on that array
>> calculated average and standard deviation.
>>
>> Furthermore, I don’t even know what the delta_delta is supposed to tell
>> you.  It isn’t even a delta_delta, it’s an average_delta.
>
> Not average, but a sum of errors. And it shows the error for the delta.
>
>>
>> The delta_delta would be (res['delta'] / res_j['delta'] - 1) * 100.0.
>
> And this shows nothing.
>
> Assume we have A = 10±2 and B = 15±2.
>
> The difference is (15-10)±(2+2) = 5±4.
> And your formula will give (2/2 - 1) * 100 = 0, which is wrong.

Well, it’s the difference in delta (whatever “delta” means here). I
wouldn’t call it wrong. We want to compare two test runs, so if both
have the same delta, then the difference in delta is 0. That’s how I
understood it, hence my “Δ±” notation below.

(This may be useful information, because perhaps one may consider a big
delta bad, and so if one run has less delta than another one, that may
be considered a better outcome. Comparing deltas has a purpose.)

I see I understood your intentions wrong, though; you want to just give
an error estimate for the difference of the means of both runs. I have
to admit I don’t know how that works exactly, and it will probably
heavily depend on what “delta” is.

(Googling suggests that for the standard deviation, one would square
each SD to get the variance back, then divide by the respective sample
size, add, and take the square root. That is for combining two
distributions, though, whereas we want to compare them here...
http://homework.uoregon.edu/pub/class/es202/ztest.html seems to suggest
the same for such a comparison, though. I don’t know.)
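For illustration, roughly what I have in mind (just a sketch; the
function and its name are made up, and it assumes “delta” were a sample
standard deviation and that we knew the number of iterations per run,
which may not match what simplebench actually stores):

import math

def percent_diff_with_error(avg_a, sd_a, n_a, avg_b, sd_b, n_b):
    """Relative difference of run B vs. run A, in percent, with an
    error estimate based on the standard error of the difference of
    the two means: sqrt(sd_a**2 / n_a + sd_b**2 / n_b)."""
    diff_pct = (avg_b - avg_a) / avg_a * 100
    err_pct = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b) / avg_a * 100
    return diff_pct, err_pct

# Your example, A = 10±2 and B = 15±2: adding the deltas as maximum
# deviations gives 5±4; treating ±2 as an SD over (hypothetically)
# five iterations per run gives a tighter error bar:
print(percent_diff_with_error(10, 2, 5, 15, 2, 5))
# -> (50.0, 12.649110640673518)

Whether that is the right model of course depends entirely on what
“delta” really is.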
(As for your current version, after more thinking it does seem right
when delta is the maximum deviation. Or perhaps the deltas shouldn’t be
added then, but the maximum should be used? I’m just not sure.)

((Perhaps it doesn’t even matter. “Don’t believe any statistics you
haven’t forged yourself”, and so on.))

Max