On Wed, Oct 16, 2019 at 12:24:14PM +0100, Vincenzo Frascino wrote:
> On 10/11/19 2:23 AM, Dmitry Safonov wrote:
> > From: Andrei Vagin <avagin@gmail.com>
> > 
> > Place the branch with no concurrent write before contended case.
> > 
> > Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
> > (more clock_gettime() cycles - the better):
> >         | before    | after
> > -----------------------------------
> >         | 150252214 | 153242367
> >         | 150301112 | 153324800
> >         | 150392773 | 153125401
> >         | 150373957 | 153399355
> >         | 150303157 | 153489417
> >         | 150365237 | 153494270
> > -----------------------------------
> > avg     | 150331408 | 153345935
> > diff %  | 2	    | 0
> > -----------------------------------
> > stdev % | 0.3	    | 0.1
> > 
> > Signed-off-by: Andrei Vagin <avagin@gmail.com>
> > Co-developed-by: Dmitry Safonov <dima@arista.com>
> > Signed-off-by: Dmitry Safonov <dima@arista.com>
> 
> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com>
> Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com>

Hello Vincenzo,

Could you test the attached patch on aarch64? On x86, it gives about 9%
performance improvement for CLOCK_MONOTONIC and CLOCK_BOOTTIME.

Here is my test:
https://github.com/avagin/vdso-perf

It is calling clock_gettime() in a loop for three seconds and then
reports a number of iterations.

Thanks,
Andrei