On Wed, Oct 16, 2019 at 12:24:14PM +0100, Vincenzo Frascino wrote:
> On 10/11/19 2:23 AM, Dmitry Safonov wrote:
> > From: Andrei Vagin
> >
> > Place the branch with no concurrent write before contended case.
> >
> > Performance numbers for Intel(R) Core(TM) i5-6300U CPU @ 2.40GHz
> > (more clock_gettime() cycles - the better):
> >         | before    | after
> > -----------------------------------
> >         | 150252214 | 153242367
> >         | 150301112 | 153324800
> >         | 150392773 | 153125401
> >         | 150373957 | 153399355
> >         | 150303157 | 153489417
> >         | 150365237 | 153494270
> > -----------------------------------
> > avg     | 150331408 | 153345935
> > diff %  | 2         | 0
> > -----------------------------------
> > stdev % | 0.3       | 0.1
> >
> > Signed-off-by: Andrei Vagin
> > Co-developed-by: Dmitry Safonov
> > Signed-off-by: Dmitry Safonov
>
> Reviewed-by: Vincenzo Frascino
> Tested-by: Vincenzo Frascino

Hello Vincenzo,

Could you test the attached patch on aarch64? On x86, it gives about a 9%
performance improvement for CLOCK_MONOTONIC and CLOCK_BOOTTIME.

Here is my test: https://github.com/avagin/vdso-perf

It calls clock_gettime() in a loop for three seconds and then reports the
number of iterations.

Thanks,
Andrei