From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
MIME-Version: 1.0
In-Reply-To:
References: <20180620150138.49380-1-arnd@arndb.de> <87r2l1sby0.fsf@linux.intel.com> <87muvpsa61.fsf@linux.intel.com>
From: Arnd Bergmann
Date: Mon, 25 Jun 2018 15:42:54 +0200
Message-ID:
Subject: Re: [PATCH] vfs: replace current_kernel_time64 with ktime equivalent
To: Andi Kleen
Cc: Jens Axboe, Jan Kara, Jeff Layton, "Darrick J. Wong",
	y2038 Mailman List, Brian Foster, Miklos Szeredi, Pavel Tatashin,
	Linux Kernel Mailing List, Linux FS-devel Mailing List,
	Alexander Viro, Andi Kleen, Andrew Morton, Deepa Dinamani,
	Daniel Lezcano, Thomas Gleixner, John Stultz, Stephen Boyd
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
List-ID:

On Wed, Jun 20, 2018 at 9:35 PM, Arnd Bergmann wrote:
> On Wed, Jun 20, 2018 at 6:19 PM, Andi Kleen wrote:
>> Arnd Bergmann writes:
>>>
>>> To clarify: current_kernel_time() uses at most millisecond resolution rather
>>> than microsecond, as tkr_mono.xtime_nsec only gets updated during the
>>> timer tick.
>>
>> Ah, you're right. I remember now: the motivation was to make sure there
>> is basically no overhead. In some setups the full gtod can be rather
>> slow, particularly if it falls back to some crappy timer.
>
> This means we're probably fine with a compile-time option that
> distros can choose to enable depending on what classes of hardware
> they are targeting, like
>
> struct timespec64 current_time(struct inode *inode)
> {
>         struct timespec64 now;
>         u64 gran = inode->i_sb->s_time_gran;
>
>         if (IS_ENABLED(CONFIG_HIRES_INODE_TIMES) &&
>             gran <= NSEC_PER_JIFFY)
>                 ktime_get_real_ts64(&now);
>         else
>                 ktime_get_coarse_real_ts64(&now);
>
>         return timespec64_trunc(now, gran);
> }
>
> With that implementation, we could still let file systems choose
> to get coarse timestamps by tuning the granularity in the
> superblock s_time_gran, which would result in nice round
> tv_nsec values that represent the actual accuracy.

I've done some simple tests and found that on a variety of x86, arm32
and arm64 CPUs, it takes between 70 and 100 CPU cycles to read the TSC
and add it to the coarse clock, e.g. on a 3.1GHz Ryzen, using the
little test program below:

    vdso hires:   37.18ns
    vdso coarse:   6.44ns
    sysc hires:  161.62ns
    sysc coarse: 133.87ns

On the same machine, it takes around 400ns (1240 cycles) to write one
byte into a tmpfs file with pwrite(). Adding 5% to 10% overhead for
accurate timestamps would definitely be noticed, so I guess we wouldn't
enable that unconditionally, but could do it as an opt-in mount option
if someone had a use case.
       Arnd

---
/* measure times for high-resolution clocksource access from userspace */
#include <stdio.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

static int do_clock_gettime(clockid_t clkid, struct timespec *tp, bool vdso)
{
	if (vdso)
		return clock_gettime(clkid, tp);

	return syscall(__NR_clock_gettime, clkid, tp);
}

static int loop1sec(int clkid, bool vdso)
{
	int i;
	struct timespec t, start;

	do_clock_gettime(clkid, &start, vdso);
	i = 0;
	do {
		do_clock_gettime(clkid, &t, vdso);
		i++;
	} while (t.tv_sec == start.tv_sec || t.tv_nsec < start.tv_nsec);

	return i;
}

int main(void)
{
	printf("vdso hires:   %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME, true));
	printf("vdso coarse:  %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME_COARSE, true));
	printf("sysc hires:   %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME, false));
	printf("sysc coarse:  %7.2fns\n",
	       1000000000.0 / loop1sec(CLOCK_REALTIME_COARSE, false));

	return 0;
}