LKML Archive on lore.kernel.org
* [PATCH] time.c::timespec_trunc: fix nanosecond file time rounding
From: Karsten Blees @ 2015-06-09 17:36 UTC
  To: linux-kernel; +Cc: Thomas Gleixner, John Stultz

From: Karsten Blees <blees@dcon.de>
Date: Tue, 9 Jun 2015 10:50:28 +0200

The rounding optimization in timespec_trunc() is based on the incorrect
assumptions that current_kernel_time() is rounded to jiffies resolution,
and that jiffies resolution is a multiple of all potential file time
granularities.

Thus, sub-second portions of in-core file times are not rounded to on-disk
granularity. I.e. file times may change when the inode is re-read from disk
or when the file system is remounted.

File systems with on-disk resolutions of exactly 1 ns or 1 s are not
affected by this.

Steps to reproduce with e.g. UDF:

  $ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
  $ mkdir udf && mount udfdisk udf
  $ touch udf/test && stat -c %y udf/test
  2015-06-09 10:22:56.130006767 +0200
  $ umount udf && mount udfdisk udf
  $ stat -c %y udf/test
  2015-06-09 10:22:56.130006000 +0200

Remounting rounds the mtime to 1µs.
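
The difference between the two stat outputs is plain truncation of tv_nsec
to a multiple of the on-disk granularity. A minimal userspace sketch of that
arithmetic follows; the helper name trunc_nsec() is made up for illustration
and is not kernel code:

```c
/* Hypothetical helper (not a kernel function): round nsec down to a
 * multiple of gran, where gran is a granularity in ns and must be > 0. */
long trunc_nsec(long nsec, unsigned gran)
{
	return nsec - nsec % (long)gran;
}
```

With the values from the transcript above, trunc_nsec(130006767, 1000)
yields 130006000, i.e. the sub-second part seen after the remount.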

Fix the rounding in timespec_trunc() and update the documentation.

Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
as struct super_block.s_time_gran isn't prepared to handle different
ctime / mtime / atime resolutions nor resolutions > 1 second.

Signed-off-by: Karsten Blees <blees@dcon.de>
---

This issue came up in a recent discussion on the git ML about enabling
nanosecond file times on Windows, see

http://thread.gmane.org/gmane.comp.version-control.msysgit/21290/focus=21315


 kernel/time/time.c | 17 ++++-------------
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/kernel/time/time.c b/kernel/time/time.c
index 972e3bb..362ee06 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -287,23 +287,14 @@ EXPORT_SYMBOL(jiffies_to_usecs);
  * @t: Timespec
  * @gran: Granularity in ns.
  *
- * Truncate a timespec to a granularity. gran must be smaller than a second.
- * Always rounds down.
- *
- * This function should be only used for timestamps returned by
- * current_kernel_time() or CURRENT_TIME, not with do_gettimeofday() because
- * it doesn't handle the better resolution of the latter.
+ * Truncate a timespec to a granularity. gran must not be greater than a
+ * second (10^9 ns). Always rounds down.
  */
 struct timespec timespec_trunc(struct timespec t, unsigned gran)
 {
-	/*
-	 * Division is pretty slow so avoid it for common cases.
-	 * Currently current_kernel_time() never returns better than
-	 * jiffies resolution. Exploit that.
-	 */
-	if (gran <= jiffies_to_usecs(1) * 1000) {
+	if (gran <= 1) {
 		/* nothing */
-	} else if (gran == 1000000000) {
+	} else if (gran >= 1000000000) {
 		t.tv_nsec = 0;
 	} else {
 		t.tv_nsec -= t.tv_nsec % gran;
-- 
2.0.0.791.g124e248


* Re: [PATCH] time.c::timespec_trunc: fix nanosecond file time rounding
From: John Stultz @ 2015-06-16 17:07 UTC
  To: Karsten Blees; +Cc: lkml, Thomas Gleixner

On Tue, Jun 9, 2015 at 10:36 AM, Karsten Blees <karsten.blees@gmail.com> wrote:
> From: Karsten Blees <blees@dcon.de>
> Date: Tue, 9 Jun 2015 10:50:28 +0200
>
> The rounding optimization in timespec_trunc() is based on the incorrect
> assumptions that current_kernel_time() is rounded to jiffies resolution,
> and that jiffies resolution is a multiple of all potential file time
> granularities.

Sorry, this is a little opaque on the first read. You're saying that
there are filesystems where the on-disk granularity is smaller than a
tick/jiffy, but larger than a nanosecond, right?

> Thus, sub-second portions of in-core file times are not rounded to on-disk
> granularity. I.e. file times may change when the inode is re-read from disk
> or when the file system is remounted.
>
> File systems with on-disk resolutions of exactly 1 ns or 1 s are not
> affected by this.
>
> Steps to reproduce with e.g. UDF:
>
>   $ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
>   $ mkdir udf && mount udfdisk udf
>   $ touch udf/test && stat -c %y udf/test
>   2015-06-09 10:22:56.130006767 +0200
>   $ umount udf && mount udfdisk udf
>   $ stat -c %y udf/test
>   2015-06-09 10:22:56.130006000 +0200
>
> Remounting rounds the mtime to 1µs.
>
> Fix the rounding in timespec_trunc() and update the documentation.
>
> Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
> as struct super_block.s_time_gran isn't prepared to handle different
> ctime / mtime / atime resolutions nor resolutions > 1 second.
>
> Signed-off-by: Karsten Blees <blees@dcon.de>
> ---
>
> This issue came up in a recent discussion on the git ML about enabling
> nanosecond file times on Windows, see
>
> http://thread.gmane.org/gmane.comp.version-control.msysgit/21290/focus=21315
>
>
>  kernel/time/time.c | 17 ++++-------------
>  1 file changed, 4 insertions(+), 13 deletions(-)
>
> diff --git a/kernel/time/time.c b/kernel/time/time.c
> index 972e3bb..362ee06 100644
> --- a/kernel/time/time.c
> +++ b/kernel/time/time.c
> @@ -287,23 +287,14 @@ EXPORT_SYMBOL(jiffies_to_usecs);
>   * @t: Timespec
>   * @gran: Granularity in ns.
>   *
> - * Truncate a timespec to a granularity. gran must be smaller than a second.
> - * Always rounds down.
> - *
> - * This function should be only used for timestamps returned by
> - * current_kernel_time() or CURRENT_TIME, not with do_gettimeofday() because
> - * it doesn't handle the better resolution of the latter.
> + * Truncate a timespec to a granularity. gran must not be greater than a
> + * second (10^9 ns). Always rounds down.
>   */
>  struct timespec timespec_trunc(struct timespec t, unsigned gran)
>  {
> -       /*
> -        * Division is pretty slow so avoid it for common cases.
> -        * Currently current_kernel_time() never returns better than
> -        * jiffies resolution. Exploit that.
> -        */
> -       if (gran <= jiffies_to_usecs(1) * 1000) {
> +       if (gran <= 1) {
>                 /* nothing */

So this change will, in effect, cause us to truncate where granularity
was less than one tick, where before we didn't do anything. Have you
reviewed all users to ensure this is safe (I assume you have, but it
might be good to describe which users are affected in the commit
message)?


> -       } else if (gran == 1000000000) {
> +       } else if (gran >= 1000000000) {
>                 t.tv_nsec = 0;

While the code (which is quite old) wasn't super intuitive, this looks
to be making it more subtle instead of more clear. So if the
granularity is larger than a second, we just truncate to a second?
That seems surprising. If handling granularity larger than a second
isn't supported, we should probably make that explicit and add a
WARN_ON to catch problematic users of the function. Or we should
rework the logic to properly handle more coarse granularities (which
from your description it sounds like the FAT case needs?).

thanks
-john

* Re: [PATCH] time.c::timespec_trunc: fix nanosecond file time rounding
From: Karsten Blees @ 2015-06-16 22:39 UTC
  To: John Stultz; +Cc: lkml, Thomas Gleixner

On 16.06.2015 at 19:07, John Stultz wrote:
> On Tue, Jun 9, 2015 at 10:36 AM, Karsten Blees <karsten.blees@gmail.com> wrote:
>> From: Karsten Blees <blees@dcon.de>
>> Date: Tue, 9 Jun 2015 10:50:28 +0200
>>
>> The rounding optimization in timespec_trunc() is based on the incorrect
>> assumptions that current_kernel_time() is rounded to jiffies resolution,
>> and that jiffies resolution is a multiple of all potential file time
>> granularities.
> 
> Sorry, this is a little opaque on the first read. You're saying that
> there are filesystems where the on-disk granularity is smaller than a
> tick/jiffy, but larger than a nanosecond, right?
> 

Yes, examples include CIFS, NTFS (100 ns) and CEPH, UDF (1000 ns).

The current code assumes that rounding can be avoided if (gran <= ns_per_tick).

However, this optimization is only valid if:

1. current_kernel_time().tv_nsec is already rounded to tick resolution.
   E.g. with HZ=1000 you would get tv_nsec = 1000000, 2000000, 3000000, but
   never 1000001. AFAICT this is not true; current_kernel_time() may be
   incremented only once per tick, but it's not rounded to tick resolution.

2. ns_per_tick is evenly divisible by gran, for all potential HZ and
   granularity values. IOW "(ns_per_tick % gran) == 0". This may have been
   true for HZ=100, 250, 1000, but not for HZ=300. E.g. if assumption 1
   above was true, HZ=300 would give you tv_nsec = 3333333, 6666666,
   9999999... This would definitely need to be rounded to e.g. UDF
   resolution, even though (1000 <= 3333333) is clearly true.
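
To make the second point concrete, here is a small userspace sketch. It
models ns_per_tick as NSEC_PER_SEC / HZ with integer division, which is a
simplification of how the kernel derives its tick length, and all three
function names are invented for this example:

```c
#include <stdbool.h>

#define NSEC_PER_SEC 1000000000L

/* Simplified model: nanoseconds per tick for a given HZ. */
long ns_per_tick(long hz)
{
	return NSEC_PER_SEC / hz;
}

/* The shortcut described above: skip rounding if gran <= one tick. */
bool shortcut_skips_rounding(long hz, long gran)
{
	return gran <= ns_per_tick(hz);
}

/* The shortcut could only be sound if a tick is a multiple of gran. */
bool tick_is_multiple_of_gran(long hz, long gran)
{
	return ns_per_tick(hz) % gran == 0;
}
```

For HZ=300 and gran=1000 (UDF), shortcut_skips_rounding() is true while
tick_is_multiple_of_gran() is false, which is exactly the problematic
combination. For HZ=1000 and gran=1000 both are true.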

>> Thus, sub-second portions of in-core file times are not rounded to on-disk
>> granularity. I.e. file times may change when the inode is re-read from disk
>> or when the file system is remounted.
>>
>> File systems with on-disk resolutions of exactly 1 ns or 1 s are not
>> affected by this.
>>
>> Steps to reproduce with e.g. UDF:
>>
>>   $ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
>>   $ mkdir udf && mount udfdisk udf
>>   $ touch udf/test && stat -c %y udf/test
>>   2015-06-09 10:22:56.130006767 +0200
>>   $ umount udf && mount udfdisk udf
>>   $ stat -c %y udf/test
>>   2015-06-09 10:22:56.130006000 +0200
>>
>> Remounting rounds the mtime to 1µs.
>>
>> Fix the rounding in timespec_trunc() and update the documentation.
>>
>> Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
>> as struct super_block.s_time_gran isn't prepared to handle different
>> ctime / mtime / atime resolutions nor resolutions > 1 second.
>>
>> Signed-off-by: Karsten Blees <blees@dcon.de>
>> ---
>>
>> This issue came up in a recent discussion on the git ML about enabling
>> nanosecond file times on Windows, see
>>
>> http://thread.gmane.org/gmane.comp.version-control.msysgit/21290/focus=21315
>>
>>
>>  kernel/time/time.c | 17 ++++-------------
>>  1 file changed, 4 insertions(+), 13 deletions(-)
>>
>> diff --git a/kernel/time/time.c b/kernel/time/time.c
>> index 972e3bb..362ee06 100644
>> --- a/kernel/time/time.c
>> +++ b/kernel/time/time.c
>> @@ -287,23 +287,14 @@ EXPORT_SYMBOL(jiffies_to_usecs);
>>   * @t: Timespec
>>   * @gran: Granularity in ns.
>>   *
>> - * Truncate a timespec to a granularity. gran must be smaller than a second.
>> - * Always rounds down.
>> - *
>> - * This function should be only used for timestamps returned by
>> - * current_kernel_time() or CURRENT_TIME, not with do_gettimeofday() because
>> - * it doesn't handle the better resolution of the latter.
>> + * Truncate a timespec to a granularity. gran must not be greater than a
>> + * second (10^9 ns). Always rounds down.
>>   */
>>  struct timespec timespec_trunc(struct timespec t, unsigned gran)
>>  {
>> -       /*
>> -        * Division is pretty slow so avoid it for common cases.
>> -        * Currently current_kernel_time() never returns better than
>> -        * jiffies resolution. Exploit that.
>> -        */
>> -       if (gran <= jiffies_to_usecs(1) * 1000) {
>> +       if (gran <= 1) {
>>                 /* nothing */
> 
> So this change will, in effect, cause us to truncate where granularity
> was less than one tick, where before we didn't do anything. Have you
> reviewed all users to ensure this is safe (I assume you have, but it
> might be good to describe which users are affected in the commit
> message)?
> 
> 

timespec_trunc() is exclusively used to calculate inode's [acm]time.
It is mostly called through current_fs_time(), only a handful of fs
drivers use it directly (but always with super_block.s_time_gran as
second argument).

So I think changing the function to do what the documentation says it
does should be safe...

>> -       } else if (gran == 1000000000) {
>> +       } else if (gran >= 1000000000) {
>>                 t.tv_nsec = 0;
> 
> While the code (which is quite old) wasn't super intuitive, this looks
> to be making it more subtle instead of more clear. So if the
> granularity is larger than a second, we just truncate to a second?
> That seems surprising. If handling granularity larger than a second
> isn't supported, we should probably make that explicit and add a
> WARN_ON to catch problematic users of the function.

Indeed, I changed this to catch invalid arguments (similar to how
"gran <= 1" catches 0 and thus prevents division by zero).

What about this instead?

	if (gran == 1) {
		/* nothing */
	} else if (gran == 1000000000) {
		t.tv_nsec = 0;
	} else if (gran < 1 || gran > 1000000000) {
		WARN_ON(1);
	} else {
		t.tv_nsec -= t.tv_nsec % gran;
	}
	return t;

I.e. only the few file systems that need rounding are affected by
extra comparisons.
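
For anyone who wants to try the branch order outside the kernel, a
userspace sketch could look like the following. It is only an
approximation: struct ts_sketch stands in for struct timespec,
WARN_ON() is replaced by a message on stderr, and the names ts_sketch
and timespec_trunc_sketch() are made up for this example.

```c
#include <stdio.h>

#define NSEC_PER_SEC 1000000000u

/* Stand-in for struct timespec so the sketch is self-contained. */
struct ts_sketch {
	long long tv_sec;
	long tv_nsec;
};

/* Userspace approximation of the proposed logic; not the kernel function. */
struct ts_sketch timespec_trunc_sketch(struct ts_sketch t, unsigned gran)
{
	if (gran == 1) {
		/* nothing: 1 ns granularity needs no truncation */
	} else if (gran == NSEC_PER_SEC) {
		t.tv_nsec = 0;
	} else if (gran < 1 || gran > NSEC_PER_SEC) {
		/* stand-in for WARN_ON(1); t is returned unchanged */
		fprintf(stderr, "invalid granularity: %u\n", gran);
	} else {
		t.tv_nsec -= t.tv_nsec % (long)gran;
	}
	return t;
}
```

Invalid values (0 or anything above 10^9) leave the timestamp untouched
and only emit the warning, so gran == 0 never reaches the modulo.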

> Or we should
> rework the logic to properly handle more coarse granularities (which
> from your description it sounds like the FAT case needs?).

AFAIK FAT is the only file system with such coarse granularities
(1 day for atime, 2s for mtime, 10ms for create time, there is no ctime
aka "change time" field). I think this is such a special case that it
should better be handled in the FAT driver.
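
As a rough illustration of why this does not fit into timespec_trunc():
truncating to a granularity above one second has to modify tv_sec as
well, which timespec_trunc() never touches. The following is a purely
hypothetical sketch, not code from the FAT driver; the struct and
function names are invented, and it assumes tv_sec >= 0 and gran_sec > 0.

```c
/* Stand-in for struct timespec so the sketch is self-contained. */
struct fat_ts_sketch {
	long long tv_sec;
	long tv_nsec;
};

/* Hypothetical: truncate t down to a whole multiple of gran_sec seconds.
 * Assumes t.tv_sec >= 0 and gran_sec > 0. */
struct fat_ts_sketch trunc_to_seconds(struct fat_ts_sketch t, unsigned gran_sec)
{
	t.tv_sec -= t.tv_sec % gran_sec;
	t.tv_nsec = 0;
	return t;
}
```

With gran_sec = 2 this maps an odd tv_sec to the preceding even second
and clears the sub-second part.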

Thanks,
Karsten

* Re: [PATCH] time.c::timespec_trunc: fix nanosecond file time rounding
From: John Stultz @ 2015-06-16 23:08 UTC
  To: Karsten Blees; +Cc: lkml, Thomas Gleixner

On Tue, Jun 16, 2015 at 3:39 PM, Karsten Blees <karsten.blees@gmail.com> wrote:
> On 16.06.2015 at 19:07, John Stultz wrote:
>> On Tue, Jun 9, 2015 at 10:36 AM, Karsten Blees <karsten.blees@gmail.com> wrote:
>>> From: Karsten Blees <blees@dcon.de>
>>> Date: Tue, 9 Jun 2015 10:50:28 +0200
>>>
>>> The rounding optimization in timespec_trunc() is based on the incorrect
>>> assumptions that current_kernel_time() is rounded to jiffies resolution,
>>> and that jiffies resolution is a multiple of all potential file time
>>> granularities.
>>
>> Sorry, this is a little opaque on the first read. You're saying that
>> there are filesystems where the on-disk granularity is smaller than a
>> tick/jiffy, but larger than a nanosecond, right?
>>
>
> Yes, examples include CIFS, NTFS (100 ns) and CEPH, UDF (1000 ns).

Thanks. Adding these concrete examples to the commit message would be good.


> The current code assumes that rounding can be avoided if (gran <= ns_per_tick).
>
> However, this optimization is only valid if:
>
> 1. current_kernel_time().tv_nsec is already rounded to tick resolution.
>    E.g. with HZ=1000 you would get tv_nsec = 1000000, 2000000, 3000000, but
>    never 1000001. AFAICT this is not true; current_kernel_time() may be
>    incremented only once per tick, but it's not rounded to tick resolution.
>
> 2. ns_per_tick is evenly divisible by gran, for all potential HZ and
>    granularity values. IOW "(ns_per_tick % gran) == 0". This may have been
>    true for HZ=100, 250, 1000, but not for HZ=300. E.g. if assumption 1
>    above was true, HZ=300 would give you tv_nsec = 3333333, 6666666,
>    9999999... This would definitely need to be rounded to e.g. UDF
>    resolution, even though (1000 <= 3333333) is clearly true.
>
>>> Thus, sub-second portions of in-core file times are not rounded to on-disk
>>> granularity. I.e. file times may change when the inode is re-read from disk
>>> or when the file system is remounted.
>>>
>>> File systems with on-disk resolutions of exactly 1 ns or 1 s are not
>>> affected by this.
>>>
>>> Steps to reproduce with e.g. UDF:
>>>
>>>   $ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
>>>   $ mkdir udf && mount udfdisk udf
>>>   $ touch udf/test && stat -c %y udf/test
>>>   2015-06-09 10:22:56.130006767 +0200
>>>   $ umount udf && mount udfdisk udf
>>>   $ stat -c %y udf/test
>>>   2015-06-09 10:22:56.130006000 +0200
>>>
>>> Remounting rounds the mtime to 1µs.
>>>
>>> Fix the rounding in timespec_trunc() and update the documentation.
>>>
>>> Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
>>> as struct super_block.s_time_gran isn't prepared to handle different
>>> ctime / mtime / atime resolutions nor resolutions > 1 second.
>>>
>>> Signed-off-by: Karsten Blees <blees@dcon.de>
>>> ---
>>>
>>> This issue came up in a recent discussion on the git ML about enabling
>>> nanosecond file times on Windows, see
>>>
>>> http://thread.gmane.org/gmane.comp.version-control.msysgit/21290/focus=21315
>>>
>>>
>>>  kernel/time/time.c | 17 ++++-------------
>>>  1 file changed, 4 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/kernel/time/time.c b/kernel/time/time.c
>>> index 972e3bb..362ee06 100644
>>> --- a/kernel/time/time.c
>>> +++ b/kernel/time/time.c
>>> @@ -287,23 +287,14 @@ EXPORT_SYMBOL(jiffies_to_usecs);
>>>   * @t: Timespec
>>>   * @gran: Granularity in ns.
>>>   *
>>> - * Truncate a timespec to a granularity. gran must be smaller than a second.
>>> - * Always rounds down.
>>> - *
>>> - * This function should be only used for timestamps returned by
>>> - * current_kernel_time() or CURRENT_TIME, not with do_gettimeofday() because
>>> - * it doesn't handle the better resolution of the latter.
>>> + * Truncate a timespec to a granularity. gran must not be greater than a
>>> + * second (10^9 ns). Always rounds down.
>>>   */
>>>  struct timespec timespec_trunc(struct timespec t, unsigned gran)
>>>  {
>>> -       /*
>>> -        * Division is pretty slow so avoid it for common cases.
>>> -        * Currently current_kernel_time() never returns better than
>>> -        * jiffies resolution. Exploit that.
>>> -        */
>>> -       if (gran <= jiffies_to_usecs(1) * 1000) {
>>> +       if (gran <= 1) {
>>>                 /* nothing */
>>
>> So this change will, in effect, cause us to truncate where granularity
>> was less than one tick, where before we didn't do anything. Have you
>> reviewed all users to ensure this is safe (I assume you have, but it
>> might be good to describe which users are affected in the commit
>> message)?
>>
>>
>
> timespec_trunc() is exclusively used to calculate inode's [acm]time.
> It is mostly called through current_fs_time(), only a handful of fs
> drivers use it directly (but always with super_block.s_time_gran as
> second argument).
>
> So I think changing the function to do what the documentation says it
> does should be safe...

Yea, though existing behavior is often more "expected" than documented
behavior. :)


>
>>> -       } else if (gran == 1000000000) {
>>> +       } else if (gran >= 1000000000) {
>>>                 t.tv_nsec = 0;
>>
>> While the code (which is quite old) wasn't super intuitive, this looks
>> to be making it more subtle instead of more clear. So if the
>> granularity is larger than a second, we just truncate to a second?
>> That seems surprising. If handling granularity larger than a second
>> isn't supported, we should probably make that explicit and add a
>> WARN_ON to catch problematic users of the function.
>
> Indeed, I changed this to catch invalid arguments (similar to how
> "gran <= 1" catches 0 and thus prevents division by zero).
>
> What about this instead?
>
>         if (gran == 1) {
>                 /* nothing */
>         } else if (gran == 1000000000) {
>                 t.tv_nsec = 0;
>         } else if (gran < 1 || gran > 1000000000) {
>                 WARN_ON(1);
>         } else {
>                 t.tv_nsec -= t.tv_nsec % gran;
>         }
>         return t;

Logically it's ok. I might suggest cleaning it up as:

	if ((gran < 1) || (gran > NSEC_PER_SEC))
		WARN_ON(1);	/* catch invalid granularity values */
	else if (gran == NSEC_PER_SEC)
		t.tv_nsec = 0;	/* special case to avoid div */
	else if ((gran > 1) && (gran < NSEC_PER_SEC))
		t.tv_nsec -= t.tv_nsec % gran;
	return t;

Also it would be good to make it clear in the function comment that
gran > NSEC_PER_SEC are invalid.

thanks
-john

* [PATCH v2] time.c::timespec_trunc: fix nanosecond file time rounding
From: Karsten Blees @ 2015-06-25 12:13 UTC
  To: John Stultz; +Cc: lkml, Thomas Gleixner

timespec_trunc() avoids rounding if granularity <= nanoseconds-per-jiffy
(or TICK_NSEC). This optimization assumes that:

 1. current_kernel_time().tv_nsec is already rounded to TICK_NSEC (i.e.
    with HZ=1000 you'd get 1000000, 2000000, 3000000... but never 1000001).
    This is no longer true (probably since hrtimers were introduced in 2.6.16).

 2. TICK_NSEC is evenly divisible by all possible granularities. This may
    be true for HZ=100, 250, 1000, but obviously not for HZ=300 /
    TICK_NSEC=3333333 (introduced in 2.6.20).

Thus, sub-second portions of in-core file times are not rounded to on-disk
granularity. I.e. file times may change when the inode is re-read from disk
or when the file system is remounted.

This affects all file systems with file time granularities > 1 ns and < 1s,
e.g. CEPH (1000 ns), UDF (1000 ns), CIFS (100 ns), NTFS (100 ns) and FUSE
(configurable from user mode via struct fuse_init_out.time_gran).

Steps to reproduce with e.g. UDF:

  $ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
  $ mkdir udf && mount udfdisk udf
  $ touch udf/test && stat -c %y udf/test
  2015-06-09 10:22:56.130006767 +0200
  $ umount udf && mount udfdisk udf
  $ stat -c %y udf/test
  2015-06-09 10:22:56.130006000 +0200

Remounting truncates the mtime to 1 µs.

Fix the rounding in timespec_trunc() and update the documentation.

timespec_trunc() is exclusively used to calculate inode's [acm]time (mostly
via current_fs_time()), and always with super_block.s_time_gran as second
argument. So this can safely be changed without side effects.

Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
as super_block.s_time_gran isn't prepared to handle different ctime /
mtime / atime resolutions nor resolutions > 1 second.

Signed-off-by: Karsten Blees <blees@dcon.de>
---

On 17.06.2015 at 01:08, John Stultz wrote:
> 
> Logically it's ok. I might suggest cleaning it up as:
> 
> 	if ((gran < 1) || (gran > NSEC_PER_SEC))
> 		WARN_ON(1);	/* catch invalid granularity values */
> 	else if (gran == NSEC_PER_SEC)
> 		t.tv_nsec = 0;	/* special case to avoid div */
> 	else if ((gran > 1) && (gran < NSEC_PER_SEC))
> 		t.tv_nsec -= t.tv_nsec % gran;
> 	return t;
> 
> Also it would be good to make it clear in the function comment that
> gran > NSEC_PER_SEC are invalid.
> 
> thanks
> -john
> 

I chose to stick to testing the most common cases first (1 ns and 1 s).
I don't think GCC would be smart enough to reorder the comparisons based
on 'unlikely()' in the WARN macro...

Thanks,
Karsten

 kernel/time/time.c | 22 ++++++++--------------
 1 file changed, 8 insertions(+), 14 deletions(-)

diff --git a/kernel/time/time.c b/kernel/time/time.c
index 972e3bb..5733922 100644
--- a/kernel/time/time.c
+++ b/kernel/time/time.c
@@ -287,26 +287,20 @@ EXPORT_SYMBOL(jiffies_to_usecs);
  * @t: Timespec
  * @gran: Granularity in ns.
  *
- * Truncate a timespec to a granularity. gran must be smaller than a second.
- * Always rounds down.
- *
- * This function should be only used for timestamps returned by
- * current_kernel_time() or CURRENT_TIME, not with do_gettimeofday() because
- * it doesn't handle the better resolution of the latter.
+ * Truncate a timespec to a granularity. Always rounds down. gran must
+ * not be 0 nor greater than a second (NSEC_PER_SEC, or 10^9 ns).
  */
 struct timespec timespec_trunc(struct timespec t, unsigned gran)
 {
-	/*
-	 * Division is pretty slow so avoid it for common cases.
-	 * Currently current_kernel_time() never returns better than
-	 * jiffies resolution. Exploit that.
-	 */
-	if (gran <= jiffies_to_usecs(1) * 1000) {
+	/* Avoid division in the common cases 1 ns and 1 s. */
+	if (gran == 1) {
 		/* nothing */
-	} else if (gran == 1000000000) {
+	} else if (gran == NSEC_PER_SEC) {
 		t.tv_nsec = 0;
-	} else {
+	} else if (gran > 1 && gran < NSEC_PER_SEC) {
 		t.tv_nsec -= t.tv_nsec % gran;
+	} else {
+		WARN(1, "illegal file time granularity: %u", gran);
 	}
 	return t;
 }
-- 
2.0.0.791.g124e248


* Re: [PATCH v2] time.c::timespec_trunc: fix nanosecond file time rounding
From: John Stultz @ 2015-07-01 18:07 UTC
  To: Karsten Blees; +Cc: lkml, Thomas Gleixner

On Thu, Jun 25, 2015 at 5:13 AM, Karsten Blees <karsten.blees@gmail.com> wrote:
> timespec_trunc() avoids rounding if granularity <= nanoseconds-per-jiffy
> (or TICK_NSEC). This optimization assumes that:
>
>  1. current_kernel_time().tv_nsec is already rounded to TICK_NSEC (i.e.
>     with HZ=1000 you'd get 1000000, 2000000, 3000000... but never 1000001).
>     This is no longer true (probably since hrtimers were introduced in 2.6.16).
>
>  2. TICK_NSEC is evenly divisible by all possible granularities. This may
>     be true for HZ=100, 250, 1000, but obviously not for HZ=300 /
>     TICK_NSEC=3333333 (introduced in 2.6.20).
>
> Thus, sub-second portions of in-core file times are not rounded to on-disk
> granularity. I.e. file times may change when the inode is re-read from disk
> or when the file system is remounted.
>
> This affects all file systems with file time granularities > 1 ns and < 1s,
> e.g. CEPH (1000 ns), UDF (1000 ns), CIFS (100 ns), NTFS (100 ns) and FUSE
> (configurable from user mode via struct fuse_init_out.time_gran).
>
> Steps to reproduce with e.g. UDF:
>
>   $ dd if=/dev/zero of=udfdisk count=10000 && mkudffs udfdisk
>   $ mkdir udf && mount udfdisk udf
>   $ touch udf/test && stat -c %y udf/test
>   2015-06-09 10:22:56.130006767 +0200
>   $ umount udf && mount udfdisk udf
>   $ stat -c %y udf/test
>   2015-06-09 10:22:56.130006000 +0200
>
> Remounting truncates the mtime to 1 µs.
>
> Fix the rounding in timespec_trunc() and update the documentation.
>
> timespec_trunc() is exclusively used to calculate inode's [acm]time (mostly
> via current_fs_time()), and always with super_block.s_time_gran as second
> argument. So this can safely be changed without side effects.
>
> Note: This does _not_ fix the issue for FAT's 2 second mtime resolution,
> as super_block.s_time_gran isn't prepared to handle different ctime /
> mtime / atime resolutions nor resolutions > 1 second.
>
> Signed-off-by: Karsten Blees <blees@dcon.de>


Ok. Looks good. I'm queuing it for testing, targeting the 4.3 merge window.

thanks
-john
