linux-kernel.vger.kernel.org archive mirror
* Finegrained a/c/mtime was Re: Directory notification problem
       [not found] <Pine.LNX.4.33.0110022206100.29931-100000@devserv.devel.redhat.com.suse.lists.linux.kernel>
@ 2001-10-03  7:53 ` Andi Kleen
  2001-10-03  8:06   ` Ulrich Drepper
  2001-10-03 17:45   ` Bernd Eckenfels
  0 siblings, 2 replies; 24+ messages in thread
From: Andi Kleen @ 2001-10-03  7:53 UTC (permalink / raw)
  To: Alex Larsson; +Cc: linux-kernel

Alex Larsson <alexl@redhat.com> writes:

> I discovered a problem with the dnotify API while fixing a FAM bug today.
> 
> The problem occurs when you want to watch a file in a directory, and that 
> file is changed several times in the same second. When I get the directory 
> notify signal on the directory I need to stat the file to see if the 
> change was actually in the file. If the file already changed in the 
> current second the stat() result will be identical to the previous stat() 
> call, since the resolution of mtime and ctime is one second. 
> 
> This leads to missed notifications, leaving clients (such as Nautilus or 
> Konqueror) displaying an state not representing the current state.
> 
> The only userspace solutions I see is to delay all change notifications to 
> the end of the second, so that clients always read the correct state. This 
> is somewhat countrary to the idea of FAM though, as it does not give 
> instant feedback.
> 
> Is there any possibility of extending struct stat with a generation 
> counter? Or is there another solution to this problem?

make has similar problems with parallel builds on bigger multiprocessor
machines. Solaris 7 fixed this problem by adding new fields to struct
stat that contain the ms for mtime/atime/ctime. There are even
filesystems on Linux already that support fine-grained timestamps,
e.g. XFS has them as ns on disk. The problem is that the VFS doesn't
support them currently, so it always sets the ns parts to zero. Fixing
it for m/c/atime requires new system calls for utime and stat64.
For stat it also requires a changed glibc ABI -- the glibc/2.4 stat64
structure reserved an additional 4 bytes for every timestamp, but these
either need to be used to give more seconds for the year 2038 problem
or be used for the ms fractions. y2038 is somewhat important too.
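
A minimal sketch of what a sub-second comparison could look like from
user space, assuming a libc whose struct stat exposes the POSIX.1-2008
st_mtim member (a struct timespec). The helper name mtime_changed is
made up for illustration and is not part of any existing API.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: returns true if the modification timestamp differs
 * between two stat snapshots, looking at both the seconds and the
 * nanoseconds part. With second-only resolution the tv_nsec comparison
 * would never fire, which is exactly the missed-notification case. */
static bool mtime_changed(const struct stat *before, const struct stat *after)
{
    return before->st_mtim.tv_sec  != after->st_mtim.tv_sec ||
           before->st_mtim.tv_nsec != after->st_mtim.tv_nsec;
}
```

A watcher would keep the struct stat from its previous stat() call and
pass it together with the fresh one after each directory notification.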

[In theory the existing additional bytes could be used for both on a
big endian host if you manage to define a numeric 48-bit type in gcc
and are satisfied with 16-bit ms resolution, but such a hack would
probably cause problems, e.g. with other compilers. It would be
possible on little endian too, but only for mtime and ctime, as there
is no unused field in front of st_atime.  Overall I think a new stat
call is better. The ugly thing is just that the glibc ABI needs
updating too.]

Solving it properly is a 2.5 thing.

-Andi




* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03  7:53 ` Finegrained a/c/mtime was Re: Directory notification problem Andi Kleen
@ 2001-10-03  8:06   ` Ulrich Drepper
  2001-10-03 13:35     ` Eric W. Biederman
  2001-10-03 15:15     ` Alex Larsson
  2001-10-03 17:45   ` Bernd Eckenfels
  1 sibling, 2 replies; 24+ messages in thread
From: Ulrich Drepper @ 2001-10-03  8:06 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alex Larsson, linux-kernel

Andi Kleen <ak@suse.de> writes:

> For stat is also requires a changed glibc ABI -- the glibc/2.4 stat64

Not only stat64, also plain stat.

> structure reserved an additional 4 bytes for every timestamp, but these
> either need to be used to give more seconds for the year 2038 problem
> or be used for the ms fractions. y2038 is somewhat important too.

The fields are meant for nanoseconds.  The y2038 problem will definitely
be solved by time-shifting or making time_t unsigned.  In any case it is
nothing of importance here and now, especially since not many of the
systems which are running today with a 32-bit time_t will still be in
use then.  For the rest I'm sure that in 37 years there will be one ABI
change or another.

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------


* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03  8:06   ` Ulrich Drepper
@ 2001-10-03 13:35     ` Eric W. Biederman
  2001-10-03 14:11       ` Netfilter problem Kirill Ratkin
  2001-10-03 15:24       ` Finegrained a/c/mtime was Re: Directory notification problem Gerhard Mack
  2001-10-03 15:15     ` Alex Larsson
  1 sibling, 2 replies; 24+ messages in thread
From: Eric W. Biederman @ 2001-10-03 13:35 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Andi Kleen, Alex Larsson, linux-kernel

Ulrich Drepper <drepper@redhat.com> writes:

> Andi Kleen <ak@suse.de> writes:
> 
> > For stat is also requires a changed glibc ABI -- the glibc/2.4 stat64
> 
> Not only stat64, also plain stat.
> 
> > structure reserved an additional 4 bytes for every timestamp, but these
> > either need to be used to give more seconds for the year 2038 problem
> > or be used for the ms fractions. y2038 is somewhat important too.
> 
> The fields are meant for nanoseconds.  The y2038 will definitely be
> solved by time-shifting or making time_t unsigned.  In any way nothing
> of importance here and now.  Especially since there won't be many
> systems which are running today and which have a 32-bit time_t be used
> then.  For the rest I'm sure that in 37 years there will be the one or
> the other ABI change.

Right.  Given current uptimes and being optimistic the fix for y2038 
is probably needed by 2030 or just a little later.  But in any case
64 bit systems should be maxing out by then, and the conversion to 128
bit systems should have already happened on the server side.  32 bit
systems will likely be limited to embedded and legacy systems by then.

Eric



* Netfilter problem
  2001-10-03 13:35     ` Eric W. Biederman
@ 2001-10-03 14:11       ` Kirill Ratkin
  2001-10-03 21:42         ` Luigi Genoni
  2001-10-03 15:24       ` Finegrained a/c/mtime was Re: Directory notification problem Gerhard Mack
  1 sibling, 1 reply; 24+ messages in thread
From: Kirill Ratkin @ 2001-10-03 14:11 UTC (permalink / raw)
  To: linux-kernel

Hi.

I've a strange error when I try to check protocol type
in netfilter hook function. 

I see this message:
kping.c: In function `knet_hook':
kping.c:116: dereferencing pointer to incomplete type
make: *** [kping.o] Error 1

This is part of my code:
static
unsigned int knet_hook(unsigned int hooknum,
                      struct sk_buff** p_skb,
                      const struct net_device* p_in,
                      const struct net_device* p_out,
                      int (*okfn)(struct sk_buff* ))
{
  ...
  if((*p_skb)->nh.iph->protocol==
	(unsigned char)IPPROTO_ICMP)
  {
    printk("<1>ICMP Packet killed\n");
    return NF_DROP;
  }
  ...
}

It had compiled on 2.4.1 version.

I don't understand why ... .




* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03  8:06   ` Ulrich Drepper
  2001-10-03 13:35     ` Eric W. Biederman
@ 2001-10-03 15:15     ` Alex Larsson
  2001-10-03 21:26       ` Andi Kleen
  1 sibling, 1 reply; 24+ messages in thread
From: Alex Larsson @ 2001-10-03 15:15 UTC (permalink / raw)
  To: Ulrich Drepper; +Cc: Andi Kleen, linux-kernel

On 3 Oct 2001, Ulrich Drepper wrote:

> Andi Kleen <ak@suse.de> writes:
> 
> > structure reserved an additional 4 bytes for every timestamp, but these
> > either need to be used to give more seconds for the year 2038 problem
> > or be used for the ms fractions. y2038 is somewhat important too.
> 
> The fields are meant for nanoseconds.  The y2038 will definitely be
> solved by time-shifting or making time_t unsigned.  In any way nothing
> of importance here and now.  Especially since there won't be many
> systems which are running today and which have a 32-bit time_t be used
> then.  For the rest I'm sure that in 37 years there will be the one or
> the other ABI change.

Is a nanoseconds field the right choice though? In reality you might not 
have a nanosecond resolution timer, so you would miss changes that happen
on a shorter timescale than the timer resolution. Wouldn't a generation 
counter, increased whenever ctime is updated, be a better solution?

/ Alex




* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03 13:35     ` Eric W. Biederman
  2001-10-03 14:11       ` Netfilter problem Kirill Ratkin
@ 2001-10-03 15:24       ` Gerhard Mack
  2001-10-16 18:56         ` Riley Williams
  1 sibling, 1 reply; 24+ messages in thread
From: Gerhard Mack @ 2001-10-03 15:24 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: Ulrich Drepper, Andi Kleen, Alex Larsson, linux-kernel

On 3 Oct 2001, Eric W. Biederman wrote:

> Ulrich Drepper <drepper@redhat.com> writes:
> 
> > Andi Kleen <ak@suse.de> writes:
> > 
> > > For stat is also requires a changed glibc ABI -- the glibc/2.4 stat64
> > 
> > Not only stat64, also plain stat.
> > 
> > > structure reserved an additional 4 bytes for every timestamp, but these
> > > either need to be used to give more seconds for the year 2038 problem
> > > or be used for the ms fractions. y2038 is somewhat important too.
> > 
> > The fields are meant for nanoseconds.  The y2038 will definitely be
> > solved by time-shifting or making time_t unsigned.  In any way nothing
> > of importance here and now.  Especially since there won't be many
> > systems which are running today and which have a 32-bit time_t be used
> > then.  For the rest I'm sure that in 37 years there will be the one or
> > the other ABI change.
> 
> Right.  Given current uptimes and being optimistic the fix for y2038 
> is probably needed by 2030 or just a little later.  But in any case
> 64 bit systems should be maxing out by then, and the conversion to 128
> bit systems should have already happened on the server side.  32 bit
> systems will likely be limited to embedded and legacy systems by then.
> 
> Eric

Why do I get the feeling no one has learned from the problems the computer
industry had with 2 digit date fields?

Odds are legacy systems will be running something people for whatever
reason couldn't replace.


	Gerhard

--
Gerhard Mack

gmack@innerfire.net

<>< As a computer I find your faith in technology amusing.



* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03  7:53 ` Finegrained a/c/mtime was Re: Directory notification problem Andi Kleen
  2001-10-03  8:06   ` Ulrich Drepper
@ 2001-10-03 17:45   ` Bernd Eckenfels
  2001-10-13 15:24     ` Jamie Lokier
  1 sibling, 1 reply; 24+ messages in thread
From: Bernd Eckenfels @ 2001-10-03 17:45 UTC (permalink / raw)
  To: linux-kernel

> Alex Larsson <alexl@redhat.com> writes:
>> I discovered a problem with the dnotify API while fixing a FAM bug today.
>> 
>> The problem occurs when you want to watch a file in a directory, and that 
>> file is changed several times in the same second. When I get the directory 
>> notify signal on the directory I need to stat the file to see if the 
>> change was actually in the file. If the file already changed in the 
>> current second the stat() result will be identical to the previous stat() 
>> call, since the resolution of mtime and ctime is one second. 

If you simply check the mtime and the file size you have the two most
relevant parts. If neither of those changes, programs using the dnotify
API most likely do not need to act. After all it is not an auditing
facility but a notifier for things like reloading directory listings.
The only thing I can imagine causing problems is a self-reloading
config file. But in that case dnotify is overkill anyway and a 1 sec
delay can be assumed to be reasonable.

Greetings
Bernd


* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03 15:15     ` Alex Larsson
@ 2001-10-03 21:26       ` Andi Kleen
  2001-10-05 12:44         ` Padraig Brady
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2001-10-03 21:26 UTC (permalink / raw)
  To: Alex Larsson; +Cc: Ulrich Drepper, Andi Kleen, linux-kernel

On Wed, Oct 03, 2001 at 11:15:04AM -0400, Alex Larsson wrote:
> Is a nanoseconds field the right choice though? In reality you might not 
> have a nanosecond resolution timer, so you would miss changes that appear
> on shorter timescale than the timer resolution. Wouldn't a generation 
> counter, increased when ctime was updated, be a better solution?

Nearly every CPU has a built-in cycle counter now, which gives you ns-like
resolution. In theory you could still get collisions on MP systems, 
but the window is small enough that it can be ignored in practice.

-Andi


* Re: Netfilter problem
  2001-10-03 14:11       ` Netfilter problem Kirill Ratkin
@ 2001-10-03 21:42         ` Luigi Genoni
  0 siblings, 0 replies; 24+ messages in thread
From: Luigi Genoni @ 2001-10-03 21:42 UTC (permalink / raw)
  To: Kirill Ratkin; +Cc: linux-kernel

Strange!
It compiled correctly for me with all 2.4 kernels, with gcc 2.95.3 and gcc
3.0.0/1.

Luigi


On Wed, 3 Oct 2001, Kirill Ratkin wrote:

> Hi.
>
> I've a strange error when I try to check protocol type
> in netfilter hook function.
>
> I see this message:
> kping.c: In function `knet_hook':
> kping.c:116: dereferencing pointer to incomplete type
> make: *** [kping.o] Error 1
>
> This is part of my code:
> static
> unsigned int knet_hook(unsigned int hooknum,
>                       struct sk_buff** p_skb,
>                       const struct net_device* p_in,
>                       const struct net_device* p_out,
>                       int (*okfn)(struct sk_buff* ))
> {
>   ...
>   if((*p_skb)->nh.iph->protocol==
> 	(unsigned char)IPPROTO_ICMP)
>   {
>     printk("<1>ICMP Packet killed\n");
>     return NF_DROP;
>   }
>   ...
> }
>
> It had compiled on 2.4.1 version.
>
> I don't understand why ... .
>
>



* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03 21:26       ` Andi Kleen
@ 2001-10-05 12:44         ` Padraig Brady
  2001-10-05 12:59           ` Andrew Pimlott
  2001-10-05 13:01           ` Andi Kleen
  0 siblings, 2 replies; 24+ messages in thread
From: Padraig Brady @ 2001-10-05 12:44 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alex Larsson, Ulrich Drepper, linux-kernel

Andi Kleen wrote:

>On Wed, Oct 03, 2001 at 11:15:04AM -0400, Alex Larsson wrote:
>
>>Is a nanoseconds field the right choice though? In reality you might not 
>>have a nanosecond resolution timer, so you would miss changes that appear
>>on shorter timescale than the timer resolution. Wouldn't a generation 
>>counter, increased when ctime was updated, be a better solution?
>>
>
>Near any CPU has a cycle counter builtin now, which gives you ns like
>resolution. In theory you could still get collisions on MP systems, 
>but window is small enough that it can be ignored in practice.
>
>-Andi
>
But the point is, you would only ever want nanosecond resolution to make
sure you notice all changes to a file. A more general (and much simpler)
solution would be to gen_count++ every time a file is modified. What other
applications would require better than second resolution on files?

Padraig.



* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-05 12:44         ` Padraig Brady
@ 2001-10-05 12:59           ` Andrew Pimlott
  2001-10-05 13:01           ` Andi Kleen
  1 sibling, 0 replies; 24+ messages in thread
From: Andrew Pimlott @ 2001-10-05 12:59 UTC (permalink / raw)
  To: Padraig Brady; +Cc: Andi Kleen, Alex Larsson, Ulrich Drepper, linux-kernel

On Fri, Oct 05, 2001 at 01:44:20PM +0100, Padraig Brady wrote:
> But the point is you, only ever would want nano second resolution to make
> sure you notice all changes to a file. A more general (and much simpler)
> solution would be to gen_count++ every time a file's modified. What other
> applications would require better than second resolution on files?

Correlating file timestamps with an event log.  Comparing timestamps
on different files (make).  Real time is _much_ more useful (not to
mention convenient) than a generation count; and given that we've
survived with second resolution so far, I think the hypothetical
collisions on a nanosecond scale are ignorable.

Andrew


* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-05 12:44         ` Padraig Brady
  2001-10-05 12:59           ` Andrew Pimlott
@ 2001-10-05 13:01           ` Andi Kleen
  2001-10-05 13:15             ` Padraig Brady
  1 sibling, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2001-10-05 13:01 UTC (permalink / raw)
  To: Padraig Brady; +Cc: Andi Kleen, Alex Larsson, Ulrich Drepper, linux-kernel

On Fri, Oct 05, 2001 at 01:44:20PM +0100, Padraig Brady wrote:
> Andi Kleen wrote:
> 
> >On Wed, Oct 03, 2001 at 11:15:04AM -0400, Alex Larsson wrote:
> >
> >>Is a nanoseconds field the right choice though? In reality you might not 
> >>have a nanosecond resolution timer, so you would miss changes that appear
> >>on shorter timescale than the timer resolution. Wouldn't a generation 
> >>counter, increased when ctime was updated, be a better solution?
> >>
> >
> >Near any CPU has a cycle counter builtin now, which gives you ns like
> >resolution. In theory you could still get collisions on MP systems, 
> >but window is small enough that it can be ignored in practice.
> >
> >-Andi
> >
> But the point is you, only ever would want nano second resolution to make
> sure you notice all changes to a file. A more general (and much simpler)
> solution would be to gen_count++ every time a file's modified. What other
> applications would require better than second resolution on files?

The main advantage of using a real timestamp instead of a generation
counter is that we would be compatible with Unixware/Solaris/... Their
API is fine, so I see no advantage in inventing a new incompatible one.

Another advantage of using the real time instead of a counter is that 
you can easily merge both values into a single 64-bit value and do
arithmetic on it in user space. With a generation counter you would need 
to work with number pairs, which is much more complex. 
[Or alternatively reset the generation counter every second in the kernel
to get a flat time range again, which would be racy, ugly and complicated
in the kernel because it would need additional timestamps.]
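
A minimal sketch of the merge described above, assuming the seconds and
nanoseconds parts arrive as two separate integers. The helper names
ts_to_ns and ts_diff_ns are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: fold seconds and nanoseconds into one signed 64-bit
 * count of nanoseconds, so that timestamps compare and subtract as plain
 * integers. A signed 64-bit nanosecond count covers roughly 292 years on
 * either side of the epoch. */
static int64_t ts_to_ns(int64_t sec, int32_t nsec)
{
    return sec * INT64_C(1000000000) + nsec;
}

/* Difference in nanoseconds; positive when timestamp a is newer than b. */
static int64_t ts_diff_ns(int64_t a_sec, int32_t a_nsec,
                          int64_t b_sec, int32_t b_nsec)
{
    return ts_to_ns(a_sec, a_nsec) - ts_to_ns(b_sec, b_nsec);
}
```

With the flat value, "newer than" is a single integer comparison and an
age is a single subtraction.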

Also, a rdtsc/get_timestamp or, in the worst case, a jiffies read is really
not complex to code in the kernel; what makes you think it is?

-Andi


* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-05 13:01           ` Andi Kleen
@ 2001-10-05 13:15             ` Padraig Brady
  2001-10-05 14:38               ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Padraig Brady @ 2001-10-05 13:15 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alex Larsson, Ulrich Drepper, linux-kernel

Andi Kleen wrote:

>On Fri, Oct 05, 2001 at 01:44:20PM +0100, Padraig Brady wrote:
>
>>Andi Kleen wrote:
>>
>>>On Wed, Oct 03, 2001 at 11:15:04AM -0400, Alex Larsson wrote:
>>>
>>>>Is a nanoseconds field the right choice though? In reality you might not 
>>>>have a nanosecond resolution timer, so you would miss changes that appear
>>>>on shorter timescale than the timer resolution. Wouldn't a generation 
>>>>counter, increased when ctime was updated, be a better solution?
>>>>
>>>Near any CPU has a cycle counter builtin now, which gives you ns like
>>>resolution. In theory you could still get collisions on MP systems, 
>>>but window is small enough that it can be ignored in practice.
>>>
>>>-Andi
>>>
>>But the point is you, only ever would want nano second resolution to make
>>sure you notice all changes to a file. A more general (and much simpler)
>>solution would be to gen_count++ every time a file's modified. What other
>>applications would require better than second resolution on files?
>>
>
>The main advantage of using a real timestamp instead of a generation
>counter is that we would be compatible to Unixware/Solaris/... Their
>API is fine, so I see no advantage in inventing a new incompatible one.
>
Even so I can't see a need to have this resolution for mtime, and as you
pointed out there can still be races on SMP systems and timing resolutions
are system dependent anyway.

>
>Another advantage of using the real time instead of a counter is that 
>you can easily merge the both values into a single 64bit value and do
>arithmetic on it in user space. With a generation counter you would need 
>to work with number pairs, which is much more complex. 
>
??
if (file->mtime != mtime || file->gen_count != gen_count)
     file_changed=1;

>
>[or alternatively reset the generation counter every second in the kernel
>to get a flat time range again, 
>which would be racy and ugly and complicated in the kernel because it 
>would need additional timestamps] 
>
No need as long as it doesn't wrap within the mtime resolution (1 second).

>
>Also a rdtsc/get_timestamp or in the worst case a jiffie read is really
>not complex to code in kernel, what makes you think it is? 
>
Sorry, by more complex I meant more instructions/CPU expensive.

>
>
>-Andi
>
Padraig.



* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-05 13:15             ` Padraig Brady
@ 2001-10-05 14:38               ` Andi Kleen
  2001-10-05 15:00                 ` Padraig Brady
  2001-10-05 20:22                 ` Bernd Eckenfels
  0 siblings, 2 replies; 24+ messages in thread
From: Andi Kleen @ 2001-10-05 14:38 UTC (permalink / raw)
  To: Padraig Brady; +Cc: Andi Kleen, Alex Larsson, Ulrich Drepper, linux-kernel

> >Another advantage of using the real time instead of a counter is that 
> >you can easily merge the both values into a single 64bit value and do
> >arithmetic on it in user space. With a generation counter you would need 
> >to work with number pairs, which is much more complex. 
> >
> ??
> if (file->mtime != mtime || file->gen_count != gen_count)
>      file_changed=1;

And how would you implement "newer than" and "older than" with a generation
count that doesn't reset at an always-fixed time interval (= requiring
additional timestamps in the kernel)?

-Andi



* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-05 14:38               ` Andi Kleen
@ 2001-10-05 15:00                 ` Padraig Brady
  2001-10-05 19:12                   ` Andi Kleen
  2001-10-05 20:22                 ` Bernd Eckenfels
  1 sibling, 1 reply; 24+ messages in thread
From: Padraig Brady @ 2001-10-05 15:00 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alex Larsson, Ulrich Drepper, linux-kernel

Andi Kleen wrote:

>>>Another advantage of using the real time instead of a counter is that 
>>>you can easily merge the both values into a single 64bit value and do
>>>arithmetic on it in user space. With a generation counter you would need 
>>>to work with number pairs, which is much more complex. 
>>>
>>??
>>if (file->mtime != mtime || file->gen_count != gen_count)
>>     file_changed=1;
>>
>
>And how would you implement "newer than" and "older than" with a generation
>count that doesn't reset in a always fixed time interval (=requiring
>additional timestamps in kernel)?  
>
>-Andi
>
Well IMHO "newer than", "older than" applications have until now
made do with second resolution, and that's all that's required?

Padraig.



* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-05 15:00                 ` Padraig Brady
@ 2001-10-05 19:12                   ` Andi Kleen
  2001-10-08  8:39                     ` Padraig Brady
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2001-10-05 19:12 UTC (permalink / raw)
  To: Padraig Brady; +Cc: Andi Kleen, Alex Larsson, Ulrich Drepper, linux-kernel

On Fri, Oct 05, 2001 at 04:00:08PM +0100, Padraig Brady wrote:
> Andi Kleen wrote:
> 
> >>>Another advantage of using the real time instead of a counter is that 
> >>>you can easily merge the both values into a single 64bit value and do
> >>>arithmetic on it in user space. With a generation counter you would need 
> >>>to work with number pairs, which is much more complex. 
> >>>
> >>??
> >>if (file->mtime != mtime || file->gen_count != gen_count)
> >>     file_changed=1;
> >>
> >
> >And how would you implement "newer than" and "older than" with a generation
> >count that doesn't reset in a always fixed time interval (=requiring
> >additional timestamps in kernel)?  
> >
> >-Andi
> >
> Well IMHO "newer than", "older than" applications have until now
> done with second resolution, and that's all that's required?

No they haven't. GNU make supports nsec mtime on Solaris and apparently
some other OSes too, because the second-granularity mtime can be a big 
problem with make -j<bignumber> on a big SMP box. make has to distinguish
"is older" from "is newer"; "not equal" alone doesn't cut it.

[If you think it does, modify your make to replace the "is older" check
for dependencies with "is not equal" and see what happens.]



-Andi


* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-05 14:38               ` Andi Kleen
  2001-10-05 15:00                 ` Padraig Brady
@ 2001-10-05 20:22                 ` Bernd Eckenfels
  1 sibling, 0 replies; 24+ messages in thread
From: Bernd Eckenfels @ 2001-10-05 20:22 UTC (permalink / raw)
  To: linux-kernel

In article <20011005163807.A13524@gruyere.muc.suse.de> you wrote:
>> if (file->mtime != mtime || file->gen_count != gen_count)
>>      file_changed=1;

> And how would you implement "newer than" and "older than" with a generation
> count that doesn't reset in a always fixed time interval (=requiring
> additional timestamps in kernel)?  

newer:

if ((file->mtime < mtime) || ((file->mtime == mtime) && (file->gen_count < gen_count)))

The advantage here is that the counter can even carry some useful info, like
"x modifications".
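
The comparison above can be sketched as a small self-contained function.
This assumes a hypothetical struct file_stamp holding the two values;
gen_count is the counter proposed in this thread, not an existing
struct stat field.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical snapshot of a file's state. */
struct file_stamp {
    time_t   mtime;      /* second-resolution modification time */
    uint32_t gen_count;  /* proposed counter, bumped on every modification */
};

/* Lexicographic order: compare the seconds first and fall back to the
 * counter only when the seconds are equal. Returns true if a is older
 * than b. */
static bool stamp_older(const struct file_stamp *a, const struct file_stamp *b)
{
    if (a->mtime != b->mtime)
        return a->mtime < b->mtime;
    return a->gen_count < b->gen_count;
}
```

Note that the tie-break is only meaningful when both counters come from
the same sequence, e.g. two snapshots of the same file. Counters of two
different files that each count their own modifications say nothing
about which file changed first.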

Greetings
Bernd


* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-05 19:12                   ` Andi Kleen
@ 2001-10-08  8:39                     ` Padraig Brady
  2001-10-08  8:58                       ` Padraig Brady
  2001-10-08 10:04                       ` Trond Myklebust
  0 siblings, 2 replies; 24+ messages in thread
From: Padraig Brady @ 2001-10-08  8:39 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Alex Larsson, Ulrich Drepper, linux-kernel

Andi Kleen wrote:

>On Fri, Oct 05, 2001 at 04:00:08PM +0100, Padraig Brady wrote:
>
>>Andi Kleen wrote:
>>
>>>>>Another advantage of using the real time instead of a counter is that 
>>>>>you can easily merge the both values into a single 64bit value and do
>>>>>arithmetic on it in user space. With a generation counter you would need 
>>>>>to work with number pairs, which is much more complex. 
>>>>>
>>>>??
>>>>if (file->mtime != mtime || file->gen_count != gen_count)
>>>>    file_changed=1;
>>>>
>>>And how would you implement "newer than" and "older than" with a generation
>>>count that doesn't reset in a always fixed time interval (=requiring
>>>additional timestamps in kernel)?  
>>>
>>>-Andi
>>>
>>Well IMHO "newer than", "older than" applications have until now
>>done with second resolution, and that's all that's required?
>>
>
>No they haven't. GNU make supports nsec mtime on Solaris and apparently
>some other OS too, because the second granuality mtime can be a big 
>problem with make -j<bignumber> on a big SMP box. make has to distingush
>"is older" from "is newer"; "not equal" alone doesn't cut it.
>
>[If you think it is modify your make to replace the "is older" check
>for dependencies with "is not equal" and see what happens]
>

OK, agreed: in this case the complete state/relationship between files
must be maintained independently of the userspace app, i.e. in the
filesystem. But won't you then have the same problem with synchronising
nanosecond times between the various processors (which could be on the
other side of a network cable in some configurations)? So perhaps the best
solution is to maintain both: a generation count, which would do for the
many apps that just care whether the file has changed relative to some
moment in time and not relative to other file(s) on the filesystem, and,
for make-type applications, the full resolution timestamp. The latter
will still have the synchronisation/portability/CPU expense issues
discussed previously.

Padraig.



* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-08  8:39                     ` Padraig Brady
@ 2001-10-08  8:58                       ` Padraig Brady
  2001-10-08 10:04                       ` Trond Myklebust
  1 sibling, 0 replies; 24+ messages in thread
From: Padraig Brady @ 2001-10-08  8:58 UTC (permalink / raw)
  To: Padraig Brady; +Cc: Andi Kleen, Alex Larsson, Ulrich Drepper, linux-kernel

Padraig Brady wrote:

> Andi Kleen wrote:
>
>> On Fri, Oct 05, 2001 at 04:00:08PM +0100, Padraig Brady wrote:
>>
>>> Andi Kleen wrote:
>>>
>>>>>> Another advantage of using the real time instead of a counter is 
>>>>>> that you can easily merge the both values into a single 64bit 
>>>>>> value and do
>>>>>> arithmetic on it in user space. With a generation counter you 
>>>>>> would need to work with number pairs, which is much more complex.
>>>>>
>>>>> ??
>>>>> if (file->mtime != mtime || file->gen_count != gen_count)
>>>>>    file_changed=1;
>>>>>
>>>> And how would you implement "newer than" and "older than" with a 
>>>> generation
>>>> count that doesn't reset in a always fixed time interval (=requiring
>>>> additional timestamps in kernel)? 
>>>> -Andi
>>>>
>>> Well IMHO "newer than", "older than" applications have until now
>>> done with second resolution, and that's all that's required?
>>>
>>
>> No they haven't. GNU make supports nsec mtime on Solaris and apparently
>> some other OS too, because the second granuality mtime can be a big 
>> problem with make -j<bignumber> on a big SMP box. make has to distingush
>> "is older" from "is newer"; "not equal" alone doesn't cut it.
>>
>> [If you think it is modify your make to replace the "is older" check
>> for dependencies with "is not equal" and see what happens]
>>
>
> OK agreed, in this case the complete state/relationship between files
> must be maintained independently of the userspace app, i.e. in the
> filesystem. But won't you then have the same problem with synchronising
> nanosecond times between the various processors (which could be the
> other side of a network cable in some configurations)? So perhaps the
> best solution is to maintain both: a generation count, which would do
> for the many apps that just care whether the file has changed relative
> to some moment in time and not relative to other file(s) on the
> filesystem. Then for make type applications you could maintain the full
> resolution timestamp; however this will still have the
> synchronisation/portability/CPU expense issues discussed previously.


Just thinking that it's VERY hard to synchronise timings to nanosecond or
even millisecond resolution over distributed filesystems, or even within
the same filesystem. How about you synchronise the timestamps to the
particular filesystem and not the universe? I.e. instead of incrementing
a "generation count" in each inode you could increment a global filesystem
count every time a file is modified in the filesystem, and then store this
count in the particular inode being modified. This would allow you to have
exact order relationships between files in the same filesystem, and would
work perfectly every time for both "types" of apps mentioned above.
Outside the filesystem you can then resort to just the (second resolution)
timestamp.
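
A minimal sketch of the per-filesystem counter idea, written in Python purely
for illustration. The names ToyFilesystem, ToyInode, change_seq and touch are
hypothetical and do not correspond to any kernel interface; a real
implementation would also need locking and on-disk storage.

```python
from dataclasses import dataclass, field


@dataclass
class ToyInode:
    name: str
    change_seq: int = 0   # filesystem counter value at the last modification


@dataclass
class ToyFilesystem:
    counter: int = 0      # one counter per filesystem, shared by all inodes
    inodes: dict = field(default_factory=dict)

    def create(self, name):
        inode = ToyInode(name)
        self.inodes[name] = inode
        self.touch(name)
        return inode

    def touch(self, name):
        # Every modification bumps the global counter and stamps the inode.
        self.counter += 1
        self.inodes[name].change_seq = self.counter

    def newer_than(self, a, b):
        # Total order within one filesystem, independent of any clock.
        return self.inodes[a].change_seq > self.inodes[b].change_seq
```

Because every modification gets a unique, increasing number, both "has it
changed?" and "is it newer than?" reduce to integer comparisons, but only
between files on the same filesystem.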

Padraig.



* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-08  8:39                     ` Padraig Brady
  2001-10-08  8:58                       ` Padraig Brady
@ 2001-10-08 10:04                       ` Trond Myklebust
  1 sibling, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2001-10-08 10:04 UTC (permalink / raw)
  To: Padraig Brady; +Cc: linux-kernel

>>>>> " " == Padraig Brady <padraig@antefacto.com> writes:

     > you then have the same problem with synchronising nanosecond
     > times between the various processors (which could be the other
     > side of a network cable in some configurations)? So perhaps
     > the best solution is to maintain both a generation count which
     > would do for many apps who just care if the file has changed
     > relative to some moment in time and not relative to another
     > file(s) on the filesystem. Then for make type applications
     > you could maintain the full resolution timestamp, however this
     > will still have the synchronisation/portability/CPU expense
     > issues discussed previously.

This `generation count' idea for file change stamping will eventually
have to go into the kernel if only because things like NFSv4 will
require it.

Meanwhile though, you're going to have to look elsewhere than ordinary
NFS to be able to share the generation information over your
network. The current protocols support microsecond(v2)/nanosecond(v3)
timestamps only.

Cheers,
   Trond


* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03 17:45   ` Bernd Eckenfels
@ 2001-10-13 15:24     ` Jamie Lokier
  2001-10-13 16:12       ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Jamie Lokier @ 2001-10-13 15:24 UTC (permalink / raw)
  To: Bernd Eckenfels, Ulrich Drepper, Alex Larsson, Andi Kleen,
	Padraig Brady, Andrew Pimlott
  Cc: linux-kernel

This note explains how to implement file timestamps in such a way that
modifications to a file can always be detected reliably.  Currently,
programs such as `make' and other interesting applications cannot give
absolute guarantees of detecting changed files.

Andi Kleen says we can ignore the risk; I disagree, as there are some
applications that cannot be trusted if the risk is plausible, and it can
be fixed easily.

Bernd Eckenfels wrote:
> If you simply check the mtime and the file size you have the two most
> relevant parts. If neither of those changes this means that programs using
> the dnotify api most likely do not need to act.
                  ^^^^^^^^^^^

In other words, the API is broken for certain applications.

One that springs to mind is transparent caching of JIT-compiled code
between interpreter invocations.  If dnotify misses the notification
sometimes, the caching ceases to be transparent, and you have to switch
it off for reliable behaviour, a major efficiency loss.

Microsecond resolution, of course, does not fix this problem.

Alex Larsson wrote:
> Is a nanoseconds field the right choice though? In reality you might not 
> have a nanosecond resolution timer, so you would miss changes that appear
> on shorter timescale than the timer resolution. Wouldn't a generation 
> counter, increased when ctime was updated, be a better solution?

As has been pointed out, it would not be compatible with other unix
systems and existing software, and timestamps have nice audit trail
possibilities.

I didn't realise there was enough precision left in ext2 inodes for
nanosecond timestamps.

Timestamps have _many_ problems: the main problem is that you can't
guarantee to reliably detect a changed file.  For some interesting
applications this is fatal.

However, you can fix timestamps and keep the best benefits of timestamps
and counters:

  - high resolution timestamps.

  - whenever there is a change event, check whether the timestamp
    would be advanced.  If not, delay the change (i.e. inside the
    write() call) until the clock time has advanced to the next
    high-resolution unit.

  - if you use nanoseconds, this will never occur on current machines
    and only rarely on faster machines.

  - spinning is an acceptable way to delay for such a short time.

  - it's not necessary to delay if nobody read the mtime since the last
    timestamp update, which will nearly always be the case.  So even on
    extremely fast future machines, you would hardly ever pause.
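
The scheme in the list above can be sketched in user-space Python as follows.
This is only an illustration under stated assumptions: the class ToyInode, its
methods stat_mtime and modify, and the injectable clock argument are all
hypothetical names, and a kernel version would do this under the inode lock.

```python
import time


class ToyInode:
    """Toy inode whose mtime always moves past any value a reader has seen."""

    def __init__(self, clock=time.monotonic_ns, tick_ns=1):
        self._clock = clock        # callable returning integer nanoseconds
        self._tick_ns = tick_ns    # timestamp granularity in nanoseconds
        self._mtime = self._now()
        self._observed = False     # has mtime been read since the last update?

    def _now(self):
        # Round the clock down to the timestamp granularity.
        return self._clock() // self._tick_ns * self._tick_ns

    def stat_mtime(self):
        self._observed = True
        return self._mtime

    def modify(self):
        now = self._now()
        if self._observed:
            # A reader has seen the current mtime, so spin until the clock
            # has advanced to the next timestamp unit.
            while now <= self._mtime:
                now = self._now()
        # If nobody looked, repeated changes may share one timestamp.
        self._mtime = max(now, self._mtime)
        self._observed = False
```

The spin only happens when a reader has looked at the timestamp within the
same tick, which is the rare case the last list item describes.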

cheers,
-- Jamie


* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-13 15:24     ` Jamie Lokier
@ 2001-10-13 16:12       ` Andi Kleen
  2001-10-13 19:38         ` Jamie Lokier
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2001-10-13 16:12 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Bernd Eckenfels, Ulrich Drepper, Alex Larsson, Padraig Brady,
	Andrew Pimlott, linux-kernel

In article <20011013172419.B20499@kushida.jlokier.co.uk>,
Jamie Lokier <lk@tantalophile.demon.co.uk> writes:
> Andi Kleen says we can ignore the risk; I disagree, as there are some
> applications that cannot be trusted if the risk is plausible, and it can
> be fixed easily.

You're misquoting me badly.  I said we can ignore the risk of two
nanosecond resolution timestamps being changed by two different CPUs
with out-of-sync cycle counters on an SMP system, which are fast enough
to free/acquire the inode lock in a smaller time than they're out of sync
(= giving two file changes with the same ns timestamp).
I implied that on systems that don't have a cycle counter and which use
jiffy resolution gettimeofday it can also be ignored, because they're
unlikely to be SMP and are dying out anyway.

-Andi




* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-13 16:12       ` Andi Kleen
@ 2001-10-13 19:38         ` Jamie Lokier
  0 siblings, 0 replies; 24+ messages in thread
From: Jamie Lokier @ 2001-10-13 19:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Bernd Eckenfels, Ulrich Drepper, Alex Larsson, Padraig Brady,
	Andrew Pimlott, linux-kernel

Andi Kleen wrote:
> > Andi Kleen says we can ignore the risk; I disagree, as there are some
> > applications that cannot be trusted if the risk is plausible, and it can
> > be fixed easily.
> 
> You're misquoting me badly.  I said we can ignore the risk of two
> nanosecond resolution timestamps being changed by two different CPUs
> with out-of-sync cycle counters on an SMP system, which are fast enough
> to free/acquire the inode lock in a smaller time than they're out of sync
> (= giving two file changes with the same ns timestamp).
> I implied that on systems that don't have a cycle counter and which use
> jiffy resolution gettimeofday it can also be ignored, because they're
> unlikely to be SMP and are dying out anyway.

Andi, sorry I misrepresented your statement.

I misread your original as saying that the risks due to SMP nanosecond
scale synchronisation problems can be ignored and, by implication, that
the small risk of one SMP process modifying a file while another checks
the timestamp can be ignored too.  I misread it this way because others
have suggested higher resolution solves the problem, and I believe it
does not.

As you say above, multiple modifications within a single tick are not a
problem, do not have to be tracked, and therefore do not require SMP
synchronisation.

The SMP risk of missing a change after checking the timestamp is among
the risks I consider critical for an application which must not miss the
fact that a file has changed.  I do not want us to repeat the mistake of
1 second at a smaller timescale.

cheers,
-- Jamie


* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03 15:24       ` Finegrained a/c/mtime was Re: Directory notification problem Gerhard Mack
@ 2001-10-16 18:56         ` Riley Williams
  0 siblings, 0 replies; 24+ messages in thread
From: Riley Williams @ 2001-10-16 18:56 UTC (permalink / raw)
  To: Gerhard Mack
  Cc: Eric W Biederman, Ulrich Drepper, Andi Kleen, Alex Larsson,
	Alan Cox, Linux Kernel

Hi Gerhard.

>>>> For stat it also requires a changed glibc ABI -- the glibc/2.4
>>>> stat64 structure reserved an additional 4 bytes for every
>>>> timestamp, but these either need to be used to give more seconds
>>>> for the year 2038 problem or be used for the ms fractions. y2038
>>>> is somewhat important too.

>>> The fields are meant for nanoseconds. The y2038 will definitely be
>>> solved by time-shifting or making time_t unsigned. In any way
>>> nothing of importance here and now. Especially since there won't be
>>> many systems which are running today and which have a 32-bit time_t
>>> be used then. For the rest I'm sure that in 37 years there will be
>>> the one or the other ABI change.

>> Right.  Given current uptimes and being optimistic the fix for y2038
>> is probably needed by 2030 or just a little later.  But in any case
>> 64 bit systems should be maxing out by then, and the conversion to
>> 128 bit systems should have already happened on the server side.  
>> 32 bit systems will likely be limited to embedded and legacy systems
>> by then.

> Why do I get the feeling no one has learned from the problems the
> computer industry had with 2 digit date fields?

Precisely my feeling. Let's see what the various field widths do for the
y2038 problem, assuming a signed field and that we retain the current
date origin of Jan 1 00:00:00 UTC 1970 for the new routines:

	Field Width	Rollover Date	  Time
	~~~~~~~~~~~	~~~~~~~~~~~~~	~~~~~~~~
	    32		19 Jan   2038	 3:14:08
	    33		 7 Feb   2106	 6:28:16
	    34		16 Mar   2242	12:56:32
	    35		30 May   2514	 1:53:04
	    36		26 Oct   3058	 3:46:08
	    37		20 Aug   4147	 7:32:16
	    38		 8 Apr   6325	15:04:32
	    39		14 Jul  10680	 6:09:04
	    40		25 Jan  19391	12:18:08
	    41		20 Feb  36812	 0:36:16
	    42		10 Apr  71654	 1:12:32
	    43		19 Jul 141338	 2:25:04
	    44		 4 Feb 280707	 4:50:08
	    45		 8 Mar 559444	 9:40:16

I somehow don't see the need to go any further with this table...
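
For reference, the first rows of a table like this can be reproduced with a
few lines of Python. This is a sketch only: the function name rollover is made
up for illustration, it assumes a signed count of whole seconds since the 1970
origin, and Python's datetime stops at year 9999, so it only covers widths up
to 38 bits.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # the date origin assumed in the table (UTC)


def rollover(bits):
    """First instant that a signed `bits`-wide count of seconds cannot hold."""
    # A signed field of `bits` bits holds values up to 2**(bits - 1) - 1.
    return EPOCH + timedelta(seconds=2 ** (bits - 1))
```

Calling rollover(32) gives the familiar 2038 date; each extra bit doubles the
distance from 1970.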

We can get some really decent rollover dates by expanding the field
width, and the basic question comes down to how far ahead we wish to
push the problem - noting that the WinXX Y2K problem has only been
pushed back to be the Y10K problem now.

The other side of the equation is that we need to increase the
resolution with which we give out timestamps, and it appears to me that
the simplest means would be to change the kernel to use a smaller unit
to record timestamps. The current set of calls would then convert this
to seconds, and we would provide a new set of calls that returned the
raw values as used in the kernel.

Assuming the field widths have to be a complete number of bytes, we need
to determine what the minimum resolution is to allow us to record times
up to 00:00:00 GMT on the 1st of January in whatever year we wish to be
able to record up to. Here's what we would need to use for the given
field sizes to handle up to the following years:

Field  Year   Year   Year   Year   Year   Year   Year   Year   Year
Width  2038   2500   5000   10000  25000  50000 100000 250000 500000
~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~
  32    1  s
  40    4 ms  31 ms 174 ms 461 ms
  48   16 us 119 us 680 us 1.8 ms 5.1 ms  11 ms  22 ms  56 ms 112 ms
  56   60 ns 465 ns 2.7 us 7.1 us  21 us  43 us  86 us 218 us 437 us
  64  233 ps 1.8 ns  11 ns  28 ns  79 ns 165 ns 336 ns 849 ns 1.8 us
  72  909 fs 7.1 ps  41 ps 108 ps 308 ps 642 ps 1.4 ns 3.4 ns 6.7 ns
  80  3.6 fs  28 fs 159 fs 420 fs 1.2 ps 2.6 ps 5.2 ps  13 ps  27 ps
~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~ ~~~~~~
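
The arithmetic behind such a table is a single division: the number of seconds
from 1970 to the target year, divided by the number of positive values a
signed field can hold. A hedged sketch follows; the function names are
hypothetical, and it assumes the proleptic Gregorian calendar and ignores leap
seconds.

```python
from fractions import Fraction


def leap_days_through(year):
    # Leap days in the proleptic Gregorian calendar from year 1 up to `year`.
    return year // 4 - year // 100 + year // 400


def seconds_until(year):
    """Seconds from 1970-01-01 00:00:00 to 00:00:00 on 1 January of `year`."""
    days = (365 * (year - 1970)
            + leap_days_through(year - 1) - leap_days_through(1969))
    return days * 86400


def min_resolution(bits, year):
    """Finest tick, in seconds, that lets a signed `bits`-wide field reach `year`."""
    return Fraction(seconds_until(year), 2 ** (bits - 1))
```

For example, float(min_resolution(64, 2500)) is a little over 1.8e-9 seconds,
in line with the 64-bit row.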

I note that with the recent Y2K changes, WinXX software will next hit
rollover in case (C), and we don't want to be worse than that. Also, to
keep the conversion routines for the current functions simple, we need
to choose an interval that divides exactly into one second.

I would therefore conclude that we could aim for any of the following:

	Field Width	Unit of Time	Rollover Month
	~~~~~~~~~~~	~~~~~~~~~~~~	~~~~~~~~~~~~~~

	   40 bits	    500 ms	  May  10680
			      1  s	  Sep  19390

	   48 bits	   2500 us	  Apr  13119
			      5 ms	  Jul  24268	*
			     10 ms	  Jan  46567
			     25 ms	  Jul 113462
			    125 ms	  Sep 559432

	   56 bits	     10 us	  Nov  13386
			     25 us	  Feb  30512	*
			     50 us	  Mar  59054
			    100 us	  May 116138
			    500 us	  Nov 572811

	   64 bits	     50 ns	  Jul  16583
			    100 ns	  Feb  31197	*
			    250 ns	  Oct  75037
			    500 ns	  Jul 148105
			   1000 ns	  Jan 294241
			   2500 ns	  Jul 732647

	   72 bits	    125 ps	  Sep  11322
			    250 ps	  May  20675
			    500 ps	  Sep  39380	*
			      1 ns	  May  76791
			     10 ns	  Oct 750183

	   80 bits	    500 fs	  Feb  11547
			   1000 fs	  Apr  21124
			   1250 fs	  Nov  25912	*
			   2500 fs	  Sep  49855
			      5 ps	  May  97741
			     10 ps	  Sep 193512
			     25 ps	  Nov 480826
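
Going the other way, from a field width and a unit of time to an approximate
rollover year, can be sketched like this. The function name rollover_year is
hypothetical, and the year length of 365.25 days is an assumption of this
sketch rather than something stated in the message.

```python
from fractions import Fraction

JULIAN_YEAR = Fraction(36525, 100) * 86400  # seconds per year (assumed length)


def rollover_year(bits, unit_seconds):
    """Fractional calendar year in which a signed `bits`-wide tick count overflows."""
    # Total span covered by the positive half of the field, in seconds.
    span = 2 ** (bits - 1) * Fraction(unit_seconds)
    return 1970 + span / JULIAN_YEAR
```

Passing the unit as a Fraction, e.g. rollover_year(64, Fraction(1, 10**7)) for
100 ns ticks, keeps the arithmetic exact.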

Allowing that WinXX software is now only susceptible to the Y10K
problem, we can't afford to do worse than that, and the sooner we
sort this out, the better for all concerned as far as I can tell.

My personal choices at each field width would be those marked with an
asterisk, and this is based on the principle of using the shortest time
interval possible that is consistent with being able to record up to
around AD 25000 in a signed field.

My overall preference would be to go straight to 64 bit date fields and
define them as storing the time in units of 100 nanoseconds, but it has
apparently been decided that we will use 48 bit fields, if what I've
seen on this list is correct.

> Odds are legacy systems will be running something people for
> whatever reason couldn't replace.

Most probably...

Best wishes from Riley.



end of thread, other threads:[~2001-10-16 20:03 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.33.0110022206100.29931-100000@devserv.devel.redhat.com.suse.lists.linux.kernel>
2001-10-03  7:53 ` Finegrained a/c/mtime was Re: Directory notification problem Andi Kleen
2001-10-03  8:06   ` Ulrich Drepper
2001-10-03 13:35     ` Eric W. Biederman
2001-10-03 14:11       ` Netfilter problem Kirill Ratkin
2001-10-03 21:42         ` Luigi Genoni
2001-10-03 15:24       ` Finegrained a/c/mtime was Re: Directory notification problem Gerhard Mack
2001-10-16 18:56         ` Riley Williams
2001-10-03 15:15     ` Alex Larsson
2001-10-03 21:26       ` Andi Kleen
2001-10-05 12:44         ` Padraig Brady
2001-10-05 12:59           ` Andrew Pimlott
2001-10-05 13:01           ` Andi Kleen
2001-10-05 13:15             ` Padraig Brady
2001-10-05 14:38               ` Andi Kleen
2001-10-05 15:00                 ` Padraig Brady
2001-10-05 19:12                   ` Andi Kleen
2001-10-08  8:39                     ` Padraig Brady
2001-10-08  8:58                       ` Padraig Brady
2001-10-08 10:04                       ` Trond Myklebust
2001-10-05 20:22                 ` Bernd Eckenfels
2001-10-03 17:45   ` Bernd Eckenfels
2001-10-13 15:24     ` Jamie Lokier
2001-10-13 16:12       ` Andi Kleen
2001-10-13 19:38         ` Jamie Lokier
