* Finegrained a/c/mtime was Re: Directory notification problem [not found] <Pine.LNX.4.33.0110022206100.29931-100000@devserv.devel.redhat.com.suse.lists.linux.kernel> @ 2001-10-03 7:53 ` Andi Kleen 2001-10-03 8:06 ` Ulrich Drepper 2001-10-03 17:45 ` Bernd Eckenfels 0 siblings, 2 replies; 24+ messages in thread From: Andi Kleen @ 2001-10-03 7:53 UTC (permalink / raw) To: Alex Larsson; +Cc: linux-kernel Alex Larsson <alexl@redhat.com> writes: > I discovered a problem with the dnotify API while fixing a FAM bug today. > > The problem occurs when you want to watch a file in a directory, and that > file is changed several times in the same second. When I get the directory > notify signal on the directory I need to stat the file to see if the > change was actually in the file. If the file already changed in the > current second the stat() result will be identical to the previous stat() > call, since the resolution of mtime and ctime is one second. > > This leads to missed notifications, leaving clients (such as Nautilus or > Konqueror) displaying a state that does not represent the current state. > > The only userspace solution I see is to delay all change notifications to > the end of the second, so that clients always read the correct state. This > is somewhat contrary to the idea of FAM though, as it does not give > instant feedback. > > Is there any possibility of extending struct stat with a generation > counter? Or is there another solution to this problem? make has similar problems with parallel builds on bigger multiprocessor machines. Solaris 7 fixed this problem by adding new stat fields that contain the ms for mtime/atime/ctime. There are even filesystems on Linux already that support fine-grained timestamps, e.g. XFS stores them as ns on disk. The problem is that the VFS doesn't support it currently, so it always sets the ns parts to zero. Fixing it for m/c/atime requires new system calls for utime and stat64. 
For stat it also requires a changed glibc ABI -- the glibc/2.4 stat64 structure reserved an additional 4 bytes for every timestamp, but these either need to be used to give more seconds for the year 2038 problem or be used for the ms fractions. y2038 is somewhat important too. [In theory the existing additional bytes could be used for both on a big-endian host if you manage to define a numeric 48-bit type in gcc and are satisfied with 16-bit ms resolution, but such a hack would probably cause problems, e.g. with other compilers. It would be possible on little-endian too, but only for mtime and ctime, as there is no unused field in front of st_atime. Overall I think a new stat call is better. The ugly thing is just that the glibc ABI needs updating too] Solving it properly is a 2.5 thing. -Andi ^ permalink raw reply [flat|nested] 24+ messages in thread
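The missed-notification problem described above, and what a sub-second field changes, can be sketched in plain C. This is an illustration only, not code from the thread: `file_stamp`, `stamp_from_stat`, `changed_coarse` and `changed_fine` are hypothetical names, and the sketch assumes a later POSIX.1-2008 `struct stat` that exposes `st_mtim` as a `struct timespec` (as glibc on Linux does today), which did not exist in the 2.4-era ABI under discussion.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical snapshot of a file's modification time. */
struct file_stamp {
    time_t sec;   /* whole seconds: all a one-second mtime provides */
    long   nsec;  /* sub-second part: stays 0 if the VFS never fills it in */
};

/* Build a snapshot from a stat result (assumes POSIX.1-2008 st_mtim). */
static struct file_stamp stamp_from_stat(const struct stat *st)
{
    struct file_stamp s = { st->st_mtim.tv_sec, st->st_mtim.tv_nsec };
    return s;
}

/* Seconds-only comparison: blind to a second change within the same second. */
static bool changed_coarse(struct file_stamp old, struct file_stamp now)
{
    return old.sec != now.sec;
}

/* Full comparison: also notices a change inside the same second. */
static bool changed_fine(struct file_stamp old, struct file_stamp now)
{
    return old.sec != now.sec || old.nsec != now.nsec;
}
```

A watcher in the FAM style would keep the last `file_stamp` per file and call one of the comparison functions after each directory notification; with `changed_coarse` two writes in the same second look identical, which is exactly the reported bug.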
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-03 7:53 ` Finegrained a/c/mtime was Re: Directory notification problem Andi Kleen @ 2001-10-03 8:06 ` Ulrich Drepper 2001-10-03 13:35 ` Eric W. Biederman 2001-10-03 15:15 ` Alex Larsson 2001-10-03 17:45 ` Bernd Eckenfels 1 sibling, 2 replies; 24+ messages in thread From: Ulrich Drepper @ 2001-10-03 8:06 UTC (permalink / raw) To: Andi Kleen; +Cc: Alex Larsson, linux-kernel Andi Kleen <ak@suse.de> writes: > For stat is also requires a changed glibc ABI -- the glibc/2.4 stat64 Not only stat64, also plain stat. > structure reserved an additional 4 bytes for every timestamp, but these > either need to be used to give more seconds for the year 2038 problem > or be used for the ms fractions. y2038 is somewhat important too. The fields are meant for nanoseconds. The y2038 problem will definitely be solved by time-shifting or making time_t unsigned. In any case it is of no importance here and now, especially since few of the systems running today with a 32-bit time_t will still be in use then. For the rest I'm sure that in 37 years there will be one ABI change or another. -- Ulrich Drepper, Red Hat, 1325 Chesapeake Terrace, Sunnyvale, CA 94089 USA, drepper at redhat.com
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-03 8:06 ` Ulrich Drepper @ 2001-10-03 13:35 ` Eric W. Biederman 2001-10-03 14:11 ` Netfilter problem Kirill Ratkin 2001-10-03 15:24 ` Finegrained a/c/mtime was Re: Directory notification problem Gerhard Mack 2001-10-03 15:15 ` Alex Larsson 1 sibling, 2 replies; 24+ messages in thread From: Eric W. Biederman @ 2001-10-03 13:35 UTC (permalink / raw) To: Ulrich Drepper; +Cc: Andi Kleen, Alex Larsson, linux-kernel Ulrich Drepper <drepper@redhat.com> writes: > Andi Kleen <ak@suse.de> writes: > > > For stat is also requires a changed glibc ABI -- the glibc/2.4 stat64 > > Not only stat64, also plain stat. > > > structure reserved an additional 4 bytes for every timestamp, but these > > either need to be used to give more seconds for the year 2038 problem > > or be used for the ms fractions. y2038 is somewhat important too. > > The fields are meant for nanoseconds. The y2038 will definitely be > solved by time-shifting or making time_t unsigned. In any way nothing > of importance here and now. Especially since there won't be many > systems which are running today and which have a 32-bit time_t be used > then. For the rest I'm sure that in 37 years there will be the one or > the other ABI change. Right. Given current uptimes and being optimistic the fix for y2038 is probably needed by 2030 or just a little later. But in any case 64 bit systems should be maxing out by then, and the conversion to 128 bit systems should have already happened on the server side. 32 bit systems will likely be limited to embedded and legacy systems by then. Eric ^ permalink raw reply [flat|nested] 24+ messages in thread
* Netfilter problem 2001-10-03 13:35 ` Eric W. Biederman @ 2001-10-03 14:11 ` Kirill Ratkin 2001-10-03 21:42 ` Luigi Genoni 2001-10-03 15:24 ` Finegrained a/c/mtime was Re: Directory notification problem Gerhard Mack 1 sibling, 1 reply; 24+ messages in thread From: Kirill Ratkin @ 2001-10-03 14:11 UTC (permalink / raw) To: linux-kernel

Hi.

I've a strange error when I try to check protocol type
in netfilter hook function.

I see this message:

kping.c: In function `knet_hook':
kping.c:116: dereferencing pointer to incomplete type
make: *** [kping.o] Error 1

This is part of my code:

static
unsigned int knet_hook(unsigned int hooknum,
                       struct sk_buff** p_skb,
                       const struct net_device* p_in,
                       const struct net_device* p_out,
                       int (*okfn)(struct sk_buff* ))
{
        ...
        if((*p_skb)->nh.iph->protocol ==
           (unsigned char)IPPROTO_ICMP)
        {
                printk("<1>ICMP Packet killed\n");
                return NF_DROP;
        }
        ...
}

It compiled on version 2.4.1.

I don't understand why ... .
* Re: Netfilter problem 2001-10-03 14:11 ` Netfilter problem Kirill Ratkin @ 2001-10-03 21:42 ` Luigi Genoni 0 siblings, 0 replies; 24+ messages in thread From: Luigi Genoni @ 2001-10-03 21:42 UTC (permalink / raw) To: Kirill Ratkin; +Cc: linux-kernel

Strange! It compiled correctly for me with all 2.4 kernels, with gcc 2.95.3
and gcc 3.0.0/1.

Luigi

On Wed, 3 Oct 2001, Kirill Ratkin wrote:

> Hi.
>
> I've a strange error when I try to check protocol type
> in netfilter hook function.
>
> I see this message:
> kping.c: In function `knet_hook':
> kping.c:116: dereferencing pointer to incomplete type
> make: *** [kping.o] Error 1
>
> This is part of my code:
> static
> unsigned int knet_hook(unsigned int hooknum,
>                        struct sk_buff** p_skb,
>                        const struct net_device* p_in,
>                        const struct net_device* p_out,
>                        int (*okfn)(struct sk_buff* ))
> {
>         ...
>         if((*p_skb)->nh.iph->protocol ==
>            (unsigned char)IPPROTO_ICMP)
>         {
>                 printk("<1>ICMP Packet killed\n");
>                 return NF_DROP;
>         }
>         ...
> }
>
> It compiled on version 2.4.1.
>
> I don't understand why ... .
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-03 13:35 ` Eric W. Biederman 2001-10-03 14:11 ` Netfilter problem Kirill Ratkin @ 2001-10-03 15:24 ` Gerhard Mack 2001-10-16 18:56 ` Riley Williams 1 sibling, 1 reply; 24+ messages in thread From: Gerhard Mack @ 2001-10-03 15:24 UTC (permalink / raw) To: Eric W. Biederman; +Cc: Ulrich Drepper, Andi Kleen, Alex Larsson, linux-kernel On 3 Oct 2001, Eric W. Biederman wrote: > Ulrich Drepper <drepper@redhat.com> writes: > > > Andi Kleen <ak@suse.de> writes: > > > > > For stat is also requires a changed glibc ABI -- the glibc/2.4 stat64 > > > > Not only stat64, also plain stat. > > > > > structure reserved an additional 4 bytes for every timestamp, but these > > > either need to be used to give more seconds for the year 2038 problem > > > or be used for the ms fractions. y2038 is somewhat important too. > > > > The fields are meant for nanoseconds. The y2038 will definitely be > > solved by time-shifting or making time_t unsigned. In any way nothing > > of importance here and now. Especially since there won't be many > > systems which are running today and which have a 32-bit time_t be used > > then. For the rest I'm sure that in 37 years there will be the one or > > the other ABI change. > > Right. Given current uptimes and being optimistic the fix for y2038 > is probably needed by 2030 or just a little later. But in any case > 64 bit systems should be maxing out by then, and the conversion to 128 > bit systems should have already happened on the server side. 32 bit > systems will likely be limited to embedded and legacy systems by then. > > Eric Why do I get the feeling no one has learned from the problems the computer industry had with 2 digit date fields? Odds are legacy systems will be running something people for whatever reason couldn't replace. Gerhard -- Gerhard Mack gmack@innerfire.net <>< As a computer I find your faith in technology amusing. 
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-03 15:24 ` Finegrained a/c/mtime was Re: Directory notification problem Gerhard Mack @ 2001-10-16 18:56 ` Riley Williams 0 siblings, 0 replies; 24+ messages in thread From: Riley Williams @ 2001-10-16 18:56 UTC (permalink / raw) To: Gerhard Mack Cc: Eric W Biederman, Ulrich Drepper, Andi Kleen, Alex Larsson, Alan Cox, Linux Kernel Hi Gerhard. >>>> For stat is also requires a changed glibc ABI -- the glibc/2.4 >>>> stat64 structure reserved an additional 4 bytes for every >>>> timestamp, but these either need to be used to give more seconds >>>> for the year 2038 problem or be used for the ms fractions. y2038 >>>> is somewhat important too. >>> The fields are meant for nanoseconds. The y2038 will definitely be >>> solved by time-shifting or making time_t unsigned. In any way >>> nothing of importance here and now. Especially since there won't be >>> many systems which are running today and which have a 32-bit time_t >>> be used then. For the rest I'm sure that in 37 years there will be >>> the one or the other ABI change. >> Right. Given current uptimes and being optimistic the fix for y2038 >> is probably needed by 2030 or just a little later. But in any case >> 64 bit systems should be maxing out by then, and the conversion to >> 128 bit systems should have already happened on the server side. >> 32 bit systems will likely be limited to embedded and legacy systems >> by then. > Why do I get the feeling no one has learned from the problems the > computer industry had with 2 digit date fields? Precisely my feeling. 
Let's see what the various field widths do for the y2038 problem,
assuming a signed field and that we retain the current date origin of
Jan 1 00:00:00 UTC 1970 for the new routines:

    Field Width   Rollover Date   Time
    ~~~~~~~~~~~   ~~~~~~~~~~~~~   ~~~~~~~~
        32        19 Jan   2038    3:14:08
        33         7 Feb   2106    6:28:16
        34        16 Mar   2242   12:56:32
        35        30 May   2514    1:53:04
        36        26 Oct   3058    3:46:08
        37        20 Aug   4147    7:32:16
        38         8 Apr   6325   15:04:32
        39        14 Jul  10680    6:09:04
        40        25 Jan  19391   12:18:08
        41        20 Feb  36812    0:36:16
        42        10 Apr  71654    1:12:32
        43        19 Jul 141338    2:25:04
        44         4 Feb 280707    4:50:08
        45         8 Mar 559444    9:40:16

I somehow don't see the need to go any further with this table...

We can get some really decent rollover dates by expanding the field
width, and the basic question comes down to how far ahead we wish to
push the problem - noting that the WinXX Y2K problem has only been
pushed back to be the Y10K problem now.

The other side of the equation is that we need to increase the
resolution with which we give out timestamps, and it appears to me that
the simplest means would be to change the kernel to use a smaller unit
to record timestamps. The current set of calls would then convert this
to seconds, and we would provide a new set of calls that returned the
raw values as used in the kernel.

Assuming the field widths have to be a complete number of bytes, we need
to determine what the minimum resolution is to allow us to record times
up to 00:00:00 GMT on the 1st of January in whatever year we wish to be
able to record up to.
Here's what we would need to use for the given field sizes to handle up
to the following years:

    Field   Year    Year    Year    Year    Year    Year    Year    Year    Year
    Width   2038    2500    5000   10000   25000   50000  100000  250000  500000
    ~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~
     32      1 s
     40      4 ms   31 ms  174 ms  461 ms
     48     16 us  119 us  680 us  1.8 ms  5.1 ms   11 ms   22 ms   56 ms  112 ms
     56     60 ns  465 ns  2.7 us  7.1 us   21 us   43 us   86 us  218 us  437 us
     64    233 ps  1.8 ns   11 ns   28 ns   79 ns  165 ns  336 ns  849 ns  1.8 us
     72    909 fs  7.1 ps   41 ps  108 ps  308 ps  642 ps  1.4 ns  3.4 ns  6.7 ns
     80    3.6 fs   28 fs  159 fs  420 fs  1.2 ps  2.6 ps  5.2 ps   13 ps   27 ps
    ~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~  ~~~~~~

I note that with the recent Y2K changes, WinXX software will next hit
rollover in case (C), and we don't want to be worse than that. Also, to
keep the conversion routines for the current functions simple, we need
to choose an interval that divides exactly into one second. I would
therefore conclude that we could aim for any of the following:

    Field Width   Unit of Time   Rollover Month
    ~~~~~~~~~~~   ~~~~~~~~~~~~   ~~~~~~~~~~~~~~
      40 bits        500 ms       May  10680
                       1 s        Sep  19390

      48 bits       2500 us       Apr  13119
                       5 ms       Jul  24268  *
                      10 ms       Jan  46567
                      25 ms       Jul 113462
                     125 ms       Sep 559432

      56 bits         10 us       Nov  13386
                      25 us       Feb  30512  *
                      50 us       Mar  59054
                     100 us       May 116138
                     500 us       Nov 572811

      64 bits         50 ns       Jul  16583
                     100 ns       Feb  31197  *
                     250 ns       Oct  75037
                     500 ns       Jul 148105
                    1000 ns       Jan 294241
                    2500 ns       Jul 732647

      72 bits        125 ps       Sep  11322
                     250 ps       May  20675
                     500 ps       Sep  39380  *
                       1 ns       May  76791
                      10 ns       Oct 750183

      80 bits        500 fs       Feb  11547
                    1000 fs       Apr  21124
                    1250 fs       Nov  25912  *
                    2500 fs       Sep  49855
                       5 ps       May  97741
                      10 ps       Sep 193512
                      25 ps       Nov 480826

Allowing that WinXX software is now only susceptible to the Y10K
problem, we can't afford to do worse than that, and the sooner we sort
this out, the better for all concerned as far as I can tell.
My personal choices at each field width would be those marked with an
asterisk, and this is based on the principle of using the shortest time
interval possible that is consistent with being able to record up to
around AD 25000 in a signed field.

My overall preference would be to go straight to 64 bit date fields and
define them as storing the time in units of 100 nanoseconds, but it has
apparently been decided that we will use 48 bit fields, if what I've
seen on this list is correct.

> Odds are legacy systems will be running something people for
> whatever reason couldn't replace.

Most probably...

Best wishes from Riley.
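The arithmetic behind the two tables in this message can be approximated in a few lines of C. This is a rough sketch, not Riley's actual method: the names `rollover_year` and `min_tick_seconds` are hypothetical, and it assumes an average Gregorian year of 365.2425 days rather than doing real calendar arithmetic, so it yields years and tick lengths but not exact dates. The table entries appear to be rounded up from values of this kind.

```c
#include <stdint.h>

/* Average length of a Gregorian year in seconds (365.2425 days). */
#define SECONDS_PER_YEAR 31556952LL

/*
 * Approximate calendar year in which a signed `bits`-wide count of
 * seconds since 1970 overflows. Valid for 2 <= bits <= 63.
 */
static int64_t rollover_year(int bits)
{
    int64_t max_seconds = (int64_t)1 << (bits - 1);   /* 2^(bits-1) */
    return 1970 + max_seconds / SECONDS_PER_YEAR;
}

/*
 * Smallest tick length, in seconds, that still lets a signed
 * `bits`-wide tick counter starting in 1970 reach 1 January of `year`.
 */
static double min_tick_seconds(int bits, int64_t year)
{
    double span = (double)(year - 1970) * (double)SECONDS_PER_YEAR;
    double ticks = 1.0;
    for (int i = 0; i < bits - 1; i++)
        ticks *= 2.0;                 /* 2^(bits-1), exact in a double */
    return span / ticks;
}
```

For example, `rollover_year(32)` gives 2038 and `rollover_year(40)` gives 19391, matching the first table, while `min_tick_seconds(64, 2038)` comes out a little under 233 ps, in line with the 64-bit row of the second table.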
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-03 8:06 ` Ulrich Drepper 2001-10-03 13:35 ` Eric W. Biederman @ 2001-10-03 15:15 ` Alex Larsson 2001-10-03 21:26 ` Andi Kleen 1 sibling, 1 reply; 24+ messages in thread From: Alex Larsson @ 2001-10-03 15:15 UTC (permalink / raw) To: Ulrich Drepper; +Cc: Andi Kleen, linux-kernel On 3 Oct 2001, Ulrich Drepper wrote: > Andi Kleen <ak@suse.de> writes: > > > structure reserved an additional 4 bytes for every timestamp, but these > > either need to be used to give more seconds for the year 2038 problem > > or be used for the ms fractions. y2038 is somewhat important too. > > The fields are meant for nanoseconds. The y2038 will definitely be > solved by time-shifting or making time_t unsigned. In any way nothing > of importance here and now. Especially since there won't be many > systems which are running today and which have a 32-bit time_t be used > then. For the rest I'm sure that in 37 years there will be the one or > the other ABI change. Is a nanoseconds field the right choice though? In reality you might not have a nanosecond resolution timer, so you would miss changes that appear on shorter timescale than the timer resolution. Wouldn't a generation counter, increased when ctime was updated, be a better solution? / Alex ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-03 15:15 ` Alex Larsson @ 2001-10-03 21:26 ` Andi Kleen 2001-10-05 12:44 ` Padraig Brady 0 siblings, 1 reply; 24+ messages in thread From: Andi Kleen @ 2001-10-03 21:26 UTC (permalink / raw) To: Alex Larsson; +Cc: Ulrich Drepper, Andi Kleen, linux-kernel On Wed, Oct 03, 2001 at 11:15:04AM -0400, Alex Larsson wrote: > Is a nanoseconds field the right choice though? In reality you might not > have a nanosecond resolution timer, so you would miss changes that appear > on shorter timescale than the timer resolution. Wouldn't a generation > counter, increased when ctime was updated, be a better solution? Nearly every CPU has a built-in cycle counter now, which gives you ns-like resolution. In theory you could still get collisions on MP systems, but the window is small enough that it can be ignored in practice. -Andi
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-03 21:26 ` Andi Kleen @ 2001-10-05 12:44 ` Padraig Brady 2001-10-05 12:59 ` Andrew Pimlott 2001-10-05 13:01 ` Andi Kleen 0 siblings, 2 replies; 24+ messages in thread From: Padraig Brady @ 2001-10-05 12:44 UTC (permalink / raw) To: Andi Kleen; +Cc: Alex Larsson, Ulrich Drepper, linux-kernel Andi Kleen wrote: >On Wed, Oct 03, 2001 at 11:15:04AM -0400, Alex Larsson wrote: > >>Is a nanoseconds field the right choice though? In reality you might not >>have a nanosecond resolution timer, so you would miss changes that appear >>on shorter timescale than the timer resolution. Wouldn't a generation >>counter, increased when ctime was updated, be a better solution? >> > >Near any CPU has a cycle counter builtin now, which gives you ns like >resolution. In theory you could still get collisions on MP systems, >but window is small enough that it can be ignored in practice. > >-Andi > But the point is you, only ever would want nano second resolution to make sure you notice all changes to a file. A more general (and much simpler) solution would be to gen_count++ every time a file's modified. What other applications would require better than second resolution on files? Padraig. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-05 12:44 ` Padraig Brady @ 2001-10-05 12:59 ` Andrew Pimlott 2001-10-05 13:01 ` Andi Kleen 1 sibling, 0 replies; 24+ messages in thread From: Andrew Pimlott @ 2001-10-05 12:59 UTC (permalink / raw) To: Padraig Brady; +Cc: Andi Kleen, Alex Larsson, Ulrich Drepper, linux-kernel On Fri, Oct 05, 2001 at 01:44:20PM +0100, Padraig Brady wrote: > But the point is you, only ever would want nano second resolution to make > sure you notice all changes to a file. A more general (and much simpler) > solution would be to gen_count++ every time a file's modified. What other > applications would require better than second resolution on files? Correlating file timestamps with an event log. Comparing timestamps on different files (make). Real time is _much_ more useful (not to mention convenient) than a generation count; and given that we've survived with second resolution so far, I think the hypothetical collisions on a nanosecond scale are ignorable. Andrew ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-05 12:44 ` Padraig Brady 2001-10-05 12:59 ` Andrew Pimlott @ 2001-10-05 13:01 ` Andi Kleen 2001-10-05 13:15 ` Padraig Brady 1 sibling, 1 reply; 24+ messages in thread From: Andi Kleen @ 2001-10-05 13:01 UTC (permalink / raw) To: Padraig Brady; +Cc: Andi Kleen, Alex Larsson, Ulrich Drepper, linux-kernel On Fri, Oct 05, 2001 at 01:44:20PM +0100, Padraig Brady wrote: > Andi Kleen wrote: > > >On Wed, Oct 03, 2001 at 11:15:04AM -0400, Alex Larsson wrote: > > > >>Is a nanoseconds field the right choice though? In reality you might not > >>have a nanosecond resolution timer, so you would miss changes that appear > >>on shorter timescale than the timer resolution. Wouldn't a generation > >>counter, increased when ctime was updated, be a better solution? > >> > > > >Near any CPU has a cycle counter builtin now, which gives you ns like > >resolution. In theory you could still get collisions on MP systems, > >but window is small enough that it can be ignored in practice. > > > >-Andi > > > But the point is you, only ever would want nano second resolution to make > sure you notice all changes to a file. A more general (and much simpler) > solution would be to gen_count++ every time a file's modified. What other > applications would require better than second resolution on files? The main advantage of using a real timestamp instead of a generation counter is that we would be compatible to Unixware/Solaris/... Their API is fine, so I see no advantage in inventing a new incompatible one. Another advantage of using the real time instead of a counter is that you can easily merge the both values into a single 64bit value and do arithmetic on it in user space. With a generation counter you would need to work with number pairs, which is much more complex. 
[or alternatively reset the generation counter every second in the kernel to get a flat time range again, which would be racy and ugly and complicated in the kernel because it would need additional timestamps] Also a rdtsc/get_timestamp or in the worst case a jiffie read is really not complex to code in kernel, what makes you think it is? -Andi ^ permalink raw reply [flat|nested] 24+ messages in thread
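The point about merging both values into a single 64-bit number can be illustrated as follows. This is a minimal sketch under stated assumptions: `to_ns` and `is_newer` are hypothetical helper names, not part of any proposed API, and a signed 64-bit nanosecond count only covers roughly 292 years either side of 1970.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pack a (seconds, nanoseconds) timestamp into one signed 64-bit
 * nanosecond count. Assumes 0 <= nsec < 1000000000 and a seconds value
 * small enough that the product does not overflow (about +/- 292 years).
 */
static int64_t to_ns(int64_t sec, int32_t nsec)
{
    return sec * INT64_C(1000000000) + nsec;
}

/* "Newer than" becomes a plain integer comparison on the packed value. */
static bool is_newer(int64_t a_ns, int64_t b_ns)
{
    return a_ns > b_ns;
}
```

With the packed form, differences such as `to_ns(2, 1) - to_ns(1, 999999999)` are ordinary subtraction, whereas a (seconds, generation count) pair has no comparable arithmetic across second boundaries.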
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-05 13:01 ` Andi Kleen @ 2001-10-05 13:15 ` Padraig Brady 2001-10-05 14:38 ` Andi Kleen 0 siblings, 1 reply; 24+ messages in thread From: Padraig Brady @ 2001-10-05 13:15 UTC (permalink / raw) To: Andi Kleen; +Cc: Alex Larsson, Ulrich Drepper, linux-kernel Andi Kleen wrote: >On Fri, Oct 05, 2001 at 01:44:20PM +0100, Padraig Brady wrote: > >>Andi Kleen wrote: >> >>>On Wed, Oct 03, 2001 at 11:15:04AM -0400, Alex Larsson wrote: >>> >>>>Is a nanoseconds field the right choice though? In reality you might not >>>>have a nanosecond resolution timer, so you would miss changes that appear >>>>on shorter timescale than the timer resolution. Wouldn't a generation >>>>counter, increased when ctime was updated, be a better solution? >>>> >>>Near any CPU has a cycle counter builtin now, which gives you ns like >>>resolution. In theory you could still get collisions on MP systems, >>>but window is small enough that it can be ignored in practice. >>> >>>-Andi >>> >>But the point is you, only ever would want nano second resolution to make >>sure you notice all changes to a file. A more general (and much simpler) >>solution would be to gen_count++ every time a file's modified. What other >>applications would require better than second resolution on files? >> > >The main advantage of using a real timestamp instead of a generation >counter is that we would be compatible to Unixware/Solaris/... Their >API is fine, so I see no advantage in inventing a new incompatible one. > Even so I can't see a need to have this resolution for mtime, and as you pointed out there can still be races on SMP systems and timing resolutions are system dependent anyway. > >Another advantage of using the real time instead of a counter is that >you can easily merge the both values into a single 64bit value and do >arithmetic on it in user space. 
With a generation counter you would need >to work with number pairs, which is much more complex. > ?? if (file->mtime != mtime || file->gen_count != gen_count) file_changed=1; > >[or alternatively reset the generation counter every second in the kernel >to get a flat time range again, >which would be racy and ugly and complicated in the kernel because it >would need additional timestamps] > No need as long as it doesn't wrap within the mtime resolution (1 second). > >Also a rdtsc/get_timestamp or in the worst case a jiffie read is really >not complex to code in kernel, what makes you think it is? > Sorry, by more complex I meant more instructions/CPU expensive. > > >-Andi > Padraig. ^ permalink raw reply [flat|nested] 24+ messages in thread
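The one-line check quoted in this message can be fleshed out into something compilable. Everything here is illustrative: `file_state` and its `gen_count` field are hypothetical, since no such counter exists in `struct stat`, and the sketch assumes the kernel would bump the counter on every modification.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical per-file state: the usual one-second mtime plus a
 * counter assumed to be incremented on every modification.
 */
struct file_state {
    time_t   mtime;
    uint32_t gen_count;
};

/* True if the file differs from the state we last recorded. */
static bool file_changed(const struct file_state *seen,
                         const struct file_state *now)
{
    return seen->mtime != now->mtime || seen->gen_count != now->gen_count;
}
```

This answers "has this file changed since I last looked?" for a single file, which is the FAM use case. It does not by itself say which of two different files is newer when their `mtime` values are equal, because the counters of separate files are unrelated.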
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-05 13:15 ` Padraig Brady @ 2001-10-05 14:38 ` Andi Kleen 2001-10-05 15:00 ` Padraig Brady 2001-10-05 20:22 ` Bernd Eckenfels 0 siblings, 2 replies; 24+ messages in thread From: Andi Kleen @ 2001-10-05 14:38 UTC (permalink / raw) To: Padraig Brady; +Cc: Andi Kleen, Alex Larsson, Ulrich Drepper, linux-kernel > >Another advantage of using the real time instead of a counter is that > >you can easily merge the both values into a single 64bit value and do > >arithmetic on it in user space. With a generation counter you would need > >to work with number pairs, which is much more complex. > > > ?? > if (file->mtime != mtime || file->gen_count != gen_count) > file_changed=1; And how would you implement "newer than" and "older than" with a generation count that doesn't reset in a always fixed time interval (=requiring additional timestamps in kernel)? -Andi ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-05 14:38 ` Andi Kleen @ 2001-10-05 15:00 ` Padraig Brady 2001-10-05 19:12 ` Andi Kleen 2001-10-05 20:22 ` Bernd Eckenfels 1 sibling, 1 reply; 24+ messages in thread From: Padraig Brady @ 2001-10-05 15:00 UTC (permalink / raw) To: Andi Kleen; +Cc: Alex Larsson, Ulrich Drepper, linux-kernel Andi Kleen wrote: >>>Another advantage of using the real time instead of a counter is that >>>you can easily merge the both values into a single 64bit value and do >>>arithmetic on it in user space. With a generation counter you would need >>>to work with number pairs, which is much more complex. >>> >>?? >>if (file->mtime != mtime || file->gen_count != gen_count) >> file_changed=1; >> > >And how would you implement "newer than" and "older than" with a generation >count that doesn't reset in a always fixed time interval (=requiring >additional timestamps in kernel)? > >-Andi > Well IMHO "newer than", "older than" applications have until now done with second resolution, and that's all that's required? Padraig. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-05 15:00 ` Padraig Brady @ 2001-10-05 19:12 ` Andi Kleen 2001-10-08 8:39 ` Padraig Brady 0 siblings, 1 reply; 24+ messages in thread From: Andi Kleen @ 2001-10-05 19:12 UTC (permalink / raw) To: Padraig Brady; +Cc: Andi Kleen, Alex Larsson, Ulrich Drepper, linux-kernel On Fri, Oct 05, 2001 at 04:00:08PM +0100, Padraig Brady wrote: > Andi Kleen wrote: > > >>>Another advantage of using the real time instead of a counter is that > >>>you can easily merge the both values into a single 64bit value and do > >>>arithmetic on it in user space. With a generation counter you would need > >>>to work with number pairs, which is much more complex. > >>> > >>?? > >>if (file->mtime != mtime || file->gen_count != gen_count) > >> file_changed=1; > >> > > > >And how would you implement "newer than" and "older than" with a generation > >count that doesn't reset in a always fixed time interval (=requiring > >additional timestamps in kernel)? > > > >-Andi > > > Well IMHO "newer than", "older than" applications have until now > done with second resolution, and that's all that's required? No they haven't. GNU make supports nsec mtime on Solaris and apparently some other OS too, because the second granuality mtime can be a big problem with make -j<bignumber> on a big SMP box. make has to distingush "is older" from "is newer"; "not equal" alone doesn't cut it. [If you think it is modify your make to replace the "is older" check for dependencies with "is not equal" and see what happens] -Andi ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Finegrained a/c/mtime was Re: Directory notification problem 2001-10-05 19:12 ` Andi Kleen @ 2001-10-08 8:39 ` Padraig Brady 2001-10-08 8:58 ` Padraig Brady 2001-10-08 10:04 ` Trond Myklebust 0 siblings, 2 replies; 24+ messages in thread From: Padraig Brady @ 2001-10-08 8:39 UTC (permalink / raw) To: Andi Kleen; +Cc: Alex Larsson, Ulrich Drepper, linux-kernel Andi Kleen wrote: >On Fri, Oct 05, 2001 at 04:00:08PM +0100, Padraig Brady wrote: > >>Andi Kleen wrote: >> >>>>>Another advantage of using the real time instead of a counter is that >>>>>you can easily merge the both values into a single 64bit value and do >>>>>arithmetic on it in user space. With a generation counter you would need >>>>>to work with number pairs, which is much more complex. >>>>> >>>>?? >>>>if (file->mtime != mtime || file->gen_count != gen_count) >>>> file_changed=1; >>>> >>>And how would you implement "newer than" and "older than" with a generation >>>count that doesn't reset in a always fixed time interval (=requiring >>>additional timestamps in kernel)? >>> >>>-Andi >>> >>Well IMHO "newer than", "older than" applications have until now >>done with second resolution, and that's all that's required? >> > >No they haven't. GNU make supports nsec mtime on Solaris and apparently >some other OS too, because the second granuality mtime can be a big >problem with make -j<bignumber> on a big SMP box. make has to distingush >"is older" from "is newer"; "not equal" alone doesn't cut it. > >[If you think it is modify your make to replace the "is older" check >for dependencies with "is not equal" and see what happens] > OK agreed, in this case the, complete state/relationship between files, must be maintained independently of the userspace app, i.e. in the filesystem. But wont you then have the same problem with synchronising nanosecond times between the various processors (which could be the other side of a network cable in some configurations)? 
So perhaps the best solution is to maintain both. A generation count would do for the many apps that only care whether the file has changed relative to some moment in time, and not relative to other files on the filesystem. For make-type applications you could then maintain the full-resolution timestamp, although this will still have the synchronisation/portability/CPU expense issues discussed previously. Padraig.
* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-08 8:39 ` Padraig Brady
@ 2001-10-08 8:58 ` Padraig Brady
  2001-10-08 10:04 ` Trond Myklebust
  1 sibling, 0 replies; 24+ messages in thread
From: Padraig Brady @ 2001-10-08 8:58 UTC
To: Padraig Brady; +Cc: Andi Kleen, Alex Larsson, Ulrich Drepper, linux-kernel

Padraig Brady wrote:

> Andi Kleen wrote:
>
>> On Fri, Oct 05, 2001 at 04:00:08PM +0100, Padraig Brady wrote:
>>
>>> Andi Kleen wrote:
>>>
>>>>>> Another advantage of using the real time instead of a counter is
>>>>>> that you can easily merge the both values into a single 64bit
>>>>>> value and do arithmetic on it in user space. With a generation
>>>>>> counter you would need to work with number pairs, which is much
>>>>>> more complex.
>>>>>
>>>>> ??
>>>>> if (file->mtime != mtime || file->gen_count != gen_count)
>>>>> file_changed=1;
>>>>>
>>>> And how would you implement "newer than" and "older than" with a
>>>> generation count that doesn't reset in a always fixed time interval
>>>> (=requiring additional timestamps in kernel)?
>>>> -Andi
>>>>
>>> Well IMHO "newer than", "older than" applications have until now
>>> done with second resolution, and that's all that's required?
>>>
>>
>> No they haven't. GNU make supports nsec mtime on Solaris and apparently
>> some other OS too, because the second granularity mtime can be a big
>> problem with make -j<bignumber> on a big SMP box. make has to distinguish
>> "is older" from "is newer"; "not equal" alone doesn't cut it.
>>
>> [If you think it is modify your make to replace the "is older" check
>> for dependencies with "is not equal" and see what happens]
>>
>
> OK agreed, in this case the, complete state/relationship between
> files, must be maintained independently of the userspace app, i.e. in
> the filesystem. But won't you then have the same problem with
> synchronising nanosecond times between the various processors (which
> could be the other side of a network cable in some configurations)?
> So perhaps the best solution is to maintain both a generation count
> which would do for many apps who just care if the file has changed
> relative to some moment in time and not relative to another file(s) on
> the filesystem. Then for make type applications you could maintain the
> full resolution timestamp, however this will still have the
> synchronisation/portability/CPU expense issues discussed previously.

Just thinking: since it's VERY hard to synchronise timings to nanosecond
or even millisecond resolution across distributed systems, or even
within the same filesystem, how about synchronising the timestamps to
the particular filesystem and not the universe? I.e. instead of
incrementing a "generation count" in each inode, you could increment a
global filesystem count every time a file is modified in the filesystem,
and then store this count in the particular inode being modified.

This would give you exact order relationships between files in the same
filesystem, and would work perfectly every time for both "types" of apps
mentioned above. Outside the filesystem you can then resort to just the
(second resolution) timestamp.

Padraig.
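The per-filesystem counter proposed above can be sketched in a few lines of userspace C. This is only a toy model under stated assumptions: `toy_fs`, `toy_inode`, `toy_modify` and `toy_newer` are hypothetical names, nothing here corresponds to actual kernel structures, and the sketch ignores the locking or atomic increment a real implementation would need on SMP.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: one modification counter shared by the whole filesystem. */
struct toy_fs {
    uint64_t change_seq;
};

/* Toy inode: keeps the ordinary second-resolution mtime plus the value
 * the filesystem counter had when this inode was last modified. */
struct toy_inode {
    uint64_t change_seq;
    long mtime;
};

/* Record a modification: bump the filesystem-wide counter and store the
 * new value in the inode being modified. */
static void toy_modify(struct toy_fs *fs, struct toy_inode *ino, long now)
{
    assert(fs != 0 && ino != 0);
    ino->mtime = now;
    ino->change_seq = ++fs->change_seq;
}

/* Within one filesystem, "a was modified after b" is a plain compare,
 * even when both modifications fall in the same second. */
static int toy_newer(const struct toy_inode *a, const struct toy_inode *b)
{
    return a->change_seq > b->change_seq;
}
```

Two inodes modified in the same second get distinct, ordered counter values, which is the property a make-style "is newer" check needs; across filesystems only `mtime` remains comparable.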
* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-08 8:39 ` Padraig Brady
  2001-10-08 8:58 ` Padraig Brady
@ 2001-10-08 10:04 ` Trond Myklebust
  1 sibling, 0 replies; 24+ messages in thread
From: Trond Myklebust @ 2001-10-08 10:04 UTC
To: Padraig Brady; +Cc: linux-kernel

>>>>> " " == Padraig Brady <padraig@antefacto.com> writes:

> you then have the same problem with synchronising nanosecond
> times between the various processors (which could be the other
> side of a network cable in some configurations)? So perhaps the
> best solution is to maintain both a generation count which would
> do for many apps who just care if the file has changed relative
> to some moment in time and not relative to another file(s) on
> the filesystem. Then for make type applications you could
> maintain the full resolution timestamp, however this will still
> have the synchronisation/portability/CPU expense issues
> discussed previously.

This `generation count' idea for file change stamping will eventually
have to go into the kernel, if only because things like NFSv4 will
require it.

Meanwhile though, you're going to have to look elsewhere than ordinary
NFS to be able to share the generation information over your network.
The current protocols support microsecond (v2) / nanosecond (v3)
timestamps only.

Cheers,
Trond
* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-05 14:38 ` Andi Kleen
  2001-10-05 15:00 ` Padraig Brady
@ 2001-10-05 20:22 ` Bernd Eckenfels
  1 sibling, 0 replies; 24+ messages in thread
From: Bernd Eckenfels @ 2001-10-05 20:22 UTC
To: linux-kernel

In article <20011005163807.A13524@gruyere.muc.suse.de> you wrote:
>> if (file->mtime != mtime || file->gen_count != gen_count)
>> file_changed=1;

> And how would you implement "newer than" and "older than" with a generation
> count that doesn't reset in a always fixed time interval (=requiring
> additional timestamps in kernel)?

newer:

if ((file->mtime < mtime) || ((file->mtime == mtime) && (file->gen_count < gen_count)))

The advantage here is that the count can even carry some useful info
like "x modifications".

Greetings
Bernd
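The "newer" test above is a lexicographic comparison on the pair (mtime, gen_count). A minimal self-contained sketch follows; the struct `file_stamp` and the function `stamp_newer` are hypothetical names chosen for illustration. One assumption is worth stating loudly: the tie-break on `gen_count` only orders two different files if their counts come from a comparable source, which is the concern Andi raises in the quoted question.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical pair of second-resolution mtime and generation count. */
struct file_stamp {
    time_t mtime;
    unsigned long gen_count;
};

/* True if a is strictly newer than b: compare the seconds first, and
 * fall back to the generation count only when the seconds are equal. */
static bool stamp_newer(const struct file_stamp *a, const struct file_stamp *b)
{
    assert(a != NULL && b != NULL);
    if (a->mtime != b->mtime)
        return a->mtime > b->mtime;
    return a->gen_count > b->gen_count;
}
```

"Older than" is the same call with the arguments swapped, and "changed" is either direction returning true.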
* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03 7:53 ` Finegrained a/c/mtime was Re: Directory notification problem Andi Kleen
  2001-10-03 8:06 ` Ulrich Drepper
@ 2001-10-03 17:45 ` Bernd Eckenfels
  2001-10-13 15:24 ` Jamie Lokier
  1 sibling, 1 reply; 24+ messages in thread
From: Bernd Eckenfels @ 2001-10-03 17:45 UTC
To: linux-kernel

> Alex Larsson <alexl@redhat.com> writes:
>> I discovered a problem with the dnotify API while fixing a FAM bug today.
>>
>> The problem occurs when you want to watch a file in a directory, and that
>> file is changed several times in the same second. When I get the directory
>> notify signal on the directory I need to stat the file to see if the
>> change was actually in the file. If the file already changed in the
>> current second the stat() result will be identical to the previous stat()
>> call, since the resolution of mtime and ctime is one second.

If you simply check the mtime and the file size you have the two most
relevant parts. If neither of those changes, this means that programs
using the dnotify API most likely do not need to act. After all, it is
not an auditing facility but a notifier for things like reloading
directory listings.

The only thing I can imagine causing problems is a self-reloading config
file. But in that case dnotify is overkill anyway, and a 1 sec delay
could be assumed to be reasonable.

Greetings
Bernd
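The mtime-plus-size check described above might look like the following in userspace, using only the standard POSIX `stat()` call. The names `snapshot`, `take_snapshot` and `snapshot_differs` are hypothetical, and this is a sketch of the heuristic rather than of what FAM actually does.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* The two fields the heuristic looks at. */
struct snapshot {
    time_t mtime;
    off_t size;
};

/* Fill *out from stat(); returns 0 on success, -1 if stat() fails. */
static int take_snapshot(const char *path, struct snapshot *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);
    if (stat(path, &st) != 0)
        return -1;
    out->mtime = st.st_mtime;
    out->size = st.st_size;
    return 0;
}

/* True if either the second-resolution mtime or the size changed. */
static bool snapshot_differs(const struct snapshot *a, const struct snapshot *b)
{
    return a->mtime != b->mtime || a->size != b->size;
}
```

A rewrite that keeps the file the same length within the same second still compares equal here, which is exactly the residual case the rest of the thread argues about.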
* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-03 17:45 ` Bernd Eckenfels
@ 2001-10-13 15:24 ` Jamie Lokier
  2001-10-13 16:12 ` Andi Kleen
  0 siblings, 1 reply; 24+ messages in thread
From: Jamie Lokier @ 2001-10-13 15:24 UTC
To: Bernd Eckenfels, Ulrich Drepper, Alex Larsson, Andi Kleen, Padraig Brady, Andrew Pimlott
Cc: linux-kernel

This note explains how to implement file timestamps in such a way that
modifications to a file can always be detected reliably. Currently,
programs such as `make' and other interesting applications cannot give
absolute guarantees of detecting changed files.

Andi Kleen says we can ignore the risk; I disagree, as there are some
applications that cannot be trusted if the risk is plausible, and it can
be fixed easily.

Bernd Eckenfels wrote:
> If you simply check the mtime and the file size you have the two most
> relevant parts. If neither of those changes this means that programs using
> the dnotify api most likely do not need to act.
                  ^^^^^^^^^^^

In other words, the API is broken for certain applications. One that
springs to mind is transparent caching of JIT-compiled code between
interpreter invocations. If dnotify misses the notification sometimes,
the caching ceases to be transparent, and you have to switch it off for
reliable behaviour, a major efficiency loss.

Microsecond resolution, of course, does not fix this problem.

Alex Larsson wrote:
> Is a nanoseconds field the right choice though? In reality you might not
> have a nanosecond resolution timer, so you would miss changes that appear
> on shorter timescale than the timer resolution. Wouldn't a generation
> counter, increased when ctime was updated, be a better solution?

As has been pointed out, it would not be compatible with other unix
systems and existing software, and timestamps have nice audit trail
possibilities. I didn't realise there was enough precision left in ext2
inodes for nanosecond timestamps.

Timestamps have _many_ problems: the main problem is that you can't
guarantee to reliably detect a changed file. For some interesting
applications this is fatal.

However, you can fix timestamps and keep the best benefits of timestamps
and counters:

  - high resolution timestamps.

  - whenever there is a change event, check whether the timestamp would
    be advanced. If not, delay the change (i.e. inside the write() call)
    until the clock time has advanced to the next high-resolution unit.

  - if you use nanoseconds, this will never occur on current machines
    and only rarely on faster machines.

  - spinning is an acceptable way to delay for such a short time.

  - it's not necessary to delay if nobody read the mtime since the last
    timestamp update, which will nearly always be the case. So even on
    extremely fast future machines, you would hardly ever pause.

cheers,
-- Jamie
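The scheme in the list above can be modelled in userspace as follows. Everything here is a sketch under assumptions: `stamped_file`, `read_mtime` and `note_change` are hypothetical names, the clock is passed in as a function pointer so the behaviour can be exercised deterministically, and `demo_clock` is a stand-in that advances one tick per four reads rather than a real time source. A kernel implementation would also need locking, which is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t (*clock_fn)(void);

struct stamped_file {
    uint64_t mtime;       /* in clock ticks, e.g. nanoseconds */
    bool mtime_observed;  /* has anyone read mtime since it last moved? */
};

/* Readers go through here so that the file remembers it was observed. */
static uint64_t read_mtime(struct stamped_file *f)
{
    f->mtime_observed = true;
    return f->mtime;
}

/* Called on every change event. If a reader has already seen the
 * current mtime, spin until the clock moves past it, so the reader's
 * copy is guaranteed to differ from the new value. If nobody looked,
 * no delay is needed. Note that the spin never ends if the clock is
 * stuck; a real implementation would have to bound it. */
static void note_change(struct stamped_file *f, clock_fn now)
{
    uint64_t t;

    assert(f != 0 && now != 0);
    t = now();
    if (f->mtime_observed) {
        while (t <= f->mtime)
            t = now();
    }
    if (t > f->mtime) {
        f->mtime = t;
        f->mtime_observed = false;
    }
}

/* Stand-in clock for experiments: one tick per four reads. */
static uint64_t demo_calls;
static uint64_t demo_clock(void)
{
    return demo_calls++ / 4;
}
```

The `mtime_observed` flag is what keeps the common case free: back-to-back writes with no intervening reader share one timestamp and never spin.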
* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-13 15:24 ` Jamie Lokier
@ 2001-10-13 16:12 ` Andi Kleen
  2001-10-13 19:38 ` Jamie Lokier
  0 siblings, 1 reply; 24+ messages in thread
From: Andi Kleen @ 2001-10-13 16:12 UTC
To: Jamie Lokier
Cc: Bernd Eckenfels, Ulrich Drepper, Alex Larsson, Padraig Brady, Andrew Pimlott, linux-kernel

In article <20011013172419.B20499@kushida.jlokier.co.uk>,
Jamie Lokier <lk@tantalophile.demon.co.uk> writes:

> Andi Kleen says we can ignore the risk; I disagree, as there are some
> applications that cannot be trusted if the risk is plausible, and it can
> be fixed easily.

You're misquoting me badly. I said we can ignore the risk of two
nanosecond-resolution timestamps being set by two different CPUs with
out-of-sync cycle counters on an SMP system, where the CPUs are fast
enough to free/acquire the inode lock in less time than they're out of
sync (= giving two file changes with the same ns timestamp).

I implied that on systems that don't have a cycle counter and use jiffie
resolution gettimeofday it can also be ignored, because they're unlikely
to be SMP and are dying out anyway.

-Andi
* Re: Finegrained a/c/mtime was Re: Directory notification problem
  2001-10-13 16:12 ` Andi Kleen
@ 2001-10-13 19:38 ` Jamie Lokier
  0 siblings, 0 replies; 24+ messages in thread
From: Jamie Lokier @ 2001-10-13 19:38 UTC
To: Andi Kleen
Cc: Bernd Eckenfels, Ulrich Drepper, Alex Larsson, Padraig Brady, Andrew Pimlott, linux-kernel

Andi Kleen wrote:
> > Andi Kleen says we can ignore the risk; I disagree, as there are some
> > applications that cannot be trusted if the risk is plausible, and it can
> > be fixed easily.
>
> You're misquoting me badly. I said we can ignore the risk that two
> nanosecond resolution timestamps that get changed by two different cpus
> with out-of-sync cycle counter on a smp system and which are fast enough
> to free/acquire the inode lock in a smaller time than they're out of sync
> (= giving two file changes with the same ns timestamp) can be ignored.
> I implied on the systems that don't have a cycle counter and which use
> jiffie resolution gettimeofday it can be also ignored, because they're
> unlikely to be SMP and dying out too anyways.

Andi, sorry I misrepresented your statement. I misread your original as
saying that the risks due to SMP nanosecond-scale synchronisation
problems can be ignored. I took that to imply that the small risk of one
SMP process modifying a file while another checks the timestamp can be
ignored too.

I misread it this way because others have suggested that higher
resolution solves the problem, and I believe it does not.

As you say above, multiple modifications within a single tick are not a
problem, do not have to be tracked, and therefore do not require SMP
synchronisation.

The SMP risk of missing a change after checking the timestamp is among
the risks I consider critical for an application which must not miss the
fact that a file has changed. I do not want us to repeat the mistake of
1-second resolution at a smaller timescale.

cheers,
-- Jamie
end of thread

Thread overview: 24+ messages
[not found] <Pine.LNX.4.33.0110022206100.29931-100000@devserv.devel.redhat.com.suse.lists.linux.kernel>
2001-10-03 7:53 ` Finegrained a/c/mtime was Re: Directory notification problem Andi Kleen
2001-10-03 8:06 ` Ulrich Drepper
2001-10-03 13:35 ` Eric W. Biederman
2001-10-03 14:11 ` Netfilter problem Kirill Ratkin
2001-10-03 21:42 ` Luigi Genoni
2001-10-03 15:24 ` Finegrained a/c/mtime was Re: Directory notification problem Gerhard Mack
2001-10-16 18:56 ` Riley Williams
2001-10-03 15:15 ` Alex Larsson
2001-10-03 21:26 ` Andi Kleen
2001-10-05 12:44 ` Padraig Brady
2001-10-05 12:59 ` Andrew Pimlott
2001-10-05 13:01 ` Andi Kleen
2001-10-05 13:15 ` Padraig Brady
2001-10-05 14:38 ` Andi Kleen
2001-10-05 15:00 ` Padraig Brady
2001-10-05 19:12 ` Andi Kleen
2001-10-08 8:39 ` Padraig Brady
2001-10-08 8:58 ` Padraig Brady
2001-10-08 10:04 ` Trond Myklebust
2001-10-05 20:22 ` Bernd Eckenfels
2001-10-03 17:45 ` Bernd Eckenfels
2001-10-13 15:24 ` Jamie Lokier
2001-10-13 16:12 ` Andi Kleen
2001-10-13 19:38 ` Jamie Lokier