linux-kernel.vger.kernel.org archive mirror
* New nanosecond stat patch for 2.5.44
@ 2002-10-27 12:13 Andi Kleen
  2002-10-27 14:33 ` New nanosecond stat patch for 2.5.44 - new patch II Andi Kleen
  2002-10-27 21:49 ` New nanosecond stat patch for 2.5.44 Andreas Dilger
  0 siblings, 2 replies; 41+ messages in thread
From: Andi Kleen @ 2002-10-27 12:13 UTC (permalink / raw)
  To: linux-kernel


Move time_t members in struct stat to struct timespec and allow subsecond
timestamps for files.  Too big to post on the list, because it edits
a lot of file systems and drivers in a straightforward way.

This is required for reliable "make" on fast computers.
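
As a rough illustration of why sub-second timestamps matter to "make", the
sketch below compares two modification times at nanosecond granularity. It
assumes the POSIX.1-2008 st_mtim member of struct stat as user space sees it
today, which is not necessarily how this patch exposes the value. The helper
names timespec_newer and target_is_stale are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <sys/stat.h>
#include <time.h>

/* True if timestamp a is strictly later than b: seconds decide first,
 * nanoseconds break the tie within the same second. */
bool timespec_newer(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec > b->tv_sec;
    return a->tv_nsec > b->tv_nsec;
}

/* Hypothetical helper: 1 if target must be rebuilt from source,
 * 0 if it is up to date, -1 if the source cannot be examined. */
int target_is_stale(const char *source, const char *target)
{
    struct stat src, tgt;

    if (stat(source, &src) != 0)
        return -1;
    if (stat(target, &tgt) != 0)
        return 1; /* a missing target always has to be built */
    return timespec_newer(&src.st_mtim, &tgt.st_mtim) ? 1 : 0;
}
```

With whole-second stamps, a source edited in the same second as its target
was built compares as equal; the nanosecond tie-break is what lets the two
be ordered.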

File systems that support nsec storage are currently: XFS, JFS, NFSv3
(if the filesystem on the server supports it), VFAT (not quite nanosecond),
CIFS (unit in 100ns which is above what linux supports), SMBFS (for 
newer servers)

This is proposed for 2.6. 

Changes against the last version:
- Now always take xtime_lock when accessing the whole of xtime
- Port to 2.5.44
- New filesystems supported: CIFS, AFS

ftp://ftp.firstfloor.org/pub/ak/v2.5/nsec-2.5.44-1.bz2


-Andi


* Re: New nanosecond stat patch for 2.5.44 - new patch II
  2002-10-27 12:13 New nanosecond stat patch for 2.5.44 Andi Kleen
@ 2002-10-27 14:33 ` Andi Kleen
  2002-10-27 21:49 ` New nanosecond stat patch for 2.5.44 Andreas Dilger
  1 sibling, 0 replies; 41+ messages in thread
From: Andi Kleen @ 2002-10-27 14:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

> ftp://ftp.firstfloor.org/pub/ak/v2.5/nsec-2.5.44-1.bz2

This version unfortunately had some problems. I removed it now
and replaced it with 

ftp://ftp.firstfloor.org/pub/ak/v2.5/nsec-2.5.44-2.bz2

If you already got -1 please redownload.

Thank you,
-Andi


* Re: New nanosecond stat patch for 2.5.44
  2002-10-27 12:13 New nanosecond stat patch for 2.5.44 Andi Kleen
  2002-10-27 14:33 ` New nanosecond stat patch for 2.5.44 - new patch II Andi Kleen
@ 2002-10-27 21:49 ` Andreas Dilger
  2002-10-27 22:54   ` H. Peter Anvin
                     ` (2 more replies)
  1 sibling, 3 replies; 41+ messages in thread
From: Andreas Dilger @ 2002-10-27 21:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

On Oct 27, 2002  13:13 +0100, Andi Kleen wrote:
> Move time_t members in struct stat to struct timespec and allow subsecond
> timestamps for files.  Too big to post on the list, because it edits
> a lot of file systems and drivers in a straightforward way.
> 
> This is required for reliable "make" on fast computers.
> 
> File systems that support nsec storage are currently: XFS, JFS, NFSv3
> (if the filesystem on the server supports it), VFAT (not quite nanosecond),
> CIFS (unit in 100ns which is above what linux supports), SMBFS (for 
> newer servers)

A few notes I might make about this:
1) It would be good if it were possible to select this with a config
   option (I don't care which way the default goes), so that people who
   don't need/care about the increased resolution don't need the extra
   space in their inodes and minor extra overhead.  To make this a lot
   easier to code, it would help to have something akin to
   inode_update_time() that does all of the i_[acm]time updates as
   appropriate.
2) Updating i_atime based on comparing the nsec timestamp is going to be
   a killer.  I think AKPM saw dramatic performance improvements when he
   changed the code to only do the update once/second, and even though
   you are "only" updating the atime if the times are different, in
   practise this will be always.  Even without the "per superblock interval"
   you suggest we should probably only update the atime once a second (I
   don't think anything is keyed off such high resolution atimes, unlike
   make and mtime/ctime).
3) The fields you are usurping in struct stat are actually there for the
   Y2038 problem (when time_t wraps).  At least that's what Ted said when
   we were looking into nsec times for ext2/3.  Granted, we may all be
   using 64-bit systems by 2038...  I've always thought 64 bits is much
   too large for time_t, so we could always use 20 or 30 bits for sub-second
   times, and the remaining bits for extending time_t at the high end,
   and mask those off for now, but that is a separate issue...
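
One possible reading of the packing idea in point 3, sketched under
assumptions: a single 64-bit word whose low 30 bits hold nanoseconds (30 bits
reach 1,073,741,823, enough for 999,999,999) and whose remaining 34 bits hold
unsigned seconds. The names and the exact bit split are illustrative only and
are not a layout proposed in this thread.

```c
#include <stdint.h>

#define NSEC_BITS 30u
#define NSEC_MASK ((UINT64_C(1) << NSEC_BITS) - 1)

/* Pack seconds (must fit in 34 bits) and nanoseconds (must be below 2^30)
 * into one 64-bit word. Times before the epoch are not handled here. */
uint64_t pack_time(uint64_t sec, uint32_t nsec)
{
    return (sec << NSEC_BITS) | ((uint64_t)nsec & NSEC_MASK);
}

/* Recover the seconds part from the upper 34 bits. */
uint64_t packed_sec(uint64_t packed)
{
    return packed >> NSEC_BITS;
}

/* Recover the nanoseconds part from the lower 30 bits. */
uint32_t packed_nsec(uint64_t packed)
{
    return (uint32_t)(packed & NSEC_MASK);
}
```

Under this split, 34 bits of seconds cover roughly 544 years from the epoch,
so the high bits extend well past the 32-bit time_t wrap in 2038.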

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/



* Re: New nanosecond stat patch for 2.5.44
  2002-10-27 21:49 ` New nanosecond stat patch for 2.5.44 Andreas Dilger
@ 2002-10-27 22:54   ` H. Peter Anvin
  2002-10-28  1:23     ` Chris Friesen
  2002-11-06 13:27     ` Gabriel Paubert
  2002-10-27 23:16   ` Horst von Brand
  2002-10-29 15:01   ` Bill Davidsen
  2 siblings, 2 replies; 41+ messages in thread
From: H. Peter Anvin @ 2002-10-27 22:54 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20021027214913.GA17533@clusterfs.com>
By author:    Andreas Dilger <adilger@clusterfs.com>
In newsgroup: linux.dev.kernel
> 
> 3) The fields you are usurping in struct stat are actually there for the
>    Y2038 problem (when time_t wraps).  At least that's what Ted said when
>    we were looking into nsec times for ext2/3.  Granted, we may all be
>    using 64-bit systems by 2038...  I've always thought 64 bits is much
>    too large for time_t, so we could always use 20 or 30 bits for sub-second
>    times, and the remaining bits for extending time_t at the high end,
>    and mask those off for now, but that is a separate issue...
> 

64-bit time_t is nice because you don't *ever* need to worry about
overflow; it's capable of handling times on a galactic lifespan
scale.  It's overkill, of course, but it's the *right* kind of
overkill.

We probably need to revamp struct stat anyway, to support a larger
dev_t, and possibly a larger ino_t (we should account for 64-bit ino_t
at least if we have to redesign the structure.)  At that point I would
really like to advocate for int64_t ts_sec and uint32_t ts_nsec and
quite possibly a int32_t ts_taidelta to deal with leap seconds... I'd
personally like struct timespec to look like the above everywhere.
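
A minimal sketch of the structure described above, using the three field
names from this message. The struct name wide_timespec, the comparison
helper, and the reading of ts_taidelta as a TAI minus UTC offset in seconds
are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical layout built from the field names in the message above. */
struct wide_timespec {
    int64_t  ts_sec;      /* signed seconds relative to the epoch */
    uint32_t ts_nsec;     /* nanoseconds within the second, 0..999999999 */
    int32_t  ts_taidelta; /* assumed: TAI - UTC in seconds (leap seconds) */
};

/* Order two timestamps; returns -1, 0 or 1 like a qsort comparator.
 * The TAI delta is ignored here and only seconds and nanoseconds count. */
int wide_timespec_cmp(const struct wide_timespec *a,
                      const struct wide_timespec *b)
{
    if (a->ts_sec != b->ts_sec)
        return a->ts_sec < b->ts_sec ? -1 : 1;
    if (a->ts_nsec != b->ts_nsec)
        return a->ts_nsec < b->ts_nsec ? -1 : 1;
    return 0;
}
```

On common ABIs the three fields fill 16 bytes with no padding, which is one
reason the extra 32-bit delta field comes almost for free.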

	-hpa


-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>


* Re: New nanosecond stat patch for 2.5.44
  2002-10-27 21:49 ` New nanosecond stat patch for 2.5.44 Andreas Dilger
  2002-10-27 22:54   ` H. Peter Anvin
@ 2002-10-27 23:16   ` Horst von Brand
  2002-10-28 17:10     ` Andreas Dilger
  2002-10-29 15:01   ` Bill Davidsen
  2 siblings, 1 reply; 41+ messages in thread
From: Horst von Brand @ 2002-10-27 23:16 UTC (permalink / raw)
  To: Andi Kleen, linux-kernel

Andreas Dilger <adilger@clusterfs.com> said:
> On Oct 27, 2002  13:13 +0100, Andi Kleen wrote:
> > Move time_t members in struct stat to struct timespec and allow subsecond
> > timestamps for files.  Too big to post on the list, because it edits
> > a lot of file systems and drivers in a straightforward way.
> > 
> > This is required for reliable "make" on fast computers.
> > 
> > File systems that support nsec storage are currently: XFS, JFS, NFSv3
> > (if the filesystem on the server supports it), VFAT (not quite nanosecond),
> > CIFS (unit in 100ns which is above what linux supports), SMBFS (for 
> > newer servers)
> 
> A few notes I might make about this:
> 1) It would be good if it were possible to select this with a config
>    option (I don't care which way the default goes), so that people who
>    don't need/care about the increased resolution don't need the extra
>    space in their inodes and minor extra overhead.  To make this a lot
>    easier to code, having something akin to the inode_update_time()
>    which does all of the i_[acm]time updates as appropriate.

Please don't. Do not create incompatible versions of the same filesystem
just because they were written on kernels compiled with different
configurations. Superblock flags might be OK, but what is the point then?
Better mount flags (mount with/without finegrained timestamps)?

[....]

> 3) The fields you are usurping in struct stat are actually there for the
>    Y2038 problem (when time_t wraps).  At least that's what Ted said when
>    we were looking into nsec times for ext2/3.  Granted, we may all be
>    using 64-bit systems by 2038...  I've always thought 64 bits is much
>    too large for time_t, so we could always use 20 or 30 bits for sub-second
>    times, and the remaining bits for extending time_t at the high end,
>    and mask those off for now, but that is a separate issue...

IMVHO, keeping fields in filesystems' inodes for 36 years in the future is
daydreaming. Not even the filesystems in the just 11 year old Linux have
survived unscathed... and by '38 we'll probably be on ext8 or so, running
on 64-bit CPUs.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513


* Re: New nanosecond stat patch for 2.5.44
  2002-10-27 22:54   ` H. Peter Anvin
@ 2002-10-28  1:23     ` Chris Friesen
  2002-10-28  1:35       ` Rob Landley
  2002-11-06 13:27     ` Gabriel Paubert
  1 sibling, 1 reply; 41+ messages in thread
From: Chris Friesen @ 2002-10-28  1:23 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

H. Peter Anvin wrote:

> We probably need to revamp struct stat anyway, to support a larger
> dev_t, and possibly a larger ino_t (we should account for 64-bit ino_t
> at least if we have to redesign the structure.)  At that point I would
> really like to advocate for int64_t ts_sec and uint32_t ts_nsec and
> quite possibly a int32_t ts_taidelta to deal with leap seconds... I'd
> personally like struct timespec to look like the above everywhere.

For filesystems can we get away with just the 64-bit nanoseconds?  By my 
calculations that gives something like 584 years--do we need to worry 
about files older than that?
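
The 584-year figure can be checked with a short calculation. The sketch below
assumes an unsigned 64-bit nanosecond counter and a 365-day year; the function
names are made up for this example. A signed counter would give about half
that range.

```c
#include <stdint.h>

#define NSEC_PER_SEC UINT64_C(1000000000)
/* Seconds in a 365-day year; leap days are ignored as an approximation. */
#define SEC_PER_YEAR (UINT64_C(365) * 24 * 60 * 60)

/* Whole years representable by an unsigned 64-bit nanosecond count. */
uint64_t u64_ns_range_years(void)
{
    return (UINT64_MAX / NSEC_PER_SEC) / SEC_PER_YEAR;
}

/* Whole years representable on the positive side of a signed 64-bit
 * nanosecond count. */
uint64_t s64_ns_range_years(void)
{
    return ((uint64_t)INT64_MAX / NSEC_PER_SEC) / SEC_PER_YEAR;
}
```

The unsigned case yields 584 whole years and the signed case 292, so counting
from 1970 an unsigned counter runs out around the year 2554.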

Chris



-- 
Chris Friesen                    | MailStop: 043/33/F10
Nortel Networks                  | work: (613) 765-0557
3500 Carling Avenue              | fax:  (613) 765-2986
Nepean, ON K2H 8E9 Canada        | email: cfriesen@nortelnetworks.com



* Re: New nanosecond stat patch for 2.5.44
  2002-10-28  1:23     ` Chris Friesen
@ 2002-10-28  1:35       ` Rob Landley
  0 siblings, 0 replies; 41+ messages in thread
From: Rob Landley @ 2002-10-28  1:35 UTC (permalink / raw)
  To: Chris Friesen, H. Peter Anvin; +Cc: linux-kernel

On Sunday 27 October 2002 19:23, Chris Friesen wrote:
> H. Peter Anvin wrote:
> > We probably need to revamp struct stat anyway, to support a larger
> > dev_t, and possibly a larger ino_t (we should account for 64-bit ino_t
> > at least if we have to redesign the structure.)  At that point I would
> > really like to advocate for int64_t ts_sec and uint32_t ts_nsec and
> > quite possibly a int32_t ts_taidelta to deal with leap seconds... I'd
> > personally like struct timespec to look like the above everywhere.
>
> For filesystems can we get away with just the 64-bit nanoseconds?  By my
> calculations that gives something like 584 years--do we need to worry
> about files older than that?

1) The hard drive is only about 50 years old, so there aren't any files older 
than that at the moment:
http://www.mdhc.scu.edu/100th/reyjohnson.htm

2) This thing is unlikely to be a problem in our lifetimes, our 
grandchildren's lifetimes, or our great grandchildren's lifetimes (barring 
unforeseen advances in active telomere reconstruction and a regenerative 
interpretation of DNA that somehow looks at it as a blueprint rather than a 
recipe).

3) If any current hardware or software is still in use in the year 2554, it 
will be seriously overdue for an upgrade.

Rob

-- 
http://penguicon.sf.net - Terry Pratchett, Eric Raymond, Pete Abrams, Illiad, 
CmdrTaco, liquid nitrogen ice cream, and caffeinated jello.  Well why not?


* Re: New nanosecond stat patch for 2.5.44
  2002-10-27 23:16   ` Horst von Brand
@ 2002-10-28 17:10     ` Andreas Dilger
  0 siblings, 0 replies; 41+ messages in thread
From: Andreas Dilger @ 2002-10-28 17:10 UTC (permalink / raw)
  To: Horst von Brand; +Cc: Andi Kleen, linux-kernel

On Oct 27, 2002  20:16 -0300, Horst von Brand wrote:
> Andreas Dilger <adilger@clusterfs.com> said:
> > 1) It would be good if it were possible to select this with a config
> >    option (I don't care which way the default goes), so that people who
> >    don't need/care about the increased resolution don't need the extra
> >    space in their inodes and minor extra overhead.  To make this a lot
> >    easier to code, having something akin to the inode_update_time()
> >    which does all of the i_[acm]time updates as appropriate.
> 
> Please don't. Do not create incompatible versions of the same filesystem
> just because they were written on kernels compiled with different
> configurations. Superblock flags might be OK, but what is the point then?
> Better mount flags (mount with/without finegrained timestamps)?

I didn't say anything about creating incompatible versions of the same
filesystem.  Configuring out nsec timestamps is no different than what
we have today.  Many filesystems do not support nsec timestamps anyways.

I just see this as one of many hundreds of "tiny" features that are
added to Linux that could easily be made a config option when they
are first added, but all just end up adding a tiny bit of bloat for
people that don't need it.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/



* Re: New nanosecond stat patch for 2.5.44
  2002-10-27 21:49 ` New nanosecond stat patch for 2.5.44 Andreas Dilger
  2002-10-27 22:54   ` H. Peter Anvin
  2002-10-27 23:16   ` Horst von Brand
@ 2002-10-29 15:01   ` Bill Davidsen
  2002-10-29 16:30     ` Andreas Dilger
  2 siblings, 1 reply; 41+ messages in thread
From: Bill Davidsen @ 2002-10-29 15:01 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Andi Kleen, linux-kernel

On Sun, 27 Oct 2002, Andreas Dilger wrote:

> A few notes I might make about this:
> 1) It would be good if it were possible to select this with a config
>    option (I don't care which way the default goes), so that people who
>    don't need/care about the increased resolution don't need the extra
>    space in their inodes and minor extra overhead.  To make this a lot
>    easier to code, having something akin to the inode_update_time()
>    which does all of the i_[acm]time updates as appropriate.

Am I missing something? That would make it two file types, no? I bet
there's more overhead in handling that problem than just writing the time.

> 2) Updating i_atime based on comparing the nsec timestamp is going to be
>    a killer.  I think AKPM saw dramatic performance improvements when he
>    changed the code to only do the update once/second, and even though
>    you are "only" updating the atime if the times are different, in
>    practise this will be always.  Even without the "per superblock interval"
>    you suggest we should probably only update the atime once a second (I
>    don't think anything is keyed off such high resolution atimes, unlike
>    make and mtime/ctime).

find -anewer seems to use as much resolution as it has. More to the point,
what is the overhead of updating the time when an i/o is done? It would
seem pretty trivial.

If you are willing to give up a flag bit you could store the time in some
native unit (machine type dependent) when an i/o is done, then do the
conversion to ns when it's used, such as on compare, close, etc. You could
have an inode walker thread do the conversion in the background if that
seems needed. There are probably other ways to reduce overhead; those just
came to mind. I think it's a pretty low impact problem with some effort on
making it so.

> 3) The fields you are usurping in struct stat are actually there for the
>    Y2038 problem (when time_t wraps).  At least that's what Ted said when
>    we were looking into nsec times for ext2/3.  Granted, we may all be
>    using 64-bit systems by 2038...  I've always thought 64 bits is much
>    too large for time_t, so we could always use 20 or 30 bits for sub-second
>    times, and the remaining bits for extending time_t at the high end,
>    and mask those off for now, but that is a separate issue...

As you say, but good that you brought it up!

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: New nanosecond stat patch for 2.5.44
  2002-10-29 15:01   ` Bill Davidsen
@ 2002-10-29 16:30     ` Andreas Dilger
  2002-10-29 20:37       ` Bill Davidsen
  0 siblings, 1 reply; 41+ messages in thread
From: Andreas Dilger @ 2002-10-29 16:30 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Andi Kleen, linux-kernel

On Oct 29, 2002  10:01 -0500, Bill Davidsen wrote:
> On Sun, 27 Oct 2002, Andreas Dilger wrote:
> > 1) It would be good if it were possible to select this with a config
> >    option (I don't care which way the default goes), so that people who
> >    don't need/care about the increased resolution don't need the extra
> >    space in their inodes and minor extra overhead.  To make this a lot
> >    easier to code, having something akin to the inode_update_time()
> >    which does all of the i_[acm]time updates as appropriate.
> 
> Am I missing something? That would make it two file types, no? I bet
> there's more overhead in handling that problem than just writing the time.

Not necessarily.  Most filesystems don't even have space for storing a
sub-second time resolution, so having the extra time resolution is
irrelevant.  For filesystems which do have room for sub-second timestamps
they currently just fill in 0 there, and if the sub-second time is here
they will fill in that field, so still no incompatible on-disk formats.

As for ext3 having sub-second timestamps, this will be done in a way
which makes it compatible with older filesystems, so whether those
timestamps are written or not written, the filesystem will still be
readable on older kernels.

The "inode" space that I'm referring to is the in-memory inode struct,
and the presence of that would be determined at compile time.  Granted,
it would only be 12 bytes added to the inode, but if you have thousands
or millions of inodes resident you start to feel the pinch.

> > 2) Updating i_atime based on comparing the nsec timestamp is going to be
> >    a killer.  I think AKPM saw dramatic performance improvements when he
> >    changed the code to only do the update once/second, and even though
> >    you are "only" updating the atime if the times are different, in
> >    practise this will be always.  Even without the "per superblock interval"
> >    you suggest we should probably only update the atime once a second (I
> >    don't think anything is keyed off such high resolution atimes, unlike
> >    make and mtime/ctime).
> 
> find -anewer seems to use as much resolution as it has. More to the point,
> what is the overhead of updating the time when an i/o is done? It would
> seem pretty trivial.

It would be trivial if you are already updating the inode (and we should
optimize for this case), but if you are reading a file in 5-byte chunks
and you update the atime a thousand times a second it most certainly IS
a lot of overhead.  We currently limit atime updates to 1/second by
checking if the atime has changed or not.  The proposed patch checks if
the atime.ts_nsec has changed, and it most certainly will have, so this
will always be updating the atime on disk.
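
The difference between the two checks can be sketched as follows. This is
not the kernel code or the patch; it only contrasts a nanosecond-granular
comparison with a whole-second one, using struct timespec and hypothetical
function names.

```c
#include <stdbool.h>
#include <time.h>

/* Nanosecond-granular check: two accesses almost never share the same
 * nanosecond, so this asks for an atime update on nearly every read. */
bool atime_differs_nsec(const struct timespec *atime,
                        const struct timespec *now)
{
    return atime->tv_sec != now->tv_sec || atime->tv_nsec != now->tv_nsec;
}

/* Second-granular check: repeated reads within one second compare equal,
 * which limits atime updates to at most one per second. */
bool atime_differs_sec(const struct timespec *atime,
                       const struct timespec *now)
{
    return atime->tv_sec != now->tv_sec;
}
```

A reader issuing a thousand small reads in one second would trigger a
thousand updates under the first check and at most one under the second.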

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/



* Re: New nanosecond stat patch for 2.5.44
  2002-10-29 16:30     ` Andreas Dilger
@ 2002-10-29 20:37       ` Bill Davidsen
  2002-10-30  0:44         ` Jamie Lokier
  0 siblings, 1 reply; 41+ messages in thread
From: Bill Davidsen @ 2002-10-29 20:37 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Andi Kleen, linux-kernel

On Tue, 29 Oct 2002, Andreas Dilger wrote:

> On Oct 29, 2002  10:01 -0500, Bill Davidsen wrote:
> > On Sun, 27 Oct 2002, Andreas Dilger wrote:
> > > 1) It would be good if it were possible to select this with a config
> > >    option (I don't care which way the default goes), so that people who
> > >    don't need/care about the increased resolution don't need the extra
> > >    space in their inodes and minor extra overhead.  To make this a lot
> > >    easier to code, having something akin to the inode_update_time()
> > >    which does all of the i_[acm]time updates as appropriate.
> > 
> > Am I missing something? That would make it two file types, no? I bet
> > there's more overhead in handling that problem than just writing the time.
> 
> Not necessarily.  Most filesystems don't even have space for storing a
> sub-second time resolution, so having the extra time resolution is
> irrelevant.  For filesystems which do have room for sub-second timestamps
> they currently just fill in 0 there, and if the sub-second time is here
> they will fill in that field, so still no incompatible on-disk formats.

That was my concern.
 
> As for ext3 having sub-second timestamps, this will be done in a way
> which makes it compatible with older filesystems, so whether those
> timestamps are written or not written, the filesystem will still be
> readable on older kernels.

I was more thinking of a kernel compiled without the hi-res timer code, if
that should be done as an option.
 
> The "inode" space that I'm referring to is the in-memory inode struct,
> and the presence of that would be determined at compile time.  Granted,
> it would only be 12 bytes added to the inode, but if you have thousands
> or millions of inodes resident you start to feel the pinch.

I admit to being one of the "thousands" people, and even if I have 100k
inodes (more likely to be 10% of that) it's in the order of a MB, and any
machine which has 100k inodes open is likely to be large enough to ignore
a MB. One advantage of keeping the HRT in the in-core inode is that it
allows parallel make to work correctly even on a filesystem which doesn't
have space to save that information.
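
The memory estimate is easy to reproduce. The sketch assumes the 12 bytes
quoted above come from one extra 32-bit nanosecond field for each of atime,
mtime and ctime; that breakdown and the function name are assumptions, not
something stated in the thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed breakdown: three timestamps, each gaining a 32-bit nsec field. */
#define EXTRA_BYTES_PER_INODE (3 * sizeof(uint32_t))

/* Extra memory, in bytes, for a given number of in-core inodes. */
size_t extra_inode_bytes(size_t resident_inodes)
{
    return resident_inodes * EXTRA_BYTES_PER_INODE;
}
```

For 100,000 resident inodes this comes to 1,200,000 bytes, which matches the
"order of a MB" figure; a million inodes would cost about 12 MB.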

Feel free to tell me if that last isn't true.
 
> > > 2) Updating i_atime based on comparing the nsec timestamp is going to be
> > >    a killer.  I think AKPM saw dramatic performance improvements when he
> > >    changed the code to only do the update once/second, and even though
> > >    you are "only" updating the atime if the times are different, in
> > >    practise this will be always.  Even without the "per superblock interval"
> > >    you suggest we should probably only update the atime once a second (I
> > >    don't think anything is keyed off such high resolution atimes, unlike
> > >    make and mtime/ctime).
> > 
> > find -anewer seems to use as much resolution as it has. More to the point,
> > what is the overhead of updating the time when an i/o is done? It would
> > seem pretty trivial.
> 
> It would be trivial if you are already updating the inode (and we should
> optimize for this case), but if you are reading a file in 5-byte chunks
> and you update the atime a thousand times a second it most certainly IS
> a lot of overhead.  We currently limit atime updates to 1/second by
> checking if the atime has changed or not.  The proposed patch checks if
> the atime.ts_nsec has changed, and it most certainly will have, so this
> will always be updating the atime on disk.

1 - any program which does unbuffered 5 byte reads is probably going to
beat the machine to death anyway. Then the sysadmin will mount noatime.

2 - The patch isn't written in stone, going back to one per second
shouldn't matter except in the case of network or devices shared between
multiple systems (3.0?). Processes on the same machine would use the
in-core information.

3 - updating once/sec could still be default, with HRT being a mount
option like noatime.

4 - the time could be stored in register values, ticks, or whatever else,
avoiding any conversion to ns. Then the time could be converted only when
the inode was read, written out, etc. 

I'd really like your comments on these, you probably see things I've
missed.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: New nanosecond stat patch for 2.5.44
  2002-10-29 20:37       ` Bill Davidsen
@ 2002-10-30  0:44         ` Jamie Lokier
  2002-10-30 21:12           ` Bill Davidsen
  0 siblings, 1 reply; 41+ messages in thread
From: Jamie Lokier @ 2002-10-30  0:44 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Andreas Dilger, Andi Kleen, linux-kernel

Bill Davidsen wrote:
> I admit to being one of the "thousands" people, and even if I have 100k
> inodes (more likely to be 10% of that) it's in the order of a MB, and any
> machine which has 100k inodes open is likely to be large enough to ignore
> a MB. One advantage of keeping the HRT in the in-core inode is that it
> allows parallel make to work correctly even on a filesystem which doesn't
> have space to save that information.
> 
> Feel free to tell me if that last isn't true.

It isn't true if the parallel make actually uses your RAM for
something, thus flushing some of the inodes from RAM.

Admittedly it is no worse than we have at the moment.  However, at the
moment it is possible to construct a "make" or other program of that
ilk which can always make a safe decision: if it's ambiguous whether a
file needs to be remade, then remake the file.

As soon as we have inode time stamp resolution being spontaneously
lowered (because some of the inodes are flushed from RAM and some
aren't), then it's not possible to make a safe program like that
anymore, unless you simply ignore the high resolution time stamps
_all_ the time, even when they are present.

You can just do that - it's correct behaviour.  But it would be better
to use the high precision when available, as that reduces the number
of unnecessary remakes.
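
A small sketch of the two policies, with hypothetical function names. The
conservative rule ignores nanoseconds and treats equal seconds as ambiguous;
the precise rule is only safe if both stamps reliably keep their nanosecond
part.

```c
#include <stdbool.h>
#include <time.h>

/* Conservative rule: compare whole seconds only. Equal seconds are
 * ambiguous, so the target is remade to stay on the safe side. */
bool must_remake_conservative(const struct timespec *src,
                              const struct timespec *tgt)
{
    return src->tv_sec >= tgt->tv_sec;
}

/* Precise rule: use nanoseconds as a tie-breaker. This gives wrong answers
 * if one stamp has silently lost its nanosecond part. */
bool must_remake_precise(const struct timespec *src,
                         const struct timespec *tgt)
{
    if (src->tv_sec != tgt->tv_sec)
        return src->tv_sec > tgt->tv_sec;
    return src->tv_nsec >= tgt->tv_nsec;
}
```

If a source written at 100.9s has its stamp truncated to 100.0s while the
target keeps 100.5s, the precise rule skips a remake that was needed, and the
conservative rule still remakes.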

> 4 - the time could be stored in register values, ticks, or whatever else,
> avoiding any conversion to ns. Then the time could be converted only when
> the inode was read, written out, etc. 
> 
> I'd really like your comments on these, you probably see things I've
> missed.

I know of exactly one application which depends on atime information:
checking whether you have new mail in your inbox.  That's done by
comparing atime and mtime on the mailbox.  Mail readers read the file
after writing it, MTAs will simply write it.

For this to function correctly, what's important is that the atime is
updated to be at least the mtime.  So for nanosecond atime updates, it
makes sense that the _first_ read following a write should update the
atime -- if not using the current clock, then simply copying the mtime
value.
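
The rule described above can be sketched like this. The struct and function
names are hypothetical, and the sketch takes the "copy the mtime" variant
rather than reading the clock.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical pair of stamps kept for a mailbox file. */
struct file_times {
    struct timespec atime;
    struct timespec mtime;
};

/* True if a is strictly earlier than b. */
bool ts_before(const struct timespec *a, const struct timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec;
    return a->tv_nsec < b->tv_nsec;
}

/* Mail check: the mailbox was written after it was last read. */
bool has_new_mail(const struct file_times *f)
{
    return ts_before(&f->atime, &f->mtime);
}

/* First read after a write: raise atime to mtime without reading the
 * clock. Later reads find atime >= mtime and change nothing. */
void note_read(struct file_times *f)
{
    if (ts_before(&f->atime, &f->mtime))
        f->atime = f->mtime;
}
```

Only the first read after a delivery dirties the inode, so the mail check
stays correct without an update on every read.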

-- Jamie


* Re: New nanosecond stat patch for 2.5.44
  2002-10-30  0:44         ` Jamie Lokier
@ 2002-10-30 21:12           ` Bill Davidsen
  2002-10-30 22:17             ` Jamie Lokier
  0 siblings, 1 reply; 41+ messages in thread
From: Bill Davidsen @ 2002-10-30 21:12 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andreas Dilger, Andi Kleen, linux-kernel

On Wed, 30 Oct 2002, Jamie Lokier wrote:

> Bill Davidsen wrote:
> > I admit to being one of the "thousands" people, and even if I have 100k
> > inodes (more likely to be 10% of that) it's in the order of a MB, and any
> > machine which has 100k inodes open is likely to be large enough to ignore
> > a MB. One advantage of keeping the HRT in the in-core inode is that it
> > allows parallel make to work correctly even on a filesystem which doesn't
> > have space to save that information.
> > 
> > Feel free to tell me if that last isn't true.
> 
> It isn't true if the parallel make actually uses your RAM for
> something, thus flushing some of the inodes from RAM.

Hopefully it is being smart about doing that, or rather not doing that.
But that would be a good thing to add to my responsiveness benchmark, to
access a file, do a stat, and then do another stat later. Thanks for the
idea, I expect to release a new version sometime this weekend.
 
> Admittedly it is no worse than we have at the moment.  However, at the
> moment it is possible to construct a "make" or other program of that
> ilk which can always make a safe decision: if it's ambiguous whether a
> file needs to be remade, then remake the file.
> 
> As soon as we have inode time stamp resolution being spontaneously
> lowered (because some of the inodes are flushed from RAM and some
> aren't), then it's not possible to make a safe program like that
> anymore, unless you simply ignore the high resolution time stamps
> _all_ the time, even when they are present.
> 
> You can just do that - it's correct behaviour.  But it would be better
> to use the high precision when available, as that reduces the number
> of unnecessary remakes.

I have to think about the point you raise of doing it one way or the other
but not mixing. I had assumed that the inode of a file which was open
would remain in core, and I want to look at the code before I form an
opinion. If the file is not open or the inode is a non-file...
 
> > 4 - the time could be stored in register values, ticks, or whatever else,
> > avoiding any conversion to ns. Then the time could be converted only when
> > the inode was read, written out, etc. 
> > 
> > I'd really like your comments on these, you probably see things I've
> > missed.
> 
> I know of exactly one application which depends on atime information:
> checking whether you have new mail in your inbox.  That's done by
> comparing atime and mtime on the mailbox.  Mail readers read the file
> after writing it, MTAs will simply write it.
> 
> For this to function correctly, what's important is that the atime is
> updated to be at least the mtime.  So for nanosecond atime updates, it
> makes sense that the _first_ read following a write should update the
> atime -- if not using the current clock, then simply copying the mtime
> value.

I think you may have missed the point of (4), some of the overhead of
keeping HRT is the conversion of data to ns from some machine dependent
information. Where possible the base information, such as a register,
could be stored with a flag, avoiding the "convert to ns" CPU usage. The
conversion could be done when the data was used, before save, at the time
of a stat, etc. I have the feeling that would take some of the sting out
of keeping HRT. It doesn't matter if it's atime, mtime or ctime, the atime
was in response to "nobody uses HRT atime" in an earlier post.
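
A rough sketch of point (4): record the raw counter value cheaply, mark it
with a flag, and convert to nanoseconds only when the value is needed. All
names are hypothetical, and the tick rate is assumed to be known, constant,
and low enough (below about 18 GHz) that the intermediate product fits in
64 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lazily converted stamp. */
struct lazy_stamp {
    bool     is_raw; /* the flag: value still holds raw ticks */
    uint64_t value;  /* raw ticks while is_raw, nanoseconds afterwards */
};

/* Hot path: just store the counter value, no arithmetic. */
void stamp_record(struct lazy_stamp *s, uint64_t raw_ticks)
{
    s->is_raw = true;
    s->value = raw_ticks;
}

/* Cold path (stat, write-out): convert once, then reuse the result. */
uint64_t stamp_ns(struct lazy_stamp *s, uint64_t ticks_per_sec)
{
    if (s->is_raw) {
        /* Split into whole seconds and a remainder to limit overflow. */
        uint64_t sec = s->value / ticks_per_sec;
        uint64_t rem = s->value % ticks_per_sec;

        s->value = sec * UINT64_C(1000000000)
                 + rem * UINT64_C(1000000000) / ticks_per_sec;
        s->is_raw = false;
    }
    return s->value;
}
```

This removes the conversion from the update path only; as the replies note,
the cost of reading the clock itself is a separate matter.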

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.



* Re: New nanosecond stat patch for 2.5.44
  2002-10-30 21:12           ` Bill Davidsen
@ 2002-10-30 22:17             ` Jamie Lokier
  2002-10-31  0:34               ` H. Peter Anvin
  2002-11-01  1:57               ` Bill Davidsen
  0 siblings, 2 replies; 41+ messages in thread
From: Jamie Lokier @ 2002-10-30 22:17 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Andreas Dilger, Andi Kleen, linux-kernel

Bill Davidsen wrote:
> I have to think about the point you raise of doing it one way or the other
> but not mixing. I had assumed that the inode of a file which was open
> would remain in core, and I want to look at the code before I form an
> opinion. If the file is not open or the inode is a non-file...

Oh, the inode of a file which is open does remain in core.  It's just
that between runs of a program like "make", the files aren't open, are
they?

> I think you may have missed the point of (4), some of the overhead of
> keeping HRT is the conversion of data to ns from some machine dependent
> information. Where possible the base information, such as a register,
> could be stored with a flag, avoiding the "convert to ns" CPU usage. The
> conversion could be done when the data was used, before save, at the time
> of a stat, etc. I have the feeling that would take some of the sting out
> of keeping HRT. It doesn't matter if it's atime, mtime or ctime, the atime
> was in response to "nobody uses HRT atime" in an earlier post.

That's some of the overhead.  The other overhead is reading the clock,
which is quite high on x86 when TSC is not available.  On a Pentium
with no reliable TSC, I think that the time for a read() system call
is comparable to the time to read the clock.

-- Jamie

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: New nanosecond stat patch for 2.5.44
  2002-10-30 22:17             ` Jamie Lokier
@ 2002-10-31  0:34               ` H. Peter Anvin
  2002-11-01  1:57               ` Bill Davidsen
  1 sibling, 0 replies; 41+ messages in thread
From: H. Peter Anvin @ 2002-10-31  0:34 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20021030221724.GA25231@bjl1.asuk.net>
By author:    Jamie Lokier <lk@tantalophile.demon.co.uk>
In newsgroup: linux.dev.kernel
> 
> That's some of the overhead.  The other overhead is reading the clock,
> which is quite high on x86 when TSC is not available.  On a Pentium
> with no reliable TSC, I think that the time for a read() system call
> is comparable to the time to read the clock.
> 

Typically the way you deal with not having a usably cheap
nanosecond-resolution clock is that you use the best available clock
(say if HZ=1000 you'll increment by 1000000 each timer tick), and then
simply use an atomic counter for the smaller divisions.  This makes
the relation "is A newer than B" correct, while avoiding the overhead
of producing exact timestamps below the available resolution.
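
One way this could be realized, as a minimal sketch rather than anything from
the patch: ordered_stamp() and last_stamp are hypothetical names, and the
coarse tick time is passed in as a parameter instead of being read from the
kernel clock. A single atomic word holds the last stamp handed out, so two
events in the same tick still get distinct, ordered values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NSEC_PER_TICK 1000000ULL   /* HZ=1000, as in the example above */

/* Last timestamp handed out, in ns (hypothetical global state). */
static _Atomic uint64_t last_stamp;

/*
 * tick_ns is the coarse clock, a multiple of NSEC_PER_TICK.
 * Returns a stamp strictly greater than every stamp returned before.
 */
static uint64_t ordered_stamp(uint64_t tick_ns)
{
    uint64_t prev = atomic_load(&last_stamp);
    uint64_t next;

    do {
        /* New tick: jump to it. Same tick: count up by one "ns". */
        next = (tick_ns > prev) ? tick_ns : prev + 1;
        /* On failure prev is reloaded with the current value; retry. */
    } while (!atomic_compare_exchange_weak(&last_stamp, &prev, next));

    return next;
}
```

The low digits are then only an ordering counter, not real time, and more
than NSEC_PER_TICK stamps within one tick would run ahead of the clock.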

	-hpa
-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
"Unix gives you enough rope to shoot yourself in the foot."
http://www.zytor.com/~hpa/puzzle.txt	<amsp@zytor.com>

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: New nanosecond stat patch for 2.5.44
  2002-10-30 22:17             ` Jamie Lokier
  2002-10-31  0:34               ` H. Peter Anvin
@ 2002-11-01  1:57               ` Bill Davidsen
  2002-11-01  3:32                 ` Jamie Lokier
  1 sibling, 1 reply; 41+ messages in thread
From: Bill Davidsen @ 2002-11-01  1:57 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Andreas Dilger, Andi Kleen, linux-kernel

On Wed, 30 Oct 2002, Jamie Lokier wrote:

> Bill Davidsen wrote:
> > I have to think about the point you raise of doing it one way or the other
> > but not mixing. I had assumed that the inode of a file which was open
> > would remain in core, and I want to look at the code before I form an
> > opinion. If the file is not open or the inode is a non-file...
> 
> Oh, the inode of a file which is open does remain in core.  It's just
> that between runs of a program like "make", the files aren't open, are
> they?

I thought we were talking about parallel make, rather than "between runs."
Your point is valid, but given the certainty that the inode has been
recently used, hopefully the kernel is smart on releasing them.

My first thought is that the commonly used filesystems, other than ext2,
do or will support high resolution time. NFS is its own nasty little
problem.
 
> > I think you may have missed the point of (4), some of the overhead of
> > keeping HRT is the conversion of data to ns from some machine dependent
> > information. Where possible the base information, such as a register,
> > could be stored with a flag, avoiding the "convert to ns" CPU usage. The
> > conversion could be done when the data was used, before save, at the time
> > of a stat, etc. I have the feeling that would take some of the sting out
> > of keeping HRT. It doesn't matter if it's atime, mtime or ctime, the atime
> > was in response to "nobody uses HRT atime" in an earlier post.
> 
> That's some of the overhead.  The other overhead is reading the clock,
> which is quite high on x86 when TSC is not available.  On a Pentium
> with no reliable TSC, I think that the time for a read() system call
> is comparable to the time to read the clock.

Who uses a CPU without TSC? I guess the embedded folks and the people
using really old systems. There was a suggestion on handling that posted,
but I don't have it handy. Using the field as just a counter was the idea
if I remember correctly. The NUMA folks have their own set of problems, I
won't presume to even have an opinion on how they solve it, but if it
needs doing I'm sure they can do it.

Thinking out loud:
  To avoid overhead, the kernel needs to be smart about when the updated
inode info is written to storage. Perhaps on writes, when the data written
actually falls off the elevator or is transferred to a network peer. Until
then the time can stay in memory; if the system goes down the written data
is lost anyway, so having the inode reflect the time of the last completed
write to storage isn't a wildly wrong mtime.

  For reads, having some bounded delay between the time of a system call
to read() and the time saved in the inode is of limited impact, as long as
the time to update the inode to storage doesn't get wildly behind the time
of the read. The one second you mentioned is probably aggressive if
anything. That might have to be a tunable.

  I haven't forgotten access via execute, I don't know if it differs from
read in practice.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: New nanosecond stat patch for 2.5.44
  2002-11-01  1:57               ` Bill Davidsen
@ 2002-11-01  3:32                 ` Jamie Lokier
  0 siblings, 0 replies; 41+ messages in thread
From: Jamie Lokier @ 2002-11-01  3:32 UTC (permalink / raw)
  To: Bill Davidsen; +Cc: Andreas Dilger, Andi Kleen, linux-kernel

Bill Davidsen wrote:
> > Oh, the inode of a file which is open does remain in core.  It's just
> > that between runs of a program like "make", the files aren't open, are
> > they?
> 
> I thought we were talking about parallel make, rather than "between runs."

A parallel build often does call "make" separately many times, in
parallel but not guaranteed to overlap all file opens.  Between those,
the files are closed.

> Your point is valid, but given the certainty that the inode has been
> recently used, hopefully the kernel is smart on releasing them.

That's a "hopefully", and it depends on how much RAM you have as well
as pure luck.  I can live with that for building programs at home, but
there are many applications where "hopefully" affecting correctness of
behaviour is not acceptable.

> My first thought is that the commonly used filesystems, other than ext2,
> do or will support high resolution time. NFS is its own nasty little
> problem.

Do they support nanosecond time, though, or do they round it to
microseconds or something like that?

> [stuff about atime]

There seems to be general agreement that atime is not a very important
value, with which I concur.  (Why do we even bother with nanosecond atimes?)

I am only concerned about mtime, which is very useful indeed when we
talk about building things which can detect changes to files.

Andi, I believe there is space in every architecture's stat64 (i.e. all
those that have one) for a word describing the mtime resolution.  If I
code a patch to create that field, would you be interested?

-- Jamie

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: New nanosecond stat patch for 2.5.44
  2002-10-27 22:54   ` H. Peter Anvin
  2002-10-28  1:23     ` Chris Friesen
@ 2002-11-06 13:27     ` Gabriel Paubert
  2002-11-06 18:00       ` H. Peter Anvin
  1 sibling, 1 reply; 41+ messages in thread
From: Gabriel Paubert @ 2002-11-06 13:27 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel



On 31 Oct 2002, H. Peter Anvin wrote:

> Followup to:  <20021027214913.GA17533@clusterfs.com>
> By author:    Andreas Dilger <adilger@clusterfs.com>
> In newsgroup: linux.dev.kernel
> >
> > 3) The fields you are usurping in struct stat are actually there for the
> >    Y2038 problem (when time_t wraps).  At least that's what Ted said when
> >    we were looking into nsec times for ext2/3.  Granted, we may all be
> >    using 64-bit systems by 2038...  I've always thought 64 bits is much
> >    too large for time_t, so we could always use 20 or 30 bits for sub-second
> >    times, and the remaining bits for extending time_t at the high end,
> >    and mask those off for now, but that is a separate issue...
> >
>
> 64-bit time_t is nice because you don't *ever* need to worry about
> overflow; it's capable of handling times on a galactic lifespan
> scale.  It's overkill, of course, but it's the *right* kind of
> overkill.

Indeed.

>
> We probably need to revamp struct stat anyway, to support a larger
> dev_t, and possibly a larger ino_t (we should account for 64-bit ino_t
> at least if we have to redesign the structure.)  At that point I would
> really like to advocate for int64_t ts_sec and uint32_t ts_nsec and
> quite possibly a int32_t ts_taidelta to deal with leap seconds... I'd
> personally like struct timespec to look like the above everywhere.

I basically agree but I suspect that filesystem writers will not be very
happy if you want to use 16 bytes for each timestamp, especially when 8 of
the bytes (the 32 high order bits from the second count and the TAI-UT
offset) do not change very often. (besides that tv_nsec is defined as a
long, i.e.  64 bit on 64 bit machines and _signed_ , stupid if you ask me
but I digress).
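
To make the 16 bytes concrete, a small sketch of the layout under
discussion. The field names are the ones proposed above; the struct name
wide_timespec is made up here, and nothing like it exists in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stat(2) timestamp using the proposed field names. */
struct wide_timespec {
    int64_t  ts_sec;       /* seconds; upper 32 bits rarely change */
    uint32_t ts_nsec;      /* 0..999999999 */
    int32_t  ts_taidelta;  /* TAI-UTC offset; changes only at leap seconds */
};

/* 8 + 4 + 4 bytes, no padding needed on common ABIs. */
_Static_assert(sizeof(struct wide_timespec) == 16,
               "three fields pack into 16 bytes");
_Static_assert(offsetof(struct wide_timespec, ts_nsec) == 8,
               "ts_nsec follows the 64-bit seconds field");
```

The slowly changing part is the upper half of ts_sec plus ts_taidelta, which
is where the 8 bytes mentioned above come from.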

The goal as I understand it is to avoid first the possibility of ambiguous
timestamps, but then we have to be careful also not to break existing
applications (although they are already broken wrt leap seconds).

I don't know how to trim the highly repeated most significant bytes of the
tv_sec field (it's probably file system specific), but 4 bytes can easily
be shaved from the on-disk structure by packing the leap second
information in the high order bits of the nsec field: since the number of
nanoseconds per second is unlikely to ever need more than 30 bits to be
encoded ;-), the 2 most significant bits can be used to encode inserted
leap seconds. Actually 1 bit should be sufficient, but some texts claim
that up to 2 leap seconds can be inserted; this has however never actually
happened AFAICT, and I believe that NTP for example does not support 2 leap
seconds in a row.

Converting this encoding to the format you suggest for stat(2) is trivial:
it only needs a table of leap seconds. I don't care whether it's in the
kernel or in user space: it's small and grows slowly.

For now I have more problems with the fact that gettimeofday and friends
do not properly handle leap seconds and lead to ambiguous timestamps.
Once this problem (a real killer for astronomical data acquisition, leap
seconds are infrequent but they are a problem) is solved, filesystems can
be updated.

What could be important now is to mask the low 30 bits of the nsec field
and declare the 2 MSB reserved so that no kernel is out in the wild that
simply copies the full nsec field to user space.
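
A minimal sketch of the packing and masking described above. The helper
names are hypothetical, and the meaning given to the two high bits (a count
of inserted leap seconds, 0 to 3) is an assumption for illustration only;
no on-disk format is defined in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* 2^30 = 1073741824 > 999999999, so 30 bits hold any nanosecond count. */
#define NSEC_MASK  0x3FFFFFFFu
#define LEAP_SHIFT 30

/* Pack nanoseconds plus a 2-bit leap second marker into one 32-bit word. */
static uint32_t pack_nsec(uint32_t nsec, unsigned leap)
{
    assert(nsec < 1000000000u);
    assert(leap < 4);
    return ((uint32_t)leap << LEAP_SHIFT) | nsec;
}

/* What a kernel should copy to user space: the low 30 bits only. */
static uint32_t unpack_nsec(uint32_t field)
{
    return field & NSEC_MASK;
}

/* The reserved two most significant bits. */
static unsigned unpack_leap(uint32_t field)
{
    return field >> LEAP_SHIFT;
}
```

A kernel that always applies unpack_nsec() before filling in stat(2) never
leaks the reserved bits, whatever they end up meaning later.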

	Regards,
	Gabriel.




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: New nanosecond stat patch for 2.5.44
  2002-11-06 13:27     ` Gabriel Paubert
@ 2002-11-06 18:00       ` H. Peter Anvin
  0 siblings, 0 replies; 41+ messages in thread
From: H. Peter Anvin @ 2002-11-06 18:00 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linux-kernel

Gabriel Paubert wrote:
> 
> I basically agree but I suspect that filesystem writers will not be very
> happy if you want to use 16 bytes for each timestamp, especially when 8 of
> the bytes (the 32 high order bits from the second count and the TAI-UT
> offset) do not change very often. (besides that tv_nsec is defined as a
> long, i.e.  64 bit on 64 bit machines and _signed_ , stupid if you ask me
> but I digress).
> 

The filesystem writers can compact things as they see fit.  I'm mostly 
talking about the stat(2) format.

	-hpa


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
       [not found] <3ED66C83.8070608@austin.ibm.com.suse.lists.linux.kernel>
@ 2003-05-29 21:11 ` Andi Kleen
  2003-05-29 21:25   ` David S. Miller
                     ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Andi Kleen @ 2003-05-29 21:11 UTC (permalink / raw)
  To: Mark Peloquin; +Cc: linux-kernel, davem

Mark Peloquin <peloquin@austin.ibm.com> writes:

> We have dedicated a machine and thrown together some scripts that will grab
> and build the latest kernel files, execute the regression suite,
> collecting (hopefully)
> enough system state information to allow meaningful analysis of any peculiar
> results encountered.

How about doing a LTP run too with some difference file for new FAILs/BROKs ?
That's not strictly a benchmark, but would help catching regressions
quickly.

I notice your benchmark mix is very IO heavy, it would be nice to test other
aspects of the system too. Perhaps lmbench and reaim compute workload?

It would be nice if we had a new linux-testresults list where such
updates could be posted regularly. I don't think it belongs on l-k
because it would be too noisy. Perhaps such a list could be added to 
vger. David, what do you think?

-Andi

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 21:11 ` Nightly regression runs against current bk tree Andi Kleen
@ 2003-05-29 21:25   ` David S. Miller
  2003-05-29 21:29     ` Andi Kleen
  2003-05-29 22:51     ` Mark Peloquin
  2003-05-29 22:03   ` Nathan
  2003-05-29 22:48   ` Mark Peloquin
  2 siblings, 2 replies; 41+ messages in thread
From: David S. Miller @ 2003-05-29 21:25 UTC (permalink / raw)
  To: ak; +Cc: peloquin, linux-kernel

   From: Andi Kleen <ak@suse.de>
   Date: 29 May 2003 23:11:17 +0200
   
   David, what do you think?
   
Would it have a single poster?

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 21:25   ` David S. Miller
@ 2003-05-29 21:29     ` Andi Kleen
  2003-05-29 21:38       ` Randy.Dunlap
  2003-05-29 21:50       ` Craig Thomas
  2003-05-29 22:51     ` Mark Peloquin
  1 sibling, 2 replies; 41+ messages in thread
From: Andi Kleen @ 2003-05-29 21:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, peloquin, linux-kernel

On Thu, May 29, 2003 at 02:25:15PM -0700, David S. Miller wrote:
>    From: Andi Kleen <ak@suse.de>
>    Date: 29 May 2003 23:11:17 +0200
>    
>    David, what do you think?
>    
> Would it have a single poster?

OSDL, Mark's IBM team and possibly LTP?

I assume there will be more once the list exists; automated regression 
tests seem to be currently in fashion.

-Andi

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 21:29     ` Andi Kleen
@ 2003-05-29 21:38       ` Randy.Dunlap
  2003-05-29 21:48         ` David S. Miller
  2003-05-29 21:50       ` Craig Thomas
  1 sibling, 1 reply; 41+ messages in thread
From: Randy.Dunlap @ 2003-05-29 21:38 UTC (permalink / raw)
  To: Andi Kleen; +Cc: davem, ak, peloquin, linux-kernel

On Thu, 29 May 2003 23:29:29 +0200 Andi Kleen <ak@suse.de> wrote:

| On Thu, May 29, 2003 at 02:25:15PM -0700, David S. Miller wrote:
| >    From: Andi Kleen <ak@suse.de>
| >    Date: 29 May 2003 23:11:17 +0200
| >    
| >    David, what do you think?
| >    
| > Would it have a single poster?
| 
| OSDL, Mark's IBM team and possibly LTP?
| 
| I assume there will be more once the list exists; automated regression 
| tests seem to be currently in fashion.

If DaveM doesn't want to do it, I think that we can do it.
(I say without checking.... :)

--
~Randy

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 21:38       ` Randy.Dunlap
@ 2003-05-29 21:48         ` David S. Miller
  2003-05-29 22:04           ` Craig Thomas
  0 siblings, 1 reply; 41+ messages in thread
From: David S. Miller @ 2003-05-29 21:48 UTC (permalink / raw)
  To: rddunlap; +Cc: ak, peloquin, linux-kernel

   From: "Randy.Dunlap" <rddunlap@osdl.org>
   Date: Thu, 29 May 2003 14:38:20 -0700

   On Thu, 29 May 2003 23:29:29 +0200 Andi Kleen <ak@suse.de> wrote:
   
   | On Thu, May 29, 2003 at 02:25:15PM -0700, David S. Miller wrote:
   | > Would it have a single poster?
   | 
   | OSDL, Mark's IBM team and possibly LTP?
   | 
   | I assume there will be more once the list exists; automated regression 
   | tests seem to be currently in fashion.
   
   If DaveM doesn't want to do it, I think that we can do it.
   (I say without checking.... :)

Please do :-)

The issue is that I'm easier about adding a new list if I can
restrict the poster list.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 21:29     ` Andi Kleen
  2003-05-29 21:38       ` Randy.Dunlap
@ 2003-05-29 21:50       ` Craig Thomas
  2003-05-29 22:03         ` Andi Kleen
  1 sibling, 1 reply; 41+ messages in thread
From: Craig Thomas @ 2003-05-29 21:50 UTC (permalink / raw)
  To: Andi Kleen; +Cc: David S. Miller, peloquin, linux-kernel

On Thu, 2003-05-29 at 14:29, Andi Kleen wrote:
> On Thu, May 29, 2003 at 02:25:15PM -0700, David S. Miller wrote:
> >    From: Andi Kleen <ak@suse.de>
> >    Date: 29 May 2003 23:11:17 +0200
> >    
> >    David, what do you think?
> >    
> > Would it have a single poster?
> 
> OSDL, Mark's IBM team and possibly LTP?
> 
> I assume there will be more once the list exists; automated regression 
> tests seem to be currently in fashion.
> 
> -Andi
> -

OSDL has a linux stabilization web page where several tests are run
automatically when a new kernel is built.  It currently runs Linus'
kernel as well as the -mm series.  We do run LTP, I/O tests, memory
tests, reaim, and database tests as part of an automated regression 
run. Some of you are familiar with the web page, but for those who are
not, it is located here: http://www.osdl.org/projects/linstab/  

In addition, there are links to other sites, most notably IBM's LTC
test results.

We have just completed a physical move to a new office and we believe
we have all of our systems working again, so test results for the
latest kernels are a bit behind.  We hope to have completed runs for
all tests by the weekend.  Note, we are experiencing some test failures
but we suspect it is due to the move and not the kernels at the moment.

-- 
Craig Thomas
craiger@osdl.org


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 21:11 ` Nightly regression runs against current bk tree Andi Kleen
  2003-05-29 21:25   ` David S. Miller
@ 2003-05-29 22:03   ` Nathan
  2003-05-29 23:08     ` Mark Peloquin
  2003-05-29 22:48   ` Mark Peloquin
  2 siblings, 1 reply; 41+ messages in thread
From: Nathan @ 2003-05-29 22:03 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mark Peloquin, linux-kernel, davem

On Thu, May 29, 2003 at 11:11:17PM +0200, Andi Kleen wrote:
> It would be nice if we had a new linux-testresults list where such
> updates could be posted regularly. I don't think it belongs on l-k
> because it would be too noisy. Perhaps such a list could be added to 
> vger. David, what do you think?

The OSDL does a serious amount of automated testing, and we could point the
results to a separate list if one is created.

Right now we avoid pointing that sort of thing to l-k because it would
drive people nuts.  On average we complete 40+ tests a day.

With all the testing efforts going on, a central list to post and
analyze results would be good.  People interested in helping out could
easily work with testers to look for trends and help with root cause
analysis.

When results are found to contain significant data, we can always notify l-k.

-Nathan Dabney

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 21:50       ` Craig Thomas
@ 2003-05-29 22:03         ` Andi Kleen
  2003-05-29 22:16           ` Cliff White
                             ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Andi Kleen @ 2003-05-29 22:03 UTC (permalink / raw)
  To: Craig Thomas; +Cc: Andi Kleen, David S. Miller, peloquin, linux-kernel

> OSDL has a linux stabilization web page where several tests are run

[...] Would you be willing to change your scripts to report
any new results to this new list?

-Andi

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 21:48         ` David S. Miller
@ 2003-05-29 22:04           ` Craig Thomas
  2003-05-29 22:05             ` Andi Kleen
  0 siblings, 1 reply; 41+ messages in thread
From: Craig Thomas @ 2003-05-29 22:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: rddunlap, ak, peloquin, linux-kernel

On Thu, 2003-05-29 at 14:48, David S. Miller wrote:
>    From: "Randy.Dunlap" <rddunlap@osdl.org>
>    Date: Thu, 29 May 2003 14:38:20 -0700
> 
>    On Thu, 29 May 2003 23:29:29 +0200 Andi Kleen <ak@suse.de> wrote:
>    
>    | On Thu, May 29, 2003 at 02:25:15PM -0700, David S. Miller wrote:
>    | > Would it have a single poster?
>    | 
>    | OSDL, Mark's IBM team and possibly LTP?
>    | 
>    | I assume there will be more once the list exists; automated regression 
>    | tests seem to be currently in fashion.
>    
>    If DaveM doesn't want to do it, I think that we can do it.
>    (I say without checking.... :)
> 
> Please do :-)
> 
> The issue is that I'm easier about adding a new list if I can
> restrict the poster list.
> 

OSDL has a mail list that is used to discuss the stability of the linux
kernel.  This would be a perfect list to use for posting test results.
The list name is linstab@osdl.org. It is a public list administered by
OSDL.  To subscribe: http://www.osdl.org/mailman/listinfo/linstab


-- 
Craig Thomas
craiger@osdl.org


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 22:04           ` Craig Thomas
@ 2003-05-29 22:05             ` Andi Kleen
  2003-05-29 22:25               ` Cliff White
  2003-05-29 23:41               ` David S. Miller
  0 siblings, 2 replies; 41+ messages in thread
From: Andi Kleen @ 2003-05-29 22:05 UTC (permalink / raw)
  To: Craig Thomas; +Cc: David S. Miller, rddunlap, ak, peloquin, linux-kernel

> OSDL has a mail list that is used to discuss the stability of the linux
> kernel.  This would be a perfect list to use for posting test results.
> The list name is linstab@osdl.org. It is a public list administered by
> OSDL.  To subscribe: http://www.osdl.org/mailman/listinfo/linstab

That's fairly obscure (Nobody knew of it before). Perhaps a well publicized
list on vger would be better.

-Andi

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 22:03         ` Andi Kleen
@ 2003-05-29 22:16           ` Cliff White
  2003-05-29 22:23           ` Nathan
  2003-05-29 23:10           ` Mark Peloquin
  2 siblings, 0 replies; 41+ messages in thread
From: Cliff White @ 2003-05-29 22:16 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Craig Thomas, David S. Miller, peloquin, linux-kernel, cliffw

> > OSDL has a linux stabilization web page where several tests are run
> 
> [...] Would you be willing to change your scripts to report
> any new results to this new list?
> 

I would be very interested in this if it leads to more people *looking* at the
tests results. 

Automating this stuff is the easy part - getting intelligence out of the 
results is harder.
The more eyeballs we can get to look, the easier this gets. 

If a new list, or better use of an old list, will help, I'll change whatever is
necessary.
If people like the Web, but don't like our paper layout, I'll change that too.

cliffw

As 
> -Andi
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 22:03         ` Andi Kleen
  2003-05-29 22:16           ` Cliff White
@ 2003-05-29 22:23           ` Nathan
  2003-05-29 23:10           ` Mark Peloquin
  2 siblings, 0 replies; 41+ messages in thread
From: Nathan @ 2003-05-29 22:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Craig Thomas, linux-kernel

On Fri, May 30, 2003 at 12:03:54AM +0200, Andi Kleen wrote:
> > OSDL has a linux stabilization web page where several tests are run
> 
> [...] Would you be willing to change your scripts to report
> any new results to this new list?

The linux stabilization web page uses results from the STP runs I
mentioned (40+ per day).  The STP emails results summaries after test 
runs so we could easily redirect the results to this new list.

-Nathan Dabney

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 22:05             ` Andi Kleen
@ 2003-05-29 22:25               ` Cliff White
  2003-05-29 23:41               ` David S. Miller
  1 sibling, 0 replies; 41+ messages in thread
From: Cliff White @ 2003-05-29 22:25 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Craig Thomas, David S. Miller, rddunlap, peloquin, linux-kernel, cliffw

> > OSDL has a mail list that is used to discuss the stability of the linux
> > kernel.  This would be a perfect list to use for posting test results.
> > The list name is linstab@osdl.org. It is a public list administered by
> > OSDL.  To subscribe: http://www.osdl.org/mailman/listinfo/linstab
> 
> That's fairly obscure (Nobody knew of it before). Perhaps a well publicized
> list on vger would be better.

Perhaps - though we can publicize any new list. 
We're content to leave the decision to DaveM and the list team -  
if they don't want the extra work, we're always glad to help. 
cliffw
OSDL

> 
> -Andi



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 21:11 ` Nightly regression runs against current bk tree Andi Kleen
  2003-05-29 21:25   ` David S. Miller
  2003-05-29 22:03   ` Nathan
@ 2003-05-29 22:48   ` Mark Peloquin
  2003-05-29 23:17     ` Andreas Dilger
  2 siblings, 1 reply; 41+ messages in thread
From: Mark Peloquin @ 2003-05-29 22:48 UTC (permalink / raw)
  To: Andi Kleen, linux-kernel



Andi Kleen wrote:

>Mark Peloquin <peloquin@austin.ibm.com> writes:
>
>  
>
>>We have dedicated a machine and thrown together some scripts that will grab
>>and build the latest kernel files, execute the regression suite,
>>collecting (hopefully)
>>enough system state information to allow meaningful analysis of any peculiar
>>results encountered.
>>    
>>
>
>How about doing a LTP run too with some difference file for new FAILs/BROKs ?
>That's not strictly a benchmark, but would help catching regressions
>quickly.
>

I'm under the impression that LTP and other test efforts seemed to focus 
more on functional evaluation, which is fine.  We are trying to focus 
purely on the performance differences seen from day to day.

>
>I notice your benchmark mix is very IO heavy, it would be nice to test other
>aspects of the system too. Perhaps lmbench and reaim compute workload?
>

You're correct. We're just getting started with this effort and we used 
this mix to get things going. Once ppl are happy with the presentation 
of data, we planned to add more tests to provide a more balanced mix. 
But since you asked, we have added lmbench to our -bk3 regression run. :)

Mark


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 21:25   ` David S. Miller
  2003-05-29 21:29     ` Andi Kleen
@ 2003-05-29 22:51     ` Mark Peloquin
  1 sibling, 0 replies; 41+ messages in thread
From: Mark Peloquin @ 2003-05-29 22:51 UTC (permalink / raw)
  To: David S. Miller; +Cc: ak, linux-kernel



David S. Miller wrote:

>   From: Andi Kleen <ak@suse.de>
>   Date: 29 May 2003 23:11:17 +0200
>   
>   David, what do you think?
>   
>Would it have a single poster?
>

Our intention was to have one "main" poster and another person for backup.

Mark


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 22:03   ` Nathan
@ 2003-05-29 23:08     ` Mark Peloquin
  0 siblings, 0 replies; 41+ messages in thread
From: Mark Peloquin @ 2003-05-29 23:08 UTC (permalink / raw)
  To: Nathan; +Cc: Andi Kleen, linux-kernel, davem



Nathan wrote:

>On Thu, May 29, 2003 at 11:11:17PM +0200, Andi Kleen wrote:
>  
>
>>It would be nice if we had a new linux-testresults list where such
>>updates could be posted regularly. I don't think it belongs on l-k
>>because it would be too noisy. Perhaps such a list could be added to 
>>vger. David, what do you think?
>>    
>>
>
>The OSDL does a serious amount of automated testing, and we could point the
>results to a separate list if one is created.
>
>Right now we avoid pointing that sort of thing to l-k because it would
>drive people nuts.  On average we complete 40+ tests a day.
>
>With all the testing efforts going on, a central list to post and
>analyze results would be good.  People interested in helping out could
>easily work with testers to look for trends and help with root cause
>analysis.
>
>When results are found to contain significant data, we can always notify l-k.
>

Ease of viewing should be considered. We have tried to show a high-level
summary that lets users determine quickly, by looking in one place,
whether any significant data is found. When users see something of
interest, they only need to follow the links to see the details. It
shouldn't be necessary for users to sift through one email for each
test. If finding significant data were easier, and I think it can be
made easier, users would look at it themselves and there wouldn't be
any need to notify l-k.

I'm not trying to be competitive here. I just think results and
comparisons can be made that cover a large number of tests in a single
page or note. One note per day does not IMHO seem like too much. That
note can always be the "tip of the iceberg" pointing to many other
things. Thus those not interested can simply skip that note.

Mark



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 22:03         ` Andi Kleen
  2003-05-29 22:16           ` Cliff White
  2003-05-29 22:23           ` Nathan
@ 2003-05-29 23:10           ` Mark Peloquin
  2 siblings, 0 replies; 41+ messages in thread
From: Mark Peloquin @ 2003-05-29 23:10 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Craig Thomas, David S. Miller, linux-kernel



Andi Kleen wrote:

>>OSDL has a linux stabilization web page where several tests are run
>>    
>>
>
>[...] Would you be willing to change your scripts to report
>any new results to this new list?
>
>-Andi
>

We have not automated the posting process ... yet, and will be happy to 
post wherever is acceptable.

Mark


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 22:48   ` Mark Peloquin
@ 2003-05-29 23:17     ` Andreas Dilger
  2003-05-29 23:30       ` Cliff White
  0 siblings, 1 reply; 41+ messages in thread
From: Andreas Dilger @ 2003-05-29 23:17 UTC (permalink / raw)
  To: Mark Peloquin; +Cc: Andi Kleen, linux-kernel

On May 29, 2003  17:48 -0500, Mark Peloquin wrote:
> You're correct. We're just getting started with this effort and we used 
> this mix to get things going. Once ppl are happy with the presentation 
> of data, we planned to add more tests to provide a more balanced mix. 
> But since you asked, we have added lmbench to our -bk3 regression run. :)

Mark, it would be nice to get a graph of the combined results for each
test.  Something like:

                 tiobench sequential write rate
  |                                  +++++++++++++++++      + = -mm-ext3
M |        ++++++++++++++++++++++++++*****************      * = linus-ext3
B | +++++++*****************   ######                       # = -ac-ext3
/ |                                                         . = -mm-XFS
s |                                                         = = -ac-XFS
  |                         *********                       etc
  |
  +----------------------------------------------------
			date

This allows at-a-glance trends for each group of tests (in the
example above, you could easily see when a performance bug was added,
and that it was fixed in -ac before it was fixed in the linus kernel).
Having all of the comparable results on the same page, or even in the
same graph, is probably a win.

Bonus points if you can click on a spot in the graph and get the results
page for that date/test ;-).
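
A rough sketch of how such a combined chart could be produced as plain
text, for illustration only. The function name ascii_trend, the input
layout (label mapped to a symbol and one value per date) and the numbers
in any usage are assumptions, not part of any existing test framework:

```python
def ascii_trend(series, height=6):
    """Render {label: (symbol, [value or None per date])} as a text chart.

    One column per date, one row per value band. Where two series land
    in the same cell, the series listed later overwrites the earlier one.
    """
    values = [v for _, vals in series.values() for v in vals if v is not None]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero for a flat chart
    width = max(len(vals) for _, vals in series.values())
    grid = [[" "] * width for _ in range(height)]
    for symbol, vals in series.values():
        for col, v in enumerate(vals):
            if v is None:
                continue  # no result recorded for this date
            row = round((v - lo) / span * (height - 1))
            grid[height - 1 - row][col] = symbol  # row 0 is the top line
    lines = ["|" + "".join(r).rstrip() for r in grid]
    lines.append("+" + "-" * width)
    legend = ["  %s = %s" % (sym, label) for label, (sym, _) in series.items()]
    return "\n".join(lines + legend)


if __name__ == "__main__":
    # Invented MB/s figures, one per nightly run; not real measurements.
    print(ascii_trend({
        "linus-ext3": ("*", [30.0, 30.0, 22.0, 22.0, 31.0]),
        "-ac-ext3": ("#", [30.0, 22.0, 31.0, 31.0, 31.0]),
    }))
```

A real report would more likely be an image with clickable regions; the
text form is only meant to show the data layout needed (one series per
kernel/filesystem pair, one value per date).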

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 23:17     ` Andreas Dilger
@ 2003-05-29 23:30       ` Cliff White
  0 siblings, 0 replies; 41+ messages in thread
From: Cliff White @ 2003-05-29 23:30 UTC (permalink / raw)
  To: Mark Peloquin, Andi Kleen, linux-kernel

> On May 29, 2003  17:48 -0500, Mark Peloquin wrote:
> > You're correct. We're just getting started with this effort and we used 
> > this mix to get things going. Once ppl are happy with the presentation 
> > of data, we planned to add more tests to provide a more balanced mix. 
> > But since you asked, we have added lmbench to our -bk3 regression run. :)
> 
> Mark, it would be nice to get a graph of the combined results for each
> test.  Something like:
> 
>                  tiobench sequential write rate
>   |                                  +++++++++++++++++      + = -mm-ext3
> M |        ++++++++++++++++++++++++++*****************      * = linus-ext3
> B | +++++++*****************   ######                       # = -ac-ext3
> / |                                                         . = -mm-XFS
> s |                                                         = = -ac-XFS
>   |                         *********                       etc
>   |
>   +----------------------------------------------------
> 			date
> 
> This allows at-a-glance trends for each group of tests (in the
> example above, you could easily see when a performance bug was added,
> and that it was fixed in -ac before it was fixed in the linus kernel).
> Having all of the comparable results on the same page, or even in the
> same graph, is probably a win.
> 
> Bonus points if you can click on a spot in the graph and get the results
> page for that date/test ;-).

The STP team really, really likes this idea. We'll start working on
getting this type of report into our framework.

Thanks much, Andreas - great explanation.
cliffw

> 
> Cheers, Andreas
> --
> Andreas Dilger
> http://sourceforge.net/projects/ext2resize/
> http://www-mddsp.enel.ucalgary.ca/People/adilger/
> 



^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
  2003-05-29 22:05             ` Andi Kleen
  2003-05-29 22:25               ` Cliff White
@ 2003-05-29 23:41               ` David S. Miller
  1 sibling, 0 replies; 41+ messages in thread
From: David S. Miller @ 2003-05-29 23:41 UTC (permalink / raw)
  To: ak; +Cc: craiger, rddunlap, peloquin, linux-kernel

   From: Andi Kleen <ak@suse.de>
   Date: Fri, 30 May 2003 00:05:40 +0200

   > To subscribe: http://www.osdl.org/mailman/listinfo/linstab
   
   That's fairly obscure (Nobody knew of it before). Perhaps a well publicized
   list on vger would be better.
   
I don't see why the vger variant could be widely publicized while
the OSDL list cannot.  Don't be ridiculous, Andi :-)

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: Nightly regression runs against current bk tree
@ 2003-06-03 16:04 Paul Larson
  0 siblings, 0 replies; 41+ messages in thread
From: Paul Larson @ 2003-06-03 16:04 UTC (permalink / raw)
  To: lkml

Sorry I didn't see this sooner, I'm unsubscribed for the moment until my
email provider can get exim/procmail talking nicely.

LTP has had a mailing list for a long time that is explicitly for the
purpose of posting results.  It's currently underutilized so I'd love to
see more results getting posted there again.  Please consider using that
one for posting results of all types (LTP and non-LTP).

ltp-results@lists.sourceforge.net

Thanks,
Paul Larson




^ permalink raw reply	[flat|nested] 41+ messages in thread

* Nightly regression runs against current bk tree
@ 2003-05-29 20:24 Mark Peloquin
  0 siblings, 0 replies; 41+ messages in thread
From: Mark Peloquin @ 2003-05-29 20:24 UTC (permalink / raw)
  To: linux-kernel


Our team would like to assist the community in quickly identifying
patches that provide performance improvements or regressions in the
2.5 kernel tree. The way to do this will be to run a nightly regression
test suite against the current bk tree, and then compare the results
against the previous night's results, showing the differences.
Additionally, we will compare against the 2.5 point release.
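
As an illustration of the night-over-night comparison described above,
here is a minimal sketch. It is not the team's actual scripts: the
function names, the 5% threshold, the "higher is better" assumption and
the sample numbers are all hypothetical.

```python
def compare_runs(baseline, current, threshold_pct=5.0):
    """Compare two {test_name: score} result sets.

    Returns one (test, base, cur, pct_change, flagged) tuple per test
    present in both runs, sorted by test name. A row is flagged when the
    absolute percent change reaches threshold_pct (an assumed cutoff).
    """
    rows = []
    for test in sorted(baseline.keys() & current.keys()):
        base, cur = baseline[test], current[test]
        if base == 0:
            continue  # percent change is undefined against a zero baseline
        pct = (cur - base) / base * 100.0
        rows.append((test, base, cur, pct, abs(pct) >= threshold_pct))
    return rows


def format_report(rows):
    """Format comparison rows as one line per test, marking flagged rows."""
    lines = []
    for test, base, cur, pct, flagged in rows:
        marker = "  <-- check" if flagged else ""
        lines.append(f"{test:<24}{base:>10.2f}{cur:>10.2f}{pct:>+9.1f}%{marker}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented scores (higher is better); not real measurements.
    bk1 = {"tiobench-seq-write": 40.0, "dbench": 100.0}
    bk2 = {"tiobench-seq-write": 36.0, "dbench": 101.0}
    print(format_report(compare_runs(bk1, bk2)))
```

The same function could be called a second time with the point-release
results as the baseline to get the comparison against the 2.5 point
release.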

We have dedicated a machine and thrown together some scripts that will
grab and build the latest kernel files and execute the regression suite,
collecting (hopefully) enough system state information to allow
meaningful analysis of any peculiar results encountered.

Here are links to the current regression results obtained:

2.5.70 vs 2.5.70-bk1:
http://www.ibm.com/developerworks/oss/linuxperf/regression/2.5.70-bk1/2.5.70-vs-2.5.70-bk1/

2.5.70 vs 2.5.70-bk2:
http://www.ibm.com/developerworks/oss/linuxperf/regression/2.5.70-bk2/2.5.70-vs-2.5.70-bk2/

2.5.70-bk1 vs 2.5.70-bk2:
http://www.ibm.com/developerworks/oss/linuxperf/regression/2.5.70-bk2/2.5.70-bk1-vs-2.5.70-bk2/

The regression suite executes in about 7.5 hours currently. We would
like to keep the execution time below 12 hours, so when a problem is
encountered, we will have time to recover without falling behind on the
daily snapshots. We have attempted to strike a balance between test
execution time and test coverage. Work is still ongoing in the area to
provide the best balance and maintain repeatability.

Currently the regression suite operates on the 2.5 kernel bk tree. We do
plan on adding another machine that will perform similar regression
comparisons for the -mm and -mjb patches.

Please bear in mind this is work in progress and there might be a few
rough edges. However, with your input, we feel it can provide a useful
function. Please do not hesitate to provide feedback or suggestions on
improvements including content and presentation.

Mark Peloquin
IBM Linux performance team


^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2003-06-03 15:50 UTC | newest]

Thread overview: 41+ messages
     [not found] <3ED66C83.8070608@austin.ibm.com.suse.lists.linux.kernel>
2003-05-29 21:11 ` Nightly regression runs against current bk tree Andi Kleen
2003-05-29 21:25   ` David S. Miller
2003-05-29 21:29     ` Andi Kleen
2003-05-29 21:38       ` Randy.Dunlap
2003-05-29 21:48         ` David S. Miller
2003-05-29 22:04           ` Craig Thomas
2003-05-29 22:05             ` Andi Kleen
2003-05-29 22:25               ` Cliff White
2003-05-29 23:41               ` David S. Miller
2003-05-29 21:50       ` Craig Thomas
2003-05-29 22:03         ` Andi Kleen
2003-05-29 22:16           ` Cliff White
2003-05-29 22:23           ` Nathan
2003-05-29 23:10           ` Mark Peloquin
2003-05-29 22:51     ` Mark Peloquin
2003-05-29 22:03   ` Nathan
2003-05-29 23:08     ` Mark Peloquin
2003-05-29 22:48   ` Mark Peloquin
2003-05-29 23:17     ` Andreas Dilger
2003-05-29 23:30       ` Cliff White
2003-06-03 16:04 Paul Larson
  -- strict thread matches above, loose matches on Subject: below --
2003-05-29 20:24 Mark Peloquin
2002-10-27 12:13 New nanosecond stat patch for 2.5.44 Andi Kleen
2002-10-27 14:33 ` New nanosecond stat patch for 2.5.44 - new patch II Andi Kleen
2002-10-27 21:49 ` New nanosecond stat patch for 2.5.44 Andreas Dilger
2002-10-27 22:54   ` H. Peter Anvin
2002-10-28  1:23     ` Chris Friesen
2002-10-28  1:35       ` Rob Landley
2002-11-06 13:27     ` Gabriel Paubert
2002-11-06 18:00       ` H. Peter Anvin
2002-10-27 23:16   ` Horst von Brand
2002-10-28 17:10     ` Andreas Dilger
2002-10-29 15:01   ` Bill Davidsen
2002-10-29 16:30     ` Andreas Dilger
2002-10-29 20:37       ` Bill Davidsen
2002-10-30  0:44         ` Jamie Lokier
2002-10-30 21:12           ` Bill Davidsen
2002-10-30 22:17             ` Jamie Lokier
2002-10-31  0:34               ` H. Peter Anvin
2002-11-01  1:57               ` Bill Davidsen
2002-11-01  3:32                 ` Jamie Lokier
