On Mon, 2006-07-31 at 23:00 -0400, Theodore Tso wrote:
> The problem is that many benchmarks (such as taring and untaring the
> kernel sources in reiser4 sort order) are overly simplistic, in that
> they don't really reflect how people use the filesystem in real life.
> (How many times can you guarantee that files will be written in the
> precise hash/tree order so that the filesystem gets the best possible
> time?) A more subtle version of this problem happens for filesystems
> where their performance degrades dramatically over-time without a
> repacker. If the benchmark doesn't take into account the need for
> repacker, or if the repacker is disabled or fails to run during the
> benchmark, the filesystem are in effect "cheating" on the benchmark
> because there is critical work which is necessary for the long-term
> health of the filesystem which is getting deferred until after the
> benchmark has finished measuring the performance of the system under
> test.

If a file system that requires a repacker can do X operations in half the
time all week long, then even if the repacker takes several hours to run
once a week, you're still ahead of the game. The load averages on the vast
majority of servers have significant peaks and valleys throughout the day
and throughout the week, so it wouldn't be hard to find a time when an
online repacker is virtually unnoticeable to users. Delaying certain work
until the server is less busy might be considered "cheating" on benchmarks,
but in the real world most people would consider it a good use of
resources. Just like RAID rebuilds, where you can set maximum IO speeds, I
could see a repacker working in a similar fashion. Obviously there are some
servers where this is unacceptable, and in such cases don't use Reiser4,
but I would guess they are few and far between. No file system is perfect
for 100% of the workloads thrown at it.

PostgreSQL and its vacuum process come to mind as something similar to a
repacker. PostgreSQL puts off some work until later, and it has proven
itself over and over again, especially when it comes to scalability.
PostgreSQL has recently (v8.0 I believe) moved to a system where it can
automatically detect the need to vacuum specific tables, so tables that
need it are vacuumed more often, and tables that don't are rarely touched.
I don't see any reason why a repacker couldn't work in a similar fashion:
once it detects that a file is fragmented beyond some threshold, the file
gets scheduled for repacking when idle disk IO is available (see the rough
sketch at the end of this mail).

The bottom line is that once you have an online repacker, you instantly
open up all sorts of doors. It's well known that disk drives have different
sustained read/write performance depending on whether the data is on the
inside or outside tracks. Perhaps the repacker could also move files around
on the disk to get further gains; for instance, larger or more commonly
used files could be moved to where the disk has the highest sustained
read/write performance, and smaller, less-used files could be moved to the
slowest areas.

--
Mike Benoit
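
For what it's worth, here is a minimal sketch of the kind of
threshold-plus-idle policy I have in mind. None of these names are real
Reiser4 or kernel interfaces; the fragmentation numbers, the idle check,
and repack() are hypothetical stand-ins, and the 30% cutoff is arbitrary.

/* Minimal sketch (not real Reiser4 code) of a threshold-plus-idle
 * repacker policy. All names and values here are hypothetical. */
#include <stdio.h>
#include <stddef.h>

struct file_info {
    const char *path;
    double      fragmentation;  /* 0.0 = contiguous, 1.0 = badly fragmented */
};

#define FRAG_THRESHOLD 0.30     /* arbitrary cutoff for scheduling a repack */

/* Stand-in: a real repacker would ask the block layer whether there is
 * spare bandwidth, much like RAID rebuild speed limits. */
static int disk_is_idle(void)
{
    return 1;
}

/* Stand-in: rewrite the file's blocks contiguously, throttled. */
static void repack(const struct file_info *f)
{
    printf("repacking %s (%.0f%% fragmented)\n",
           f->path, f->fragmentation * 100.0);
}

int main(void)
{
    struct file_info files[] = {
        { "/var/log/messages",  0.45 },
        { "/home/user/big.iso", 0.10 },
        { "/var/lib/db/table",  0.60 },
    };
    size_t n = sizeof(files) / sizeof(files[0]);

    /* Only files fragmented beyond the threshold get repacked, and only
     * when the disk has idle IO to spare. */
    for (size_t i = 0; i < n; i++) {
        if (files[i].fragmentation > FRAG_THRESHOLD && disk_is_idle())
            repack(&files[i]);
    }
    return 0;
}

The same bookkeeping could feed the track-placement idea above: sort the
candidates by size or access frequency and hand the hot ones to the faster
zones of the disk first.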