* [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Dax Kelson @ 2002-07-12 16:21 UTC (permalink / raw)
To: linux-kernel

Tested:

ext3 data=ordered
ext3 data=writeback
reiserfs
reiserfs notail

http://www.gurulabs.com/ext3-reiserfs.html

Any suggestions or comments appreciated.

Dax Kelson
Guru Labs

^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Andreas Dilger @ 2002-07-12 17:05 UTC (permalink / raw)
To: Dax Kelson; +Cc: linux-kernel

On Jul 12, 2002 10:21 -0600, Dax Kelson wrote:
> ext3 data=ordered
> ext3 data=writeback
> reiserfs
> reiserfs notail
>
> http://www.gurulabs.com/ext3-reiserfs.html
>
> Any suggestions or comments appreciated.

Did you try data=journal mode on ext3?  For real-life sync-I/O workloads
like mail (i.e. not benchmarks where the system is 100% busy) you can see
considerable performance benefits from doing the sync I/O directly to the
journal, instead of partly to the journal and partly to the rest of the
filesystem.

The reason "real life" matters here is that data=journal mode writes all
file data to disk twice - once to the journal and again to the
filesystem - so you must have some slack in your disk bandwidth in order
to benefit from the increased throughput on the part of the mail
transport.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: kwijibo @ 2002-07-12 17:26 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Dax Kelson, linux-kernel

I compared reiserfs with notails and with tails to ext3 in journaled
mode about a month ago.  Strangely enough, the machine that was being
built is eventually slated for a mail machine.  I used postmark to
simulate the mail environment.

Benchmarks are available here:
http://labs.zianet.com

Let me know if I am missing any info on there.

Steven

Andreas Dilger wrote:
> On Jul 12, 2002 10:21 -0600, Dax Kelson wrote:
>> ext3 data=ordered
>> ext3 data=writeback
>> reiserfs
>> reiserfs notail
>>
>> http://www.gurulabs.com/ext3-reiserfs.html
>>
>> Any suggestions or comments appreciated.
>
> Did you try data=journal mode on ext3?  For real-life sync-I/O workloads
> like mail (i.e. not benchmarks where the system is 100% busy) you can
> have considerable performance benefits from doing the sync I/O directly
> to the journal instead of partly to the journal and partly to the rest
> of the filesystem.
>
> The reason why "real life" is important here is because the data=journal
> mode writes all the files to disk twice - once to the journal and again
> to the filesystem, so you must have some "slack" in your disk bandwidth
> in order to benefit from this increased throughput on the part of the
> mail transport.
>
> Cheers, Andreas
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Andreas Dilger @ 2002-07-12 17:36 UTC (permalink / raw)
To: kwijibo; +Cc: Dax Kelson, linux-kernel

On Jul 12, 2002 11:26 -0600, kwijibo@zianet.com wrote:
> I compared reiserfs with notails and with tails to
> ext3 in journaled mode about a month ago.
> Strangely enough the machine that was being
> built is eventually slated for a mail machine.  I used
> postmark to simulate the mail environment.
>
> Benchmarks are available here:
> http://labs.zianet.com
>
> Let me know if I am missing any info on there.

Yes, I saw this benchmark when it was first posted.  It isn't clear from
the web pages whether you are using data=journal for ext3.  Note that
this is only a benefit for sync-I/O workloads like mail and NFS, not for
other types of usage.

Also, for sync-I/O workloads you can get a big boost by using an
external journal device.

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Chris Mason @ 2002-07-12 20:34 UTC (permalink / raw)
To: Dax Kelson; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1192 bytes --]

On Fri, 2002-07-12 at 12:21, Dax Kelson wrote:
> Tested:
>
> ext3 data=ordered
> ext3 data=writeback
> reiserfs
> reiserfs notail
>
> http://www.gurulabs.com/ext3-reiserfs.html
>
> Any suggestions or comments appreciated.

postmark is an interesting workload, but it does not do fsync or renames
on the working set, and postfix does lots of both while delivering.
postmark does do a good job of showing the difference between lots of
files in one directory (great for reiserfs) and lots of directories with
fewer files in each (better for ext3).

Andreas Dilger already mentioned -o data=journal on ext3; you can try
the beta reiserfs patches that add support for data=journal and
data=ordered at:

ftp.suse.com/pub/people/mason/patches/data-logging

They improve reiserfs performance for just about everything, but
data=journal is especially good for fsync/O_SYNC heavy workloads.

Andrew Morton sent me a benchmark of his that tries to simulate postfix.
He has posted it to l-k before, but a quick google search found only
dead links, so I'm attaching it.  What I like about his synctest is that
the results are consistent and you can play with various
fsync/rename/unlink options.

-chris

[-- Attachment #2: synctest.c --]
[-- Type: text/x-c, Size: 7672 bytes --]

/*
 * Test and benchmark synchronous operations.
 * stolen from Andrew Morton
 */
#undef _XOPEN_SOURCE	/* MAP_ANONYMOUS */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <errno.h>
#include <stdarg.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <sys/mman.h>

/*
 * Lots of yummy globals!
 */
char *progname, *dirname;
int verbose, use_fsync, use_osync;
int fsync_dir;
int n_threads = 1, n_iters = 100;
int *child_status;
int this_child_index;
int dir_fd;
int show_tids;
int threads_per_dir = 1;
int thread_group;
int do_unlink;
int rename_pass;

#define N_FILES 100
#define UNLINK_LAG 30
#define RENAME_PASSES 3

void show(char *fmt, ...)
{
	if (verbose) {
		va_list ap;

		va_start(ap, fmt);
		vfprintf(stdout, fmt, ap);
		fflush(stdout);
		va_end(ap);
	}
}

/*
 * - Create a file.
 * - Write some data to it
 * - Maybe fsync() it.
 * - Close it
 * - Maybe fsync() its parent dir
 * - rename() it.
 * - maybe fsync() its parent dir
 * - rename() it.
 * - maybe fsync() its parent dir
 * - rename() it.
 * - maybe fsync() its parent dir
 * - UNLINK_LAG files later, maybe unlink it.
 * - maybe fsync() its parent dir
 *
 * Repeat the above N_FILES times
 */

char *mk_dirname(void)
{
	char *ret = malloc(strlen(dirname) + 64);

	sprintf(ret, "%s/%05d", dirname, thread_group);
	return ret;
}

char *mk_filename(int fileno)
{
	char *ret = malloc(strlen(dirname) + 64);

	sprintf(ret, "%s/%05d/%05d-%05d",
		dirname, thread_group, getpid(), fileno);
	return ret;
}

char *mk_new_filename(int fileno, int pass)
{
	char *ret = malloc(strlen(dirname) + 64);

	sprintf(ret, "%s/%05d/%02d-%05d-%05d",
		dirname, thread_group, pass, getpid(), fileno);
	return ret;
}

void sync_dir(void)
{
	if (fsync_dir) {
		show("fsync(%s)\n", dirname);
		if (fsync(dir_fd) < 0) {
			fprintf(stderr, "%s: failed to fsync dir `%s': %s\n",
				progname, dirname, strerror(errno));
			exit(1);
		}
	}
}

void make_dir(void)
{
	char *n = mk_dirname();

	show("mkdir(%s)\n", n);
	if (mkdir(n, 0777) < 0) {
		fprintf(stderr, "%s: Cannot make directory `%s': %s\n",
			progname, n, strerror(errno));
		exit(1);
	}
	free(n);
}

void remove_dir(void)
{
	char *n = mk_dirname();

	show("rmdir(%s)\n", n);
	rmdir(n);
	free(n);
}

void write_stuff_to(int fd, char *name)
{
	static char buf[500000];
	static int to_write = 5000;

	show("write %d bytes to `%s'\n", to_write, name);
	if (write(fd, buf, to_write) != to_write) {
		fprintf(stderr, "%s: failed to write %d bytes to `%s': %s\n",
			progname, to_write, name, strerror(errno));
		exit(1);
	}
	to_write *= 1.1;
	if (to_write > 250000)
		to_write = 5000;
}

void unlink_one_file(int fileno, int pass)
{
	if (do_unlink) {
		char *name = mk_new_filename(fileno, pass);

		show("unlink(%s)\n", name);
		if (unlink(name) < 0) {
			fprintf(stderr, "%s: failed to unlink `%s': %s\n",
				progname, name, strerror(errno));
			exit(1);
		}
		sync_dir();
		free(name);
	}
}

void do_one_file(int fileno)
{
	char *name = mk_filename(fileno);
	int fd, flags;

	flags = O_RDWR|O_CREAT|O_TRUNC;
	if (use_osync)
		flags |= O_SYNC;

	show("open(%s)\n", name);
	fd = open(name, flags, 0666);
	if (fd < 0) {
		fprintf(stderr, "%s: failed to create file `%s': %s\n",
			progname, name, strerror(errno));
		exit(1);
	}
	write_stuff_to(fd, name);
	if (use_fsync) {
		show("fsync(%s)\n", name);
		if (fsync(fd) < 0) {
			fprintf(stderr, "%s: failed to fsync `%s': %s\n",
				progname, name, strerror(errno));
			exit(1);
		}
	}
	show("close(%s)\n", name);
	if (close(fd) < 0) {
		fprintf(stderr, "%s: failed to close `%s': %s\n",
			progname, name, strerror(errno));
		exit(1);
	}
	sync_dir();

	for (rename_pass = 0; rename_pass < RENAME_PASSES; rename_pass++) {
		char *newname = mk_new_filename(fileno, rename_pass);

		show("rename(%s, %s)\n", name, newname);
		if (rename(name, newname) < 0) {
			fprintf(stderr,
				"%s: failed to rename `%s' to `%s': %s\n",
				progname, name, newname, strerror(errno));
			exit(1);
		}
		sync_dir();
		free(name);
		name = newname;
	}
	rename_pass--;
	free(name);
}

void do_child(void)
{
	int fileno;
	char *dn = mk_dirname();
	int dotcount;

	dir_fd = open(dn, O_RDONLY);
	if (dir_fd < 0) {
		fprintf(stderr, "%s: failed to open dir `%s': %s\n",
			progname, dn, strerror(errno));
		exit(1);
	}
	free(dn);

	dotcount = N_FILES / 10;
	if (dotcount == 0)
		dotcount = 1;

	for (fileno = 0; fileno < N_FILES; fileno++) {
		if (fileno % dotcount == 0) {
			printf(".");
			fflush(stdout);
		}
		do_one_file(fileno);
		if (fileno >= UNLINK_LAG)
			unlink_one_file(fileno - UNLINK_LAG,
					RENAME_PASSES - 1);
	}
	for (fileno = N_FILES - UNLINK_LAG; fileno < N_FILES; fileno++)
		unlink_one_file(fileno, RENAME_PASSES - 1);
}

void doit(void)
{
	int child;
	int children_left;

	child_status = (int *)mmap(0, n_threads * sizeof(*child_status),
				   PROT_READ|PROT_WRITE,
				   MAP_SHARED|MAP_ANONYMOUS, -1, 0);
	if (child_status == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	memset(child_status, 0, n_threads * sizeof(*child_status));

	thread_group = -1;
	for (this_child_index = 0;
	     this_child_index < n_threads;
	     this_child_index++) {
		if (this_child_index % threads_per_dir == 0) {
			thread_group++;
			make_dir();
		}
		if (fork() == 0) {
			int iter;

			for (iter = 0; iter < n_iters; iter++)
				do_child();
			child_status[this_child_index] = 1;
			exit(0);
		}
	}

	/* Parent */
	children_left = n_threads;
	while (children_left) {
		int status;

		if (wait3(&status, 0, 0) < 0) {
			if (errno != EINTR) {
				perror("wait3");
				exit(1);
			}
			continue;
		}
		for (child = 0; child < n_threads; child++) {
			if (child_status[child] == 1) {
				child_status[child] = 2;
				printf("*");
				fflush(stdout);
				children_left--;
			}
		}
	}
	for (thread_group = 0;
	     thread_group < (n_threads / threads_per_dir);
	     thread_group++)
		remove_dir();
	printf("\n");
}

void usage(void)
{
	fprintf(stderr,
		"Usage: %s [-fFosuv] [-p threads-per-dir] [-n iters] [-t threads] dirname\n",
		progname);
	fprintf(stderr, "     -f:        Use fsync() on close\n");
	fprintf(stderr, "     -F:        Use fsync() on parent dir\n");
	fprintf(stderr, "     -n:        Number of iterations\n");
	fprintf(stderr, "     -o:        Open files O_SYNC\n");
	fprintf(stderr, "     -p:        Number of threads per directory\n");
	fprintf(stderr, "     -t:        Number of threads\n");
	fprintf(stderr, "     -u:        Unlink files during test\n");
	fprintf(stderr, "     -v:        Verbose\n");
	fprintf(stderr, "     dirname:   Directory to run tests in\n");
	exit(1);
}

int main(int argc, char *argv[])
{
	int c;

	progname = argv[0];
	while ((c = getopt(argc, argv, "vFfout:n:p:")) != -1) {
		switch (c) {
		case 'f':
			use_fsync++;
			break;
		case 'F':
			fsync_dir++;
			break;
		case 'n':
			n_iters = strtol(optarg, NULL, 10);
			break;
		case 'o':
			use_osync++;
			break;
		case 'p':
			threads_per_dir = strtol(optarg, NULL, 10);
			break;
		case 't':
			n_threads = strtol(optarg, NULL, 10);
			break;
		case 'u':
			do_unlink++;
			break;
		case 'v':
			verbose++;
			break;
		}
	}
	if (optind == argc)
		usage();
	dirname = argv[optind++];
	if (optind != argc)
		usage();
	doit();
	exit(0);
}
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Daniel Phillips @ 2002-07-13 4:44 UTC (permalink / raw)
To: Dax Kelson, linux-kernel

On Friday 12 July 2002 18:21, Dax Kelson wrote:
> Any suggestions or comments appreciated.

"it is clear that IF your server is stable and not prone to crashing,
and/or you have the write cache on your hard drives battery backed, you
should strongly consider using the writeback journaling mode of Ext3
versus ordered."

You probably want to suggest a UPS there rather than a battery-backed
disk cache, since the writeback caching is predominantly on the cpu
side.

--
Daniel
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Dax Kelson @ 2002-07-14 20:40 UTC (permalink / raw)
To: linux-kernel

On Fri, 2002-07-12 at 10:21, Dax Kelson wrote:
> Any suggestions or comments appreciated.

Thanks for the feedback.  Look for more testing from us soon addressing
the suggestions brought up.

Dax
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Sam Vilain @ 2002-07-15 8:26 UTC (permalink / raw)
To: Dax Kelson; +Cc: linux-kernel

Dax Kelson <dax@gurulabs.com> wrote:
> > Any suggestions or comments appreciated.
> Thanks for the feedback.  Look for more testing from us soon addressing
> the suggestions brought up.

One more thing - can I just make the comment that testing freshly
formatted filesystems is not going to show up ext2's real weaknesses,
which appear on old filesystems - particularly those that have been
allowed to fill up.

I timed *15 minutes* for a system I admin to unlink a single 1G file on
a fairly old ext2 filesystem the other day (perhaps ext3 would have
improved this, I'm not sure).  It took 30 minutes to scan a snort log
directory on ext2, but less than 2 minutes on reiser - and only 3
seconds once it was in the buffercache.

You are testing for a mail server - how many mailboxes are in your
spool directory for the tests?  Try it with about five to ten thousand
mailboxes and see how your results vary.

--
Sam Vilain, sam@vilain.net     WWW: http://sam.vilain.net/
7D74 2A09 B2D3 C30F F78E       GPG: http://sam.vilain.net/sam.asc
278A A425 30A9 05B5 2F13

  Although Mr Chavez 'was democratically elected,' one had to bear in
  mind that 'Legitimacy is something that is conferred not just by a
  majority of the voters'" - The office of George "Dubya" Bush
  commenting on the Venezuelan election
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Alan Cox @ 2002-07-15 12:30 UTC (permalink / raw)
To: Sam Vilain; +Cc: Dax Kelson, linux-kernel

On Mon, 2002-07-15 at 09:26, Sam Vilain wrote:
> You are testing for a mail server - how many mailboxes are in your spool
> directory for the tests?  Try it with about five to ten thousand
> mailboxes and see how your results vary.

If your mail server can't get hierarchical mail spools right, get one
that can.

Alan
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Sam Vilain @ 2002-07-15 12:02 UTC (permalink / raw)
To: Alan Cox; +Cc: dax, linux-kernel

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > You are testing for a mail server - how many mailboxes are in your spool
> > directory for the tests?  Try it with about five to ten thousand
> > mailboxes and see how your results vary.
> If your mail server can't get hierarchical mail spools right, get one
> that can.

Translation:

"Yes, we know that there is no directory hashing in ext2/3.  You'll have
to find another solution to the problem, I'm afraid.  Why not ease the
burden on the filesystem by breaking up the task for it, and giving it
to it in small pieces?  That way it's much less likely to choke."

:-)

Sure, you could set up hierarchical mail spools.  But it sure stinks of
a temporary solution to a long-term problem.  What about the next
application that grows to massive proportions?

Hey, while I've got your attention, how do you go about debugging your
kernel?  I'm trying to add fair scheduling to the new O(1) scheduler -
something of a token bucket filter counting jiffies used by a
process/user/s_context (in scheduler_tick()) and tweaking their
priority accordingly (in effective_prio()).  It'd be really nice if I
could run it under UML or something like that so I can trace through it
with gdb, but I couldn't get the UML patch to apply to your tree.  Any
hints?

--
Sam Vilain, sam@vilain.net     WWW: http://sam.vilain.net/
7D74 2A09 B2D3 C30F F78E       GPG: http://sam.vilain.net/sam.asc
278A A425 30A9 05B5 2F13
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Alan Cox @ 2002-07-15 13:23 UTC (permalink / raw)
To: Sam Vilain; +Cc: dax, linux-kernel

On Mon, 2002-07-15 at 13:02, Sam Vilain wrote:
> "Yes, we know that there is no directory hashing in ext2/3.  You'll have
> to find another solution to the problem, I'm afraid.  Why not ease the
> burden on the filesystem by breaking up the task for it, and giving it
> to it in small pieces?  That way it's much less likely to choke."

Actually there are several other reasons for it.  It sucks a lot less
when you need to use ls and friends to inspect part of the spool.  It
also makes it much easier to split the mail spool over multiple disks
as it grows, without having to backup/restore the spool area.

> Sure, you could set up hierarchical mail spools.  But it sure stinks of
> a temporary solution to a long-term problem.  What about the next
> application that grows to massive proportions?

JFS?

> Hey, while I've got your attention, how do you go about debugging your
> kernel?  I'm trying to add fair scheduling to the new O(1) scheduler -
> something of a token bucket filter counting jiffies used by a
> process/user/s_context (in scheduler_tick()) and tweaking their
> priority accordingly (in effective_prio()).  It'd be really nice if I
> could run it under UML or something like that so I can trace through it
> with gdb, but I couldn't get the UML patch to apply to your tree.  Any
> hints?

The UML tree and my tree don't quite merge easily.  Your best bet is to
grab the Red Hat Limbo beta packages for the kernel source, which if I
remember rightly are both -ac based and include the option to build UML.

Alan
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Chris Mason @ 2002-07-15 13:40 UTC (permalink / raw)
To: Alan Cox; +Cc: Sam Vilain, dax, linux-kernel

On Mon, 2002-07-15 at 09:23, Alan Cox wrote:
> On Mon, 2002-07-15 at 13:02, Sam Vilain wrote:
> > "Yes, we know that there is no directory hashing in ext2/3.  You'll have
> > to find another solution to the problem, I'm afraid.  Why not ease the
> > burden on the filesystem by breaking up the task for it, and giving it
> > to it in small pieces?  That way it's much less likely to choke."
>
> Actually there are several other reasons for it.  It sucks a lot less
> when you need to use ls and friends to inspect part of the spool.  It
> also makes it much easier to split the mail spool over multiple disks
> as it grows, without having to backup/restore the spool area.

Another good reason is i_sem.  If you've got more than one process
doing something to that directory, you spend lots of time waiting for
the semaphore.

I think it was Andrew that reminded me i_sem is held during fsync, so
fsync(dir) to make things safe after a rename can slow things down.
reiserfs only needs fsync(file); ext3 needs fsync(anything on the fs).
If ext3 would promise to make fsync(file) sufficient forever, it might
help the MTA authors tune.

-chris
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Andrew Morton @ 2002-07-15 19:40 UTC (permalink / raw)
To: Chris Mason; +Cc: Alan Cox, Sam Vilain, dax, linux-kernel

Chris Mason wrote:
> ...
> If ext3 would promise to make fsync(file) sufficient forever, it might
> help the MTA authors tune.

ext3 promises.  This side-effect is bolted firmly into the design of
ext3 and it's hard to see any way in which it will go away.
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Andrea Arcangeli @ 2002-07-15 15:12 UTC (permalink / raw)
To: Sam Vilain; +Cc: Alan Cox, dax, linux-kernel, Jeff Dike

On Mon, Jul 15, 2002 at 01:02:01PM +0100, Sam Vilain wrote:
> Hey, while I've got your attention, how do you go about debugging your
> kernel?  I'm trying to add fair scheduling to the new O(1) scheduler -
> something of a token bucket filter counting jiffies used by a
> process/user/s_context (in scheduler_tick()) and tweaking their
> priority accordingly (in effective_prio()).  It'd be really nice if I
> could run it under UML or something like that so I can trace through it
> with gdb, but I couldn't get the UML patch to apply to your tree.  Any
> hints?

-aa ships with both uml and the O(1) scheduler.  I need uml for
everything non-hardware-related, so expect it to always be up to date
there.  However, since I merged the O(1) scheduler there is the
annoyance that sometimes wakeup events don't arrive until kupdate
reschedules, or something like that (of course only with uml, not with
real hardware).  Also, pressing keys is enough to unblock it.  I haven't
debugged it hard yet.  According to Jeff it's a problem with cli that
masks signals.

Andrea
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Andreas Dilger @ 2002-07-15 16:03 UTC (permalink / raw)
To: Sam Vilain; +Cc: Alan Cox, dax, linux-kernel

On Jul 15, 2002 13:02 +0100, Sam Vilain wrote:
> "Yes, we know that there is no directory hashing in ext2/3.  You'll
> have to find another solution to the problem, I'm afraid.  Why not ease
> the burden on the filesystem by breaking up the task for it, and giving
> it to it in small pieces?  That way it's much less likely to choke."

Amusingly, there IS directory hashing available for ext2 and ext3, and
it is just as fast as reiserfs hashed directories.  See:

http://people.nl.linux.org/~phillips/htree/paper/htree.html

Cheers, Andreas
--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Daniel Phillips @ 2002-07-15 16:12 UTC (permalink / raw)
To: Andreas Dilger, Sam Vilain; +Cc: Alan Cox, dax, linux-kernel

On Monday 15 July 2002 18:03, Andreas Dilger wrote:
> Amusingly, there IS directory hashing available for ext2 and ext3, and
> it is just as fast as reiserfs hashed directories.  See:
>
> http://people.nl.linux.org/~phillips/htree/paper/htree.html

Faster, last time I checked.  I really must test against XFS and JFS at
some point.

--
Daniel
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Sam Vilain @ 2002-07-15 17:48 UTC (permalink / raw)
To: Andreas Dilger; +Cc: dax, linux-kernel

Andreas Dilger <adilger@clusterfs.com> wrote:
> Amusingly, there IS directory hashing available for ext2 and ext3, and
> it is just as fast as reiserfs hashed directories.  See:
> http://people.nl.linux.org/~phillips/htree/paper/htree.html

You learn something new every day.  So, with that in mind - what has
reiserfs got that ext2 doesn't?

  - tail merging, giving much more efficient space usage for lots of
    small files.
  - B*Tree allocation, offering "a 1/3rd reduction in internal
    fragmentation in return for slightly more complicated insertion
    and deletion algorithms" (from the htree paper).
  - online resizing in the main kernel (ext2 needs a patch -
    http://ext2resize.sourceforge.net/).
  - resizing does not require the use of `ext2prepare' run on the
    filesystem while unmounted to resize over arbitrary boundaries.
  - directory hashing in the main kernel.

On the flipside, ext2 over reiserfs:

  - support for attributes without a patch or a 2.4.19-pre4+ kernel.
  - support for filesystem quotas without a patch.
  - there is a `dump' command (but it's useless, because it hangs when
    you run it on mounted filesystems - come on, who REALLY unmounts
    their filesystems for a nightly dump?  You need a 3-way mirror to
    do it while guaranteeing filesystem availability...).

I'd be very interested in seeing postmark results without the
hierarchical directory structure (which an unpatched postfix doesn't
support), with about 5000 mailboxes, with and without the htree patch
(or with the htree patch but without that directory indexed, if that
is possible).

--
Sam Vilain, sam@vilain.net     WWW: http://sam.vilain.net/
7D74 2A09 B2D3 C30F F78E       GPG: http://sam.vilain.net/sam.asc
278A A425 30A9 05B5 2F13

  Try to be the best of what you are, even if what you are is no good.
  ASHLEIGH BRILLIANT
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Mathieu Chouquet-Stringer @ 2002-07-15 18:47 UTC (permalink / raw)
To: Sam Vilain; +Cc: linux-kernel

sam@vilain.net (Sam Vilain) writes:
> - there is a `dump' command (but it's useless, because it hangs when you
>   run it on mounted filesystems - come on, who REALLY unmounts their
>   filesystems for a nightly dump?  You need a 3-way mirror to do it
>   while guaranteeing filesystem availability...)

According to everybody, dump is deprecated (and it shouldn't work
reliably with 2.4; in two words: "forget it")...

--
Mathieu Chouquet-Stringer              E-Mail: mathieu@newview.com
  It is exactly because a man cannot do a thing that he is a proper
  judge of it.  -- Oscar Wilde
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Sam Vilain @ 2002-07-15 19:26 UTC (permalink / raw)
To: Mathieu Chouquet-Stringer; +Cc: linux-kernel

Mathieu Chouquet-Stringer <mathieu@newview.com> wrote:
> > - there is a `dump' command (but it's useless, because it hangs when you
> >   run it on mounted filesystems - come on, who REALLY unmounts their
> >   filesystems for a nightly dump?  You need a 3-way mirror to do it
> >   while guaranteeing filesystem availability...)
> According to everybody, dump is deprecated (and it shouldn't work
> reliably with 2.4; in two words: "forget it")...

It's a shame, because `tar' doesn't save things like inode attributes
and places unnecessary load on the VFS layer.  It also takes
considerably longer than dump did on one backup server I admin - like
~12 hours to back up ~26G in ~414k inodes to a tape capable of about
1MB/sec.  But that's probably the old directory hashing thing again;
there are some reeeeaaallllllly large directories on that machine...

Ah, the joys of legacy.

--
Sam Vilain, sam@vilain.net     WWW: http://sam.vilain.net/
7D74 2A09 B2D3 C30F F78E       GPG: http://sam.vilain.net/sam.asc
278A A425 30A9 05B5 2F13

  If you think the United States has stood still, who built the largest
  shopping center in the world?
  RICHARD M NIXON
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks
From: Stelian Pop @ 2002-07-16 8:18 UTC (permalink / raw)
To: Mathieu Chouquet-Stringer; +Cc: linux-kernel

On Mon, Jul 15, 2002 at 02:47:04PM -0400, Mathieu Chouquet-Stringer wrote:
> According to everybody, dump is deprecated (and it shouldn't work
> reliably with 2.4; in two words: "forget it")...

This needs to be "according to Linus, dump is deprecated".  Given the
interest Linus has manifested in backups, I wouldn't really rely on his
statement :-)

Stelian.
--
Stelian Pop <stelian.pop@fr.alcove.com>
Alcove - http://www.alcove.com
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 8:18 ` Stelian Pop @ 2002-07-16 12:22 ` Gerhard Mack 2002-07-16 12:49 ` Stelian Pop 0 siblings, 1 reply; 90+ messages in thread From: Gerhard Mack @ 2002-07-16 12:22 UTC (permalink / raw) To: Stelian Pop; +Cc: Mathieu Chouquet-Stringer, linux-kernel On Tue, 16 Jul 2002, Stelian Pop wrote: > On Mon, Jul 15, 2002 at 02:47:04PM -0400, Mathieu Chouquet-Stringer wrote: > > > According to everybody, dump is deprecated (and it shouldn't work reliably > > with 2.4, in two words: "forget it")... > > This needs to be "according to Linus, dump is deprecated". Given the > interest Linus has manifested for backups, I wouldn't really rely on > his statement :-) Either way dump is not likely to give you a reliable backup when used with a 2.4.x kernel. Gerhard -- Gerhard Mack gmack@innerfire.net <>< As a computer I find your faith in technology amusing. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 12:22 ` Gerhard Mack @ 2002-07-16 12:49 ` Stelian Pop 2002-07-16 15:11 ` Gerhard Mack 0 siblings, 1 reply; 90+ messages in thread From: Stelian Pop @ 2002-07-16 12:49 UTC (permalink / raw) To: Gerhard Mack; +Cc: Mathieu Chouquet-Stringer, linux-kernel On Tue, Jul 16, 2002 at 08:22:53AM -0400, Gerhard Mack wrote: > > This needs to be "according to Linus, dump is deprecated". Given the > > interest Linus has manifested for backups, I wouldn't really rely on > > his statement :-) > > Either way dump is not likely to give you a reliable backup when used > with a 2.4.x kernel. Since you are so well informed, maybe you could share your knowledge with us. I'm the dump maintainer, so I'll be very interested in knowing how it is that dump works for me and many other users... :-) Stelian. -- Stelian Pop <stelian.pop@fr.alcove.com> Alcove - http://www.alcove.com ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 12:49 ` Stelian Pop @ 2002-07-16 15:11 ` Gerhard Mack 2002-07-16 15:22 ` Andrea Arcangeli 2002-07-16 15:39 ` Stelian Pop 0 siblings, 2 replies; 90+ messages in thread From: Gerhard Mack @ 2002-07-16 15:11 UTC (permalink / raw) To: Stelian Pop; +Cc: Mathieu Chouquet-Stringer, linux-kernel On Tue, 16 Jul 2002, Stelian Pop wrote: > Date: Tue, 16 Jul 2002 14:49:56 +0200 > From: Stelian Pop <stelian.pop@fr.alcove.com> > To: Gerhard Mack <gmack@innerfire.net> > Cc: Mathieu Chouquet-Stringer <mathieu@newview.com>, > linux-kernel@vger.kernel.org > Subject: Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks > > On Tue, Jul 16, 2002 at 08:22:53AM -0400, Gerhard Mack wrote: > > > > This needs to be "according to Linus, dump is deprecated". Given the > > > interest Linus has manifested for backups, I wouldn't really rely on > > > his statement :-) > > > > Either way dump is not likely to give you a reliable backup when used > > with a 2.4.x kernel. > > Since you are so well informed, maybe you could share your knowledge > with us. > > I'm the dump maintainer, so I'll be very interested in knowing how > comes that dump works for me and many other users... :-) > I'll save myself the trouble, since Linus said it better than I could: Note that dump simply won't work reliably at all even in 2.4.x: the buffer cache and the page cache (where all the actual data is) are not coherent. This is only going to get even worse in 2.5.x, when the directories are moved into the page cache as well. So anybody who depends on "dump" getting backups right is already playing russian rulette with their backups. It's not at all guaranteed to get the right results - you may end up having stale data in the buffer cache that ends up being "backed up". In other words you have a backup system that works some of the time or even most of the time... brilliant! Gerhard -- Gerhard Mack gmack@innerfire.net <>< As a computer I find your faith in technology amusing.
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 15:11 ` Gerhard Mack @ 2002-07-16 15:22 ` Andrea Arcangeli 2002-07-16 15:39 ` Stelian Pop 1 sibling, 0 replies; 90+ messages in thread From: Andrea Arcangeli @ 2002-07-16 15:22 UTC (permalink / raw) To: Gerhard Mack; +Cc: Stelian Pop, Mathieu Chouquet-Stringer, linux-kernel On Tue, Jul 16, 2002 at 11:11:20AM -0400, Gerhard Mack wrote: > On Tue, 16 Jul 2002, Stelian Pop wrote: > > > Date: Tue, 16 Jul 2002 14:49:56 +0200 > > From: Stelian Pop <stelian.pop@fr.alcove.com> > > To: Gerhard Mack <gmack@innerfire.net> > > Cc: Mathieu Chouquet-Stringer <mathieu@newview.com>, > > linux-kernel@vger.kernel.org > > Subject: Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks > > > > On Tue, Jul 16, 2002 at 08:22:53AM -0400, Gerhard Mack wrote: > > > > > > This needs to be "according to Linus, dump is deprecated". Given the > > > > interest Linus has manifested for backups, I wouldn't really rely on > > > > his statement :-) > > > > > > Either way dump is not likely to give you a reliable backup when used > > > with a 2.4.x kernel. > > > > Since you are so well informed, maybe you could share your knowledge > > with us. > > > > I'm the dump maintainer, so I'll be very interested in knowing how > > comes that dump works for me and many other users... :-) > > > > I'll save myself the trouble when Linus said it better than I could: > > Note that dump simply won't work reliably at all even in > 2.4.x: the buffer cache and the page cache (where all the > actual data is) are not coherent. This is only going to > get even worse in 2.5.x, when the directories are moved > into the page cache as well. > > So anybody who depends on "dump" getting backups right is > already playing russian rulette with their backups. It's > not at all guaranteed to get the right results - you may > end up having stale data in the buffer cache that ends up > being "backed up". 
> In other words you have a backup system that works some of the time or > even most of the time... brilliant! just to clarify, the above implicitly assumes the fs is mounted read-write while you're dumping it. if the fs is mounted readonly or if it's unmounted, there is no problem with dumping it. Also note that dump has the same problem with read-write mounted fs also in 2.2, and I guess in 2.0 too, it's nothing new in 2.4, it just becomes more visible the more dirty logical caches we have. Andrea ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 15:11 ` Gerhard Mack 2002-07-16 15:22 ` Andrea Arcangeli @ 2002-07-16 15:39 ` Stelian Pop 2002-07-16 19:45 ` Matthias Andree 1 sibling, 1 reply; 90+ messages in thread From: Stelian Pop @ 2002-07-16 15:39 UTC (permalink / raw) To: Gerhard Mack; +Cc: Mathieu Chouquet-Stringer, linux-kernel On Tue, Jul 16, 2002 at 11:11:20AM -0400, Gerhard Mack wrote: > In other words you have a backup system that works some of the time or > even most of the time... brilliant! Dump is a backup system that works 100% of the time when used as it was designed to: on unmounted filesystems (or mounted R/O). It is indeed brilliant to have it work, even most of the time, in conditions it wasn't designed for. Stelian. -- Stelian Pop <stelian.pop@fr.alcove.com> Alcove - http://www.alcove.com ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 15:39 ` Stelian Pop @ 2002-07-16 19:45 ` Matthias Andree 2002-07-16 20:04 ` Shawn 0 siblings, 1 reply; 90+ messages in thread From: Matthias Andree @ 2002-07-16 19:45 UTC (permalink / raw) To: linux-kernel; +Cc: Stelian Pop, Gerhard Mack, Mathieu Chouquet-Stringer On Tue, 16 Jul 2002, Stelian Pop wrote: > On Tue, Jul 16, 2002 at 11:11:20AM -0400, Gerhard Mack wrote: > > > In other words you have a backup system that works some of the time or > > even most of the time... brilliant! > > Dump is a backup system that works 100% of the time when used as > it was designed to: on unmounted filesystems (or mounted R/O). Practical question: how do I get a file system mounted R/O for backup with dump without putting that system into single-user mode? Particularly when running automated backups, this is an issue. I cannot kill all writers (syslog, Postfix, INN, CVS server, ...) on my production machines just for the sake of taking a backup. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 19:45 ` Matthias Andree @ 2002-07-16 20:04 ` Shawn 2002-07-16 20:11 ` Mathieu Chouquet-Stringer 0 siblings, 1 reply; 90+ messages in thread From: Shawn @ 2002-07-16 20:04 UTC (permalink / raw) To: linux-kernel, Stelian Pop, Gerhard Mack, Mathieu Chouquet-Stringer You don't. This is where you have a filesystem where syslog, xinetd, blogd, bloatd-config-d2, raffle-ticketd DO NOT LIVE. People forget so easily the wonders of multiple partitions. On 07/16, Matthias Andree said something like: > On Tue, 16 Jul 2002, Stelian Pop wrote: > > > On Tue, Jul 16, 2002 at 11:11:20AM -0400, Gerhard Mack wrote: > > > > > In other words you have a backup system that works some of the time or > > > even most of the time... brilliant! > > > > Dump is a backup system that works 100% of the time when used as > > it was designed to: on unmounted filesystems (or mounted R/O). > > Practical question: how do I get a file system mounted R/O for backup > with dump without putting that system into single-user mode? > Particularly when running automated backups, this is an issue. I cannot > kill all writers (syslog, Postfix, INN, CVS server, ...) on my > production machines just for the sake of taking a backup. -- Shawn Leas core@enodev.com So, do you live around here often? -- Stephen Wright ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 20:04 ` Shawn @ 2002-07-16 20:11 ` Mathieu Chouquet-Stringer 2002-07-16 20:22 ` Shawn 0 siblings, 1 reply; 90+ messages in thread From: Mathieu Chouquet-Stringer @ 2002-07-16 20:11 UTC (permalink / raw) To: Shawn; +Cc: linux-kernel, Stelian Pop, Gerhard Mack On Tue, Jul 16, 2002 at 03:04:22PM -0500, Shawn wrote: > You don't. > > This is where you have a filesystem where syslog, xinetd, blogd, > bloatd-config-d2, raffle-ticketd DO NOT LIVE. > > People forget so easily the wonders of multiple partitions. I'm sorry, but I don't understand how it's going to change anything. For sure, it makes your life easier because you don't have to shutdown all your programs that have files opened in R/W mode. But in the end, you will have to shutdown something to remount the partition in R/O mode and usually you don't want or can't afford to do that. -- Mathieu Chouquet-Stringer E-Mail : mathieu@newview.com It is exactly because a man cannot do a thing that he is a proper judge of it. -- Oscar Wilde ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 20:11 ` Mathieu Chouquet-Stringer @ 2002-07-16 20:22 ` Shawn 2002-07-16 20:27 ` Mathieu Chouquet-Stringer 2002-07-17 11:45 ` Matthias Andree 0 siblings, 2 replies; 90+ messages in thread From: Shawn @ 2002-07-16 20:22 UTC (permalink / raw) To: Mathieu Chouquet-Stringer, Shawn, linux-kernel, Stelian Pop, Gerhard Mack In this case, can you use a RAID mirror or something, then break it? Also, there's the LVM snapshot at the block layer someone already mentioned, which when used with smaller partitions is less overhead. (less FS delta) This problem isn't that complex. On 07/16, Mathieu Chouquet-Stringer said something like: > On Tue, Jul 16, 2002 at 03:04:22PM -0500, Shawn wrote: > > You don't. > > > > This is where you have a filesystem where syslog, xinetd, blogd, > > bloatd-config-d2, raffle-ticketd DO NOT LIVE. > > > > People forget so easily the wonders of multiple partitions. > > I'm sorry, but I don't understand how it's going to change anything. For > sure, it makes your life easier because you don't have to shutdown all your > programs that have files opened in R/W mode. But in the end, you will have > to shutdown something to remount the partition in R/O mode and usually you > don't want or can't afford to do that. > > -- > Mathieu Chouquet-Stringer E-Mail : mathieu@newview.com > It is exactly because a man cannot do a thing that he is a > proper judge of it. > -- Oscar Wilde -- Shawn Leas core@enodev.com I bought my brother some gift-wrap for Christmas. I took it to the Gift Wrap department and told them to wrap it, but in a different print so he would know when to stop unwrapping. -- Stephen Wright ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 20:22 ` Shawn @ 2002-07-16 20:27 ` Mathieu Chouquet-Stringer 2002-07-17 11:45 ` Matthias Andree 1 sibling, 0 replies; 90+ messages in thread From: Mathieu Chouquet-Stringer @ 2002-07-16 20:27 UTC (permalink / raw) To: Shawn; +Cc: linux-kernel, Stelian Pop, Gerhard Mack On Tue, Jul 16, 2002 at 03:22:31PM -0500, Shawn wrote: > In this case, can you use a RAID mirror or something, then break it? > > Also, there's the LVM snapshot at the block layer someone already > mentioned, which when used with smaller partions is less overhead. > (less FS delta) > > This problem isn't that complex. I agree but I guess that if Matthias asked the question that way, he probably meant he doesn't have a raid mirror or "something" (as you say)... If you didn't plan your install (meaning you don't have the nice raid or anything else), you're basically screwed... -- Mathieu Chouquet-Stringer E-Mail : mathieu@newview.com It is exactly because a man cannot do a thing that he is a proper judge of it. -- Oscar Wilde ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 20:22 ` Shawn 2002-07-16 20:27 ` Mathieu Chouquet-Stringer @ 2002-07-17 11:45 ` Matthias Andree 2002-07-17 19:02 ` Andreas Dilger 1 sibling, 1 reply; 90+ messages in thread From: Matthias Andree @ 2002-07-17 11:45 UTC (permalink / raw) To: linux-kernel On Tue, 16 Jul 2002, Shawn wrote: > In this case, can you use a RAID mirror or something, then break it? > > Also, there's the LVM snapshot at the block layer someone already > mentioned, which when used with smaller partions is less overhead. > (less FS delta) All these "solutions" don't work out, I cannot remount R/O my partition, and LVM low-level snapshots or breaking a RAID mirror simply won't work out. I would have to remount r/o the partition to get a consistent image in the first place, so the first step must fail already... ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-17 11:45 ` Matthias Andree @ 2002-07-17 19:02 ` Andreas Dilger 2002-07-18 9:29 ` Matthias Andree 2002-07-19 8:29 ` Matthias Andree 0 siblings, 2 replies; 90+ messages in thread From: Andreas Dilger @ 2002-07-17 19:02 UTC (permalink / raw) To: linux-kernel On Jul 17, 2002 13:45 +0200, Matthias Andree wrote: > On Tue, 16 Jul 2002, Shawn wrote: > > In this case, can you use a RAID mirror or something, then break it? > > > > Also, there's the LVM snapshot at the block layer someone already > > mentioned, which when used with smaller partions is less overhead. > > (less FS delta) > > All these "solutions" don't work out, I cannot remount R/O my partition, > and LVM low-level snapshots or breaking a RAID mirror simply won't work > out. I would have to remount r/o the partition to get a consistent image > in the first place, so the first step must fail already... Have you been reading my emails at all? LVM snapshots DO ensure that the snapshot filesystem is consistent for journaled filesystems. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-17 19:02 ` Andreas Dilger @ 2002-07-18 9:29 ` Matthias Andree 0 siblings, 0 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-18 9:29 UTC (permalink / raw) To: linux-kernel On Wed, 17 Jul 2002, Andreas Dilger wrote: > On Jul 17, 2002 13:45 +0200, Matthias Andree wrote: > > On Tue, 16 Jul 2002, Shawn wrote: > > > In this case, can you use a RAID mirror or something, then break it? > > > > > > Also, there's the LVM snapshot at the block layer someone already > > > mentioned, which when used with smaller partions is less overhead. > > > (less FS delta) > > > > All these "solutions" don't work out, I cannot remount R/O my partition, > > and LVM low-level snapshots or breaking a RAID mirror simply won't work > > out. I would have to remount r/o the partition to get a consistent image > > in the first place, so the first step must fail already... > > Have you been reading my emails at all? LVM snapshots DO ensure that > the snapshot filesystem is consistent for journaled filesystems. My apologies: I have been busy and only reading partial threads, and had not come across your LVM-snapshot related mails when I wrote the previous mail. -- Matthias Andree ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-17 19:02 ` Andreas Dilger 2002-07-18 9:29 ` Matthias Andree @ 2002-07-19 8:29 ` Matthias Andree 2002-07-19 16:39 ` Andreas Dilger 1 sibling, 1 reply; 90+ messages in thread From: Matthias Andree @ 2002-07-19 8:29 UTC (permalink / raw) To: linux-kernel On Wed, 17 Jul 2002, Andreas Dilger wrote: > On Jul 17, 2002 13:45 +0200, Matthias Andree wrote: > > On Tue, 16 Jul 2002, Shawn wrote: > > > In this case, can you use a RAID mirror or something, then break it? > > > > > > Also, there's the LVM snapshot at the block layer someone already > > > mentioned, which when used with smaller partions is less overhead. > > > (less FS delta) > > > > All these "solutions" don't work out, I cannot remount R/O my partition, > > and LVM low-level snapshots or breaking a RAID mirror simply won't work > > out. I would have to remount r/o the partition to get a consistent image > > in the first place, so the first step must fail already... > > Have you been reading my emails at all? LVM snapshots DO ensure that > the snapshot filesystem is consistent for journaled filesystems. What kernel version is necessary to achieve this on production kernels (i. e. 2.4)? Does "consistent" mean "fsck proof"? Here's what I tried, on Linux-2.4.19-pre10-ac3 (IIRC) (ext3fs): (from memory, history not available, different machine): lvcreate --snapshot snap /dev/vg0/home e2fsck -f /dev/vg0/snap dump -0 ... It reported zero dtime for one file and two bitmap differences. Does "consistent" mean "consistent after you replay the log"? If so, that's still a losing game, because I cannot fsck the snapshot (it's R/O in the LVM case at least) to replay the journal -- and I don't assume dump 0.4b29 (which I'm using) goes fishing in the journal, though I have not checked the dump source code. dump did not complain however, and given what e2fsck had to complain about, I'd happily force-mount such a file system when just a deletion has not completed.
-- Matthias Andree ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-19 8:29 ` Matthias Andree @ 2002-07-19 16:39 ` Andreas Dilger 2002-07-19 20:01 ` Shawn 0 siblings, 1 reply; 90+ messages in thread From: Andreas Dilger @ 2002-07-19 16:39 UTC (permalink / raw) To: linux-kernel On Jul 19, 2002 10:29 +0200, Matthias Andree wrote: > What kernel version is necessary to achieve this on production kernels > (i. e. 2.4)? > > Does "consistent" mean "fsck proof"? > > Here's what I tried, on Linux-2.4.19-pre10-ac3 (IIRC) (ext3fs): > > (from memory, history not available, different machine): > lvcreate --snapshot snap /dev/vg0/home > e2fsck -f /dev/vg0/snap > dump -0 ... > > It reported zero dtime for one file and two bitmap differences. That is because one critical piece is missing from 2.4, the VFS lock patch. It is part of the LVM sources at sistina.com. Chris Mason has been trying to get it in, but it is delayed until 2.4.19 is out. > dump did not complain however, and given what e2fsck had to complain, > I'd happily force mount such a file system when just a deletion has not > completed. You cannot mount a dirty ext3 filesystem from read-only media. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 90+ messages in thread
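For reference, the snapshot-backup sequence being debated here can be sketched in a few commands. The volume group, snapshot size, and tape device names below are hypothetical, the whole sequence needs root and the LVM tools, and on 2.4 it also needs the Sistina VFS-lock patch Andreas mentions for the snapshot to be consistent:

```shell
# Hypothetical names throughout: vg0/home, a 512M snapshot, /dev/nst0 tape.
# Create a read-only snapshot of the live logical volume:
lvcreate --snapshot --size 512M --name snap /dev/vg0/home

# Optional sanity check; -n keeps e2fsck read-only so it works on the
# read-only snapshot device:
e2fsck -fn /dev/vg0/snap

# Dump the quiescent snapshot instead of the live filesystem:
dump -0 -f /dev/nst0 /dev/vg0/snap

# Drop the snapshot once the backup is done:
lvremove -f /dev/vg0/snap
```

This is only a sketch of the workflow under discussion, not something the thread participants posted verbatim; a real backup script would also check exit codes and size the snapshot to survive the write load during the dump.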
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-19 16:39 ` Andreas Dilger @ 2002-07-19 20:01 ` Shawn 2002-07-19 20:47 ` Andreas Dilger 0 siblings, 1 reply; 90+ messages in thread From: Shawn @ 2002-07-19 20:01 UTC (permalink / raw) To: linux-kernel On 07/19, Andreas Dilger said something like: > On Jul 19, 2002 10:29 +0200, Matthias Andree wrote: > > What kernel version is necessary to achieve this on production kernels > > (i. e. 2.4)? > > > > Does "consistent" mean "fsck proof"? > > > > Here's what I tried, on Linux-2.4.19-pre10-ac3 (IIRC) (ext3fs): > > > > (from memory, history not available, different machine): > > lvcreate --snapshot snap /dev/vg0/home > > e2fsck -f /dev/vg0/snap > > dump -0 ... > > > > It reported zero dtime for one file and two bitmap differences. > > That is because one critical piece is missing from 2.4, the VFS lock > patch. It is part of the LVM sources at sistina.com. Chris Mason has > been trying to get it in, but it is delayed until 2.4.19 is out. > > > dump did not complain however, and given what e2fsck had to complain, > > I'd happily force mount such a file system when just a deletion has not > > completed. > > You cannot mount a dirty ext3 filesystem from read-only media. I thought you could "mount -t ext2" ext3 volumes, and thought you could force mount ext2. I'm no Andreas Dilger, so don't take this like I'm disagreeing... -- Shawn Leas core@enodev.com I went to the bank and asked to borrow a cup of money. They said, "What for?" I said, "I'm going to buy some sugar." -- Stephen Wright ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-19 20:01 ` Shawn @ 2002-07-19 20:47 ` Andreas Dilger 0 siblings, 0 replies; 90+ messages in thread From: Andreas Dilger @ 2002-07-19 20:47 UTC (permalink / raw) To: Shawn; +Cc: linux-kernel On Jul 19, 2002 15:01 -0500, Shawn wrote: > On 07/19, Andreas Dilger said something like: > > You cannot mount a dirty ext3 filesystem from read-only media. > > I thought you could "mount -t ext2" ext3 volumes, and thought you could > force mount ext2. This is true if the ext3 filesystem is unmounted cleanly. Otherwise there is a flag in the superblock which tells the kernel it can't mount the filesystem because there is something there it doesn't understand (namely the dirty journal with all of the recent changes). This flag (EXT3_FEATURE_INCOMPAT_RECOVERY) is cleared when the filesystem is unmounted properly, when e2fsck or a r/w mount recovers the journal, and not coincidentally when an LVM snapshot is created. In case you are more curious, there are a couple of paragraphs in linux/Documentation/filesystems/ext2.txt about the compat flags, which are really one of the great features of ext2. You may think that an overstatement, but without the feature flags, none of the other enhancements that have been added to ext2 over the last few years (and in the next few years too) would have been so easily done. As for mounting a dirty ext2 filesystem, yes that is possible with only a warning at mount time. That is why nobody has put much effort into adding the snapshot hooks into ext2 yet. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 17:48 ` Sam Vilain 2002-07-15 18:47 ` Mathieu Chouquet-Stringer @ 2002-07-15 21:14 ` Andreas Dilger 2002-07-17 18:41 ` bill davidsen 2002-07-16 8:15 ` [ANNOUNCE] Ext3 vs Reiserfs benchmarks Stelian Pop 2 siblings, 1 reply; 90+ messages in thread From: Andreas Dilger @ 2002-07-15 21:14 UTC (permalink / raw) To: Sam Vilain; +Cc: dax, linux-kernel On Jul 15, 2002 18:48 +0100, Sam Vilain wrote: > Andreas Dilger <adilger@clusterfs.com> wrote: > > > Amusingly, there IS directory hashing available for ext2 and ext3, and > > it is just as fast as reiserfs hashed directories. See: > > http://people.nl.linux.org/~phillips/htree/paper/htree.html > > You learn something new every day. So, with that in mind - what has > reiserfs got that ext2 doesn't? > > - tail merging, giving much more efficient space usage for lots of small > files. Well, there was a tail merging patch for ext2, but it has been dropped for now. In reality, any benchmarks with reiserfs (except the very-small-files case) will run with tail packing disabled because it kills performance. > - B*Tree allocation offering ``a 1/3rd reduction in internal > fragmentation in return for slightly more complicated insertions and > deletion algorithms'' (from the htree paper). > - online resizing in the main kernel (ext2 needs a patch - > http://ext2resize.sourceforge.net/). Yes, I wrote it... > - Resizing does not require the use of `ext2prepare' run on the > filesystem while unmounted to resize over arbitrary boundaries. That is coming this summer. It will be part of some changes to support "meta blockgroups", and the resizing comes for free at the same time. > - directory hashing in the main kernel Probably will happen in 2.5, as Andrew is already testing htree support for ext3. It is also in the ext3 CVS tree for 2.4, so I wouldn't be surprised if it shows up in 2.4 also. 
> On the flipside, ext2 over reiserfs: > > - support for attributes without a patch or 2.4.19-pre4+ kernel > - support for filesystem quotas without a patch > - there is a `dump' command (but it's useless, because it hangs when you > run it on mounted filesystems - come on, who REALLY unmounts their > filesystems for a nightly dump? You need a 3 way mirror to do it > while guaranteeing filesystem availability...) Well, the dump can only be inconsistent for files that are being changed during the dump itself. As for hanging the system, that would be a bug regardless of whether it was dump or "dd" reading from the block device. A bug related to this was fixed, probably in 2.4.19-preX somewhere. > I'd be very interested in seeing postmark results without the > hierarchical directory structure (which an unpatched postfix doesn't > support), with about 5000 mailboxes with and without the htree patch > (or with the htree patch but without that directory indexed, if that > is possible). Let me know what you find. It is possible to use an htree-patched kernel and not have indexed directories - just don't mount with "-o index". Note that there is a data-corrupting bug somewhere in the ext3 htree code, so I wouldn't suggest using indexed directories except for test. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 21:14 ` Andreas Dilger @ 2002-07-17 18:41 ` bill davidsen 2002-07-17 19:47 ` [ANNOUNCE] Ext3 vs Reiserfs benchmarks (whither dump?) Lew Wolfgang 0 siblings, 1 reply; 90+ messages in thread From: bill davidsen @ 2002-07-17 18:41 UTC (permalink / raw) To: linux-kernel In article <20020715211448.GI442@clusterfs.com>, Andreas Dilger <adilger@clusterfs.com> wrote: | Well, the dump can only be inconsistent for files that are being changed | during the dump itself. As for hanging the system, that would be a bug | regardless of whether it was dump or "dd" reading from the block device. | A bug related to this was fixed, probably in 2.4.19-preX somewhere. Any dump on a live f/s would seem to have the problem that files are changing as they are read and may not be consistent. I suppose there could be some kind of "fsync and journal lock" on a file, allowing all writes to a file to be journaled while the file is backed up. However, such things don't scale well for big files with lots of writes, and the file, while unchanging, may not be valid. Backups of running files are best done by the application, like Oracle as a for-instance. Neither the o/s nor the backup can be sure when/if the data is in a valid state. Tar has this problem, although not the same issues with data on the fly in buffers. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks (whither dump?) 2002-07-17 18:41 ` bill davidsen @ 2002-07-17 19:47 ` Lew Wolfgang 0 siblings, 0 replies; 90+ messages in thread From: Lew Wolfgang @ 2002-07-17 19:47 UTC (permalink / raw) To: linux-kernel Hi Folks, As an old dump user (dumpster?) I have to admit that we've avoided ext3 and Reiserfs because of this issue. We couldn't live without the "Tower of Hanoi". I remember using, many years ago (SunOS 3.4), a patched dump binary that allowed safe dumps from live UFS filesystems. I don't remember all the details (it was 16 years ago) but this dump would somehow compare files before and after writing to tape. If there was a difference it would back out the dumped file and preserve the consistency of the tape. I don't remember if it would go back and try the file again. I haven't the foggiest notion if this would work in these modern times, I'm just offering it as food for thought. Regards, Lew Wolfgang ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 17:48 ` Sam Vilain 2002-07-15 18:47 ` Mathieu Chouquet-Stringer 2002-07-15 21:14 ` Andreas Dilger @ 2002-07-16 8:15 ` Stelian Pop 2002-07-16 12:27 ` Matthias Andree 2 siblings, 1 reply; 90+ messages in thread From: Stelian Pop @ 2002-07-16 8:15 UTC (permalink / raw) To: Sam Vilain; +Cc: dax, linux-kernel On Mon, Jul 15, 2002 at 06:48:05PM +0100, Sam Vilain wrote: > On the flipside, ext2 over reiserfs: [...] > - there is a `dump' command (but it's useless, because it hangs when you > run it on mounted filesystems - come on, who REALLY unmounts their > filesystems for a nightly dump? You need a 3 way mirror to do it > while guaranteeing filesystem availability...) dump(8) doesn't hang when dumping mounted filesystems. You are referring to a genuine bug which was fixed some time ago. However, on some rare occasions, dump can save corrupted data when saving a mounted and generally highly active filesystem. Even then, in 99% of the cases it doesn't really matter because the corrupted files will get saved by the next incremental dump. Come on, who REALLY expects to have consistent backups without either unmounting the filesystem or using some snapshot techniques ? Stelian. -- Stelian Pop <stelian.pop@fr.alcove.com> Alcove - http://www.alcove.com ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 8:15 ` [ANNOUNCE] Ext3 vs Reiserfs benchmarks Stelian Pop @ 2002-07-16 12:27 ` Matthias Andree 2002-07-16 12:43 ` Stelian Pop 0 siblings, 1 reply; 90+ messages in thread From: Matthias Andree @ 2002-07-16 12:27 UTC (permalink / raw) To: linux-kernel; +Cc: Stelian Pop, Sam Vilain, dax On Tue, 16 Jul 2002, Stelian Pop wrote: > Come on, who REALLY expects to have consistent backups without > either unmounting the filesystem or using some snapshot techniques ? Those who use [s|g]tar, cpio, afio, dsmc (Tivoli distributed storage manager), ... Low-level snapshots don't do any good, they just freeze the "halfway there" on-disk structure. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 12:27 ` Matthias Andree @ 2002-07-16 12:43 ` Stelian Pop 2002-07-16 12:53 ` Matthias Andree 0 siblings, 1 reply; 90+ messages in thread From: Stelian Pop @ 2002-07-16 12:43 UTC (permalink / raw) To: linux-kernel, Sam Vilain, dax On Tue, Jul 16, 2002 at 02:27:56PM +0200, Matthias Andree wrote: > > Come on, who REALLY expects to have consistent backups without > > either unmounting the filesystem or using some snapshot techniques? > > Those who use [s|g]tar, cpio, afio, dsmc (Tivoli distributed storage > manager), ... > > Low-level snapshots don't do any good, they just freeze the "halfway > there" on-disk structure. But [s|g]tar, cpio, afio (don't know about dsmc) also freeze the "halfway there" data, but at the file level instead (application instead of filesystem)... Stelian. -- Stelian Pop <stelian.pop@fr.alcove.com> Alcove - http://www.alcove.com ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 12:43 ` Stelian Pop @ 2002-07-16 12:53 ` Matthias Andree 2002-07-16 13:05 ` Christoph Hellwig 2002-07-17 18:51 ` [ANNOUNCE] Ext3 vs Reiserfs benchmarks bill davidsen 0 siblings, 2 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-16 12:53 UTC (permalink / raw) To: linux-kernel; +Cc: Stelian Pop, Sam Vilain, dax On Tue, 16 Jul 2002, Stelian Pop wrote: > > Low-level snapshots don't do any good, they just freeze the "halfway > > there" on-disk structure. > > But [s|g]tar, cpio, afio (don't know about dsmc) also freeze the > "halfway there" data, but at the file level instead (application > instead of filesystem)... Not if some day somebody implements file system level snapshots for Linux. Until then, better have garbled file contents constrained to a file than random data as on-disk layout changes with hefty directory updates. dsmc fstat()s the file it is currently reading regularly and retries the dump as the file changes, and gives up if it is updated too often. Not sure about the server side, and certainly not a useful option for sequential devices that you directly write on. Looks like a cache for the biggest file is necessary. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 12:53 ` Matthias Andree @ 2002-07-16 13:05 ` Christoph Hellwig 2002-07-16 19:38 ` Matthias Andree 2002-07-17 18:51 ` [ANNOUNCE] Ext3 vs Reiserfs benchmarks bill davidsen 1 sibling, 1 reply; 90+ messages in thread From: Christoph Hellwig @ 2002-07-16 13:05 UTC (permalink / raw) To: linux-kernel, Stelian Pop, Sam Vilain, dax On Tue, Jul 16, 2002 at 02:53:01PM +0200, Matthias Andree wrote: > Not if some day somebody implements file system level snapshots for > Linux. Until then, better have garbled file contents constrained to a > file than random data as on-disk layout changes with hefty directory > updates. or the blockdevice-level snapshots already implemented in Linux.. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 13:05 ` Christoph Hellwig @ 2002-07-16 19:38 ` Matthias Andree 2002-07-16 19:49 ` Andreas Dilger 2002-07-16 20:11 ` Thunder from the hill 0 siblings, 2 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-16 19:38 UTC (permalink / raw) To: linux-kernel; +Cc: Christoph Hellwig, Stelian Pop, Sam Vilain, dax On Tue, 16 Jul 2002, Christoph Hellwig wrote: > On Tue, Jul 16, 2002 at 02:53:01PM +0200, Matthias Andree wrote: > > Not if some day somebody implements file system level snapshots for > > Linux. Until then, better have garbled file contents constrained to a > > file than random data as on-disk layout changes with hefty directory > > updates. > > or the blockdevice-level snapshots already implemented in Linux.. That would require three atomic steps: 1. mount read-only, flushing all pending updates 2. take snapshot 3. mount read-write and then backup the snapshot. A snapshot of a live file system won't do, it can be as inconsistent as it desires -- whether your corrupt target is moving or not, dumping it is not of much use. -- Matthias Andree ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 19:38 ` Matthias Andree @ 2002-07-16 19:49 ` Andreas Dilger 2002-07-16 20:11 ` Thunder from the hill 1 sibling, 0 replies; 90+ messages in thread From: Andreas Dilger @ 2002-07-16 19:49 UTC (permalink / raw) To: linux-kernel, Christoph Hellwig, Stelian Pop, Sam Vilain, dax On Jul 16, 2002 21:38 +0200, Matthias Andree wrote: > On Tue, 16 Jul 2002, Christoph Hellwig wrote: > > On Tue, Jul 16, 2002 at 02:53:01PM +0200, Matthias Andree wrote: > > > Not if some day somebody implements file system level snapshots for > > > Linux. Until then, better have garbled file contents constrained to a > > > file than random data as on-disk layout changes with hefty directory > > > updates. > > > > or the blockdevice-level snapshots already implemented in Linux.. > > That would require three atomic steps: > > 1. mount read-only, flushing all pending updates > 2. take snapshot > 3. mount read-write > > and then backup the snapshot. A snapshots of a live file system won't > do, it can be as inconsistent as it desires -- if your corrupt target is > moving or not, dumping it is not of much use. Luckily, there is already an interface which does this - sync_supers_lockfs(), which the LVM code will use if it is patched in. Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 19:38 ` Matthias Andree 2002-07-16 19:49 ` Andreas Dilger @ 2002-07-16 20:11 ` Thunder from the hill 2002-07-16 21:06 ` Matthias Andree 1 sibling, 1 reply; 90+ messages in thread From: Thunder from the hill @ 2002-07-16 20:11 UTC (permalink / raw) To: Matthias Andree Cc: linux-kernel, Christoph Hellwig, Stelian Pop, Sam Vilain, dax Hi, On Tue, 16 Jul 2002, Matthias Andree wrote: > > or the blockdevice-level snapshots already implemented in Linux.. > > That would require three atomic steps: > > 1. mount read-only, flushing all pending updates > 2. take snapshot > 3. mount read-write > > and then backup the snapshot. A snapshots of a live file system won't > do, it can be as inconsistent as it desires -- if your corrupt target is > moving or not, dumping it is not of much use. Well, couldn't we just kindof lock the file system so that while backing up no writes get through to the real filesystem? This will possibly require a lot of memory (or another space to write to), but it might be done? Regards, Thunder -- (Use http://www.ebb.org/ungeek if you can't decode) ------BEGIN GEEK CODE BLOCK------ Version: 3.12 GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$ N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G e++++ h* r--- y- ------END GEEK CODE BLOCK------ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 20:11 ` Thunder from the hill @ 2002-07-16 21:06 ` Matthias Andree 2002-07-16 21:23 ` Andreas Dilger 2002-07-16 22:19 ` Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) stoffel 0 siblings, 2 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-16 21:06 UTC (permalink / raw) To: linux-kernel [-- Attachment #1: msg.pgp --] [-- Type: application/pgp, Size: 1226 bytes --] ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 21:06 ` Matthias Andree @ 2002-07-16 21:23 ` Andreas Dilger 2002-07-16 21:38 ` Thunder from the hill ` (2 more replies) 2002-07-16 22:19 ` Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) stoffel 1 sibling, 3 replies; 90+ messages in thread From: Andreas Dilger @ 2002-07-16 21:23 UTC (permalink / raw) To: linux-kernel On Jul 16, 2002 23:06 +0200, Matthias Andree wrote: > On Tue, 16 Jul 2002, Thunder from the hill wrote: > > On Tue, 16 Jul 2002, Matthias Andree wrote: > > > That would require three atomic steps: > > > > > > 1. mount read-only, flushing all pending updates > > > 2. take snapshot > > > 3. mount read-write > > > > > > and then backup the snapshot. A snapshots of a live file system won't > > > do, it can be as inconsistent as it desires -- if your corrupt target is > > > moving or not, dumping it is not of much use. > > > > Well, couldn't we just kindof lock the file system so that while backing > > up no writes get through to the real filesystem? This will possibly > > require a lot of memory (or another space to write to), but it might be > > done? > > But you would want to backup a consistent file system, so when entering > the freeze or snapshot mode, you must flush all pending data in such a > way that the snapshot is consistent (i. e. needs not fsck action > whatsoever). This is all done already for both LVM and EVMS snapshots. The filesystem (ext3, reiserfs, XFS, JFS) flushes the outstanding operations and is frozen, the snapshot is created, and the filesystem becomes active again. It takes a second or less. Then dump will guarantee 100% correct backups of the snapshot filesystem. You would have to do a backup on the snapshot to guarantee 100% correctness even with tar. Most people don't care, because they don't even do backups in the first place, until they have lost a lot of their data and they learn. 
Even without snapshots, while dump isn't guaranteed to be 100% correct for rapidly changing filesystems, I have been using it for years on both 2.2 and 2.4 without any problems on my home systems. I have even restored data from those same backups... Cheers, Andreas -- Andreas Dilger http://www-mddsp.enel.ucalgary.ca/People/adilger/ http://sourceforge.net/projects/ext2resize/ ^ permalink raw reply [flat|nested] 90+ messages in thread
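The freeze/snapshot/thaw cycle Andreas describes can be sketched as a toy in-memory model. Everything below is invented for illustration (it is not the LVM or EVMS code): writes arriving during the freeze window are held back, so the snapshot is a clean point-in-time image.

```python
import copy

class ToyFilesystem:
    """Toy model of a freezable filesystem.  Writes apply immediately
    unless the filesystem is frozen; while frozen they are queued and
    the on-"disk" state stays consistent, so a snapshot taken during
    the freeze window is a clean point-in-time image."""

    def __init__(self):
        self.blocks = {}      # block number -> contents ("on disk")
        self.pending = []     # writes held back during a freeze
        self.frozen = False

    def write(self, blockno, data):
        if self.frozen:
            self.pending.append((blockno, data))  # held, not on disk yet
        else:
            self.blocks[blockno] = data

    def freeze(self):
        # a real filesystem would flush its journal here
        self.frozen = True

    def snapshot(self):
        assert self.frozen, "snapshotting a live filesystem is unsafe"
        return copy.deepcopy(self.blocks)

    def thaw(self):
        self.frozen = False
        for blockno, data in self.pending:
            self.blocks[blockno] = data
        self.pending.clear()

fs = ToyFilesystem()
fs.write(0, b"mail spool data")
fs.freeze()                    # flush + freeze: takes "a second or less"
fs.write(1, b"held")           # a write arriving during the freeze window
snap = fs.snapshot()           # consistent image; does not see the held write
fs.thaw()                      # the held write now reaches the disk
fs.write(0, b"new data")       # later activity never affects the snapshot
```

A backup run against `snap` then sees only the frozen, consistent state, which is the property dump needs.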
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 21:23 ` Andreas Dilger @ 2002-07-16 21:38 ` Thunder from the hill 2002-07-17 11:47 ` Matthias Andree 2002-07-18 14:50 ` Bill Davidsen 2 siblings, 0 replies; 90+ messages in thread From: Thunder from the hill @ 2002-07-16 21:38 UTC (permalink / raw) To: Andreas Dilger; +Cc: linux-kernel Hi, On Tue, 16 Jul 2002, Andreas Dilger wrote: > This is all done already for both LVM and EVMS snapshots. The filesystem > (ext3, reiserfs, XFS, JFS) flushes the outstanding operations and is > frozen, the snapshot is created, and the filesystem becomes active again. > It takes a second or less. Anyway, we could do that in parallel if we did it like that: sync -> significant data is being written lock -> data writes stay cached, but aren't written snapshot unlock -> data is getting written now unmount the snapshot (clean it) write the modified snapshot to disk... Regards, Thunder -- (Use http://www.ebb.org/ungeek if you can't decode) ------BEGIN GEEK CODE BLOCK------ Version: 3.12 GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$ N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G e++++ h* r--- y- ------END GEEK CODE BLOCK------ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 21:23 ` Andreas Dilger 2002-07-16 21:38 ` Thunder from the hill @ 2002-07-17 11:47 ` Matthias Andree 2002-07-18 14:50 ` Bill Davidsen 2 siblings, 0 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-17 11:47 UTC (permalink / raw) To: linux-kernel On Tue, 16 Jul 2002, Andreas Dilger wrote: > This is all done already for both LVM and EVMS snapshots. The filesystem > (ext3, reiserfs, XFS, JFS) flushes the outstanding operations and is > frozen, the snapshot is created, and the filesystem becomes active again. > It takes a second or less. Then dump will guarantee 100% correct backups > of the snapshot filesystem. You would have to do a backup on the snapshot > to guarantee 100% correctness even with tar. Sure. On some machines, they will go with dsmc anyhow which reads the file and rereads if it changes under dsmc's hands. -- Matthias Andree ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 21:23 ` Andreas Dilger 2002-07-16 21:38 ` Thunder from the hill 2002-07-17 11:47 ` Matthias Andree @ 2002-07-18 14:50 ` Bill Davidsen 2002-07-18 15:09 ` Rik van Riel 2 siblings, 1 reply; 90+ messages in thread From: Bill Davidsen @ 2002-07-18 14:50 UTC (permalink / raw) To: Andreas Dilger; +Cc: linux-kernel On Tue, 16 Jul 2002, Andreas Dilger wrote: > This is all done already for both LVM and EVMS snapshots. The filesystem > (ext3, reiserfs, XFS, JFS) flushes the outstanding operations and is > frozen, the snapshot is created, and the filesystem becomes active again. > It takes a second or less. Then dump will guarantee 100% correct backups > of the snapshot filesystem. You would have to do a backup on the snapshot > to guarantee 100% correctness even with tar. I think I'm missing a part of this, the "a snapshot is created" sounds a lot like "here a miracle occurs." Where is this snapshot saved? And how do you take it in one sec regardless of f/s size? Is this one of those theoretical things which requires two mirrored copies of the f/s so you will still have RAID-1 after you break one? Or are changes journaled somewhere until the snapshot is transferred to external media? And how do you force applications to stop with their files in a valid state? -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-18 14:50 ` Bill Davidsen @ 2002-07-18 15:09 ` Rik van Riel 0 siblings, 0 replies; 90+ messages in thread From: Rik van Riel @ 2002-07-18 15:09 UTC (permalink / raw) To: Bill Davidsen; +Cc: Andreas Dilger, linux-kernel On Thu, 18 Jul 2002, Bill Davidsen wrote: > I think I'm missing a part of this, the "a snapshot is created" sounds a > lot like "here a miracle occurs." Where is this snapshot saved? And how > do you take it in one sec regardless of f/s size? LVM. Systems like LVM already provide a logical->physical block mapping on disk, so they might as well provide multiple mappings. If the live filesystem writes to a particular disk block, the snapshot will keep referencing the old blocks while the filesystem gets to work on its own data. Copy on Write snapshots for block devices... regards, Rik -- Bravely reimplemented by the knights who say "NIH". http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) 2002-07-16 21:06 ` Matthias Andree 2002-07-16 21:23 ` Andreas Dilger @ 2002-07-16 22:19 ` stoffel 2002-07-16 22:33 ` Thunder from the hill ` (2 more replies) 1 sibling, 3 replies; 90+ messages in thread From: stoffel @ 2002-07-16 22:19 UTC (permalink / raw) To: Matthias Andree; +Cc: linux-kernel It's really quite simple in theory to do proper backups. But you need to have application support to make it work in most cases. It would flow like this: 1. lock application(s), flush any outstanding transactions. 2. lock filesystems, flush any outstanding transactions. 3a. lock mirrored volume, flush any outstanding transactions, break mirror. --or-- 3b. snapshot filesystem to another volume. 4. unlock volume 5. unlock filesystem 6. unlock application(s). 7. do backup against quiescent volume/filesystem. In reality, people didn't lock filesystems (remount R/O) unless they had to (ClearCase, Oracle, any DBMS, etc are the exceptions), since the time hit was too much. The chances of getting a bad backup on user home directories or mail spools wasn't worth the extra cost to be sure to get a clean backup. For the exceptions, that's why god made backup windows and such. These days, those windows are minuscule, so the seven steps outlined above are what needs to happen these days for a truly reliable backup of important data. John John Stoffel - Senior Unix Systems Administrator - Lucent Technologies stoffel@lucent.com - http://www.lucent.com - 978-399-0479 ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) 2002-07-16 22:19 ` Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) stoffel @ 2002-07-16 22:33 ` Thunder from the hill 2002-07-18 15:04 ` Bill Davidsen 2002-07-19 15:28 ` Sam Vilain 2 siblings, 0 replies; 90+ messages in thread From: Thunder from the hill @ 2002-07-16 22:33 UTC (permalink / raw) To: stoffel; +Cc: Matthias Andree, linux-kernel Hi, I do it like this: -> Reconfigure port switch to use B server -> Backup A server -> Replay B server journals on A server -> Switch to A server -> Backup B server -> Replay A server journals on B server -> Reconfigure port switch to dynamic mode Regards, Thunder -- (Use http://www.ebb.org/ungeek if you can't decode) ------BEGIN GEEK CODE BLOCK------ Version: 3.12 GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$ N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G e++++ h* r--- y- ------END GEEK CODE BLOCK------ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) 2002-07-16 22:19 ` Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) stoffel 2002-07-16 22:33 ` Thunder from the hill @ 2002-07-18 15:04 ` Bill Davidsen 2002-07-18 15:27 ` Rik van Riel 2002-07-18 15:50 ` stoffel 2 siblings, 2 replies; 90+ messages in thread From: Bill Davidsen @ 2002-07-18 15:04 UTC (permalink / raw) To: stoffel; +Cc: Linux Kernel Mailing List On Tue, 16 Jul 2002 stoffel@lucent.com wrote: > 3a. lock mirrored volume, flush any outstanding transactions, break > mirror. > --or-- > 3b. snapshot filesystem to another volume. Good summary. The problem is that 3a either requires a double mirror or leaving the f/s unmirrored, and 3b can take a very long time for a big f/s. In general much of this can be addressed by only backing up small f/s and using an application backup utility to backup the big stuff. Fortunately the most common problem apps are databases, and they include this capability. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) 2002-07-18 15:04 ` Bill Davidsen @ 2002-07-18 15:27 ` Rik van Riel 2002-07-18 15:50 ` stoffel 1 sibling, 0 replies; 90+ messages in thread From: Rik van Riel @ 2002-07-18 15:27 UTC (permalink / raw) To: Bill Davidsen; +Cc: stoffel, Linux Kernel Mailing List On Thu, 18 Jul 2002, Bill Davidsen wrote: > On Tue, 16 Jul 2002 stoffel@lucent.com wrote: > > > 3a. lock mirrored volume, flush any outstanding transactions, break > > mirror. > > --or-- > > 3b. snapshot filesystem to another volume. > > Good summary. The problem is that 3a either requires a double morror or > leaving the f/s un mirrored, and 3b can take a very long time for a big > f/s. 3b should be fairly quick since you only need to do an in-memory copy of some LVM metadata. Rik -- Bravely reimplemented by the knights who say "NIH". http://www.surriel.com/ http://distro.conectiva.com/ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) 2002-07-18 15:04 ` Bill Davidsen 2002-07-18 15:27 ` Rik van Riel @ 2002-07-18 15:50 ` stoffel 2002-07-18 16:29 ` Bill Davidsen 1 sibling, 1 reply; 90+ messages in thread From: stoffel @ 2002-07-18 15:50 UTC (permalink / raw) To: Bill Davidsen; +Cc: stoffel, Linux Kernel Mailing List Bill> On Tue, 16 Jul 2002 stoffel@lucent.com wrote: >> 3a. lock mirrored volume, flush any outstanding transactions, break >> mirror. >> --or-- >> 3b. snapshot filesystem to another volume. Bill> Good summary. The problem is that 3a either requires a double Bill> mirror or leaving the f/s unmirrored, and 3b can take a very Bill> long time for a big f/s. Yup, 3a isn't a totally perfect solution, though triple mirrors (if you can afford them) work well. We actually do this for some servers where we can't afford the application down time of locking the DB for extended times, but we also don't have triple mirrors either. It's a tradeoff. I really prefer 3b, since it's more efficient, faster, and more robust. To snapshot a filesystem, all you need to do is: - create backing store for the snapshot, usually around 10-15% of the size of the original volume. Depends on volatility of data. - lock the app(s). - lock the filesystem and flush pending transactions. - copy the metadata describing the filesystem - insert a COW handler into the FS block write path - mount the snapshot elsewhere - unlock the FS - unlock the app Whenever the app writes a block into the FS, copy the original block to the backing store, then write the new block to storage. All the backups see the quiescent data store, so it can do a clean backup. When you're done, just unmount the snapshot and delete it, then remove the backing store. There is an overhead for doing this, but it's better than having to unmirror/remirror whole block devices to do a backup. And cheaper in terms of disk space too. 
Bill> In general much of this can be addressed by only backing up Bill> small f/s and using an application backup utility to backup the Bill> big stuff. Fortunately the most common problem apps are Bill> databases, and they include this capability. Define what a small file system is these days, since it could be 100gb for some people. *grin*. It's a matter of making the tools scale well so that the data can be secured properly. To do a proper backup requires that all layers talk to each other, and have some means of doing a RW lock and flush of pending transactions. If you have that, you can do it. If you don't, you need to either go to single user mode, re-mount RO, or pray. John ^ permalink raw reply [flat|nested] 90+ messages in thread
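The copy-on-write step in John's list can be sketched in miniature. The model below is invented for illustration (it is not LVM's implementation): the first write to any block after the snapshot copies the old contents to the backing store, and snapshot reads check the backing store first, so the backup always sees the point-in-time image however much the live filesystem changes.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot over a "block device" (a plain list).
    Only blocks that change after the snapshot consume backing store,
    which is why 10-15% of the volume size is usually enough."""

    def __init__(self, device):
        self.device = device
        self.backing = {}     # block number -> original contents

    def write(self, blockno, data):
        # the COW handler in the block write path:
        if blockno not in self.backing:
            self.backing[blockno] = self.device[blockno]  # save old block once
        self.device[blockno] = data                       # then write through

    def read_snapshot(self, blockno):
        # snapshot view: backing store first, else the unchanged device block
        return self.backing.get(blockno, self.device[blockno])

    def read_live(self, blockno):
        return self.device[blockno]

disk = [b"a", b"b", b"c"]
snap = CowSnapshot(disk)
snap.write(1, b"B")           # the live fs keeps changing during the backup
backup = [snap.read_snapshot(i) for i in range(len(disk))]
```

Deleting the snapshot is just dropping `backing`; no unmirror/remirror pass over the whole device is needed.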
* Re: Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) 2002-07-18 15:50 ` stoffel @ 2002-07-18 16:29 ` Bill Davidsen 0 siblings, 0 replies; 90+ messages in thread From: Bill Davidsen @ 2002-07-18 16:29 UTC (permalink / raw) To: stoffel; +Cc: Linux Kernel Mailing List On Thu, 18 Jul 2002 stoffel@lucent.com wrote: > I really prefer 3b, since it's more efficient, faster, and more > robust. To snapshot a filesystem, all you need to do is: > > - create backing store for the snapshot, usually around 10-15% of the > size of the original volume. Depends on volatility of data. > - lock the app(s). > - lock the filesystem and flush pending transactions. > - copy the metadata describing the filesystem > - insert a COW handler into the FS block write path > - mount the snapshot elsewhere > - unlock the FS > - unlock the app > > Whenever the app writes a block into the FS, copy the original block > to the backing store, then write the new block to storage. Okay, other than the overhead and having enough filespace for Tbkup sec (min, hr, day) of operation this is practical. In general most times you would be doing an incremental, and the time would not be much. > Bill> In general much of this can be addressed by only backing up > Bill> small f/s and using an application backup utility to backup the > Bill> big stuff. Fortunately the most common problem apps are > Bill> databases, and they include this capability. > > Define what a small file system is these days, since it could be 100gb > for some people. *grin*. It's a matter of making the tools scale > well so that the data can be secured properly. Obviously a small f/s is one you can backup without operator intervention to change media and in a reasonable time, which might be 10min..few hours depending on your taste. That's kind of my rule of thumb, you're welcome to suggest others, but if someone has to change media I can't call it small any more. 
> To do a proper backup requires that all layers talk to each other, and > have some means of doing a RW lock and flush of pending transactions. > If you have that, you can do it. If you don't, you need to either > go to single user mode, re-mount RO, or pray. With some people, praying or ignoring the problem is popular. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) 2002-07-16 22:19 ` Backups done right (was [ANNOUNCE] Ext3 vs Reiserfs benchmarks) stoffel 2002-07-16 22:33 ` Thunder from the hill 2002-07-18 15:04 ` Bill Davidsen @ 2002-07-19 15:28 ` Sam Vilain 2 siblings, 0 replies; 90+ messages in thread From: Sam Vilain @ 2002-07-19 15:28 UTC (permalink / raw) To: stoffel; +Cc: matthias.andree, linux-kernel stoffel@lucent.com wrote: > 1. lock application(s), flush any outstanding transactions. > 2. lock filesystems, flush any outstanding transactions. > 3a. lock mirrored volume, flush any outstanding transactions, break > mirror. > 3b. snapshot filesystem to another volume. Or, to avoid the penalty of locking everything and bringing it down and stuff: 1. set a flag. 2. start backing up blocks (read them raw of course, don't want to load those stressed higher level systems) 3. If something wants to write to a block, quickly back up the old contents of the block before you write the new contents. Unless of course you've already backed up that block. Of course, step 3 does place a bit more unschedulable load on the disk. Heck, when the backups have just started, you're doubling the latency of the devices. You can avoid this with a transaction journal; in fact, the cockier RDBMSes out there (eg, DMSII) don't even bother to do this and assume that your transaction journal is on a mirrored device - and hence there's no point in backing up the old data, you just want to do one sweep of the disk - and replay the journal to get current. 
(note: implicit assumption: you're dealing with applications using synchronous I/O, where it needs to be written to all mirrors before it's trusted to be stored) Ah, moot points - the Linux MD/LVM drivers are far too unsophisticated to have journal devices ;-) -- Sam Vilain, sam@vilain.net WWW: http://sam.vilain.net/ 7D74 2A09 B2D3 C30F F78E GPG: http://sam.vilain.net/sam.asc 278A A425 30A9 05B5 2F13 Law of Computability Applied to Social Sciences: If at first you don't suceed, transform your data set. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 12:53 ` Matthias Andree 2002-07-16 13:05 ` Christoph Hellwig @ 2002-07-17 18:51 ` bill davidsen 2002-07-18 9:32 ` Matthias Andree 1 sibling, 1 reply; 90+ messages in thread From: bill davidsen @ 2002-07-17 18:51 UTC (permalink / raw) To: linux-kernel In article <20020716125301.GI4576@merlin.emma.line.org>, Matthias Andree <matthias.andree@stud.uni-dortmund.de> wrote: | dsmc fstat()s the file it is currently reading regularly and retries the | dump as the changes, and gives up if it is updated too often. Not sure | about the server side, and certainly not a useful option for sequential | devices that you directly write on. Looks like a cache for the biggest | file is necessary. Which doesn't address the issue of data in files A, B and C, with indices in X and Y. This only works if you flush and freeze all the files at one time, making a perfect backup of one at a time results in corruption if the database is busy. My favorite example is usenet news on INN, a bunch of circular spools, a linear history with two index files, 30-40k overview files, and all of it changing with perhaps 3.5MB/sec data and 20-50/sec index writes. Far better done with an application backup! The point is, backups are hard, for many systems dump is optimal because it's fast. After that I like cpio (-Hcrc) but that's personal preference. All have fail cases on volatile data. -- bill davidsen <davidsen@tmr.com> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-17 18:51 ` [ANNOUNCE] Ext3 vs Reiserfs benchmarks bill davidsen @ 2002-07-18 9:32 ` Matthias Andree 0 siblings, 0 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-18 9:32 UTC (permalink / raw) To: linux-kernel On Wed, 17 Jul 2002, bill davidsen wrote: > In article <20020716125301.GI4576@merlin.emma.line.org>, > Matthias Andree <matthias.andree@stud.uni-dortmund.de> wrote: > > | dsmc fstat()s the file it is currently reading regularly and retries the > | dump as the changes, and gives up if it is updated too often. Not sure > | about the server side, and certainly not a useful option for sequential > | devices that you directly write on. Looks like a cache for the biggest > | file is necessary. > > Which doesn't address the issue of data in files A, B and C, with > indices in X and Y. This only works if you flush and freeze all the > files at one time, making a perfect backup of one at a time results in > corruption if the database is busy. Right, but this would have to be taken up with Tivoli "do snapshot as dsmc starts, backup from snapshot and discard snapshot on exit" > My favorite example is usenet news on INN, a bunch of circular spools, a > linear history with two index files, 30-40k overview files, and all of > it changing with perhaps 3.5MB/sec data and 20-50/sec index writes. Far > better done with an application backup! In that case, when you are restoring from backups, you can also regenerate index files (at least with tradspool, I never looked at the "News in Dosen" aggregated spools like CNFS or whatever). It's really hard if you have .dir/.pag style dbm data bases that don't mirror some other single-file format. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 12:30 ` Alan Cox 2002-07-15 12:02 ` Sam Vilain @ 2002-07-15 12:09 ` Matti Aarnio 1 sibling, 0 replies; 90+ messages in thread From: Matti Aarnio @ 2002-07-15 12:09 UTC (permalink / raw) To: Sam Vilain; +Cc: Dax Kelson, linux-kernel On Mon, Jul 15, 2002 at 01:30:51PM +0100, Alan Cox wrote: > On Mon, 2002-07-15 at 09:26, Sam Vilain wrote: > > You are testing for a mail server - how many mailboxes are in your spool > > directory for the tests? Try it with about five to ten thousand > > mailboxes and see how your results vary. > > If your mail server can't get hierarchical mail spools right, get one > that can. Long ago (10-15 internet-years ago..) I followed testing of FFS-family of filesystems in Squid cache. We noticed on Solaris machines using UFS that when the directory data size grew above the number of blocks directly addressable by the direct-index pointers in the i-node, system speed plummeted. (Or perhaps it was something a bit smaller, like 32 kB) Consider: 4 kB block size, 12 direct indexes: 48 kB directory size. Spend 16 bytes for each file name + auxiliary data: 3000 files/subdirs Optimal would be to store the files inside only the first block, e.g. the directory shall not grow over 4k (or 1k, or ..) Name subdirs as: 00 thru 7F (128+2, 12 bytes ?) Possibly do that in 2 layers: 128^2 = 16384 subdirs, each with 50 long named users (even more files?): 820 000 users. Tune the subdir hashing function to suit your application, and you should be happy. Putting all your eggs in one basket (files in one directory) is not a smart thing. > Alan /Matti Aarnio ^ permalink raw reply [flat|nested] 90+ messages in thread
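Matti's two-level 00..7F layout can be sketched like this. The hash choice and the helper name are illustrative assumptions, not taken from any particular mail server; the point is only that every mailbox lands in a small, bounded directory.

```python
import hashlib
import posixpath

def spool_path(spool_root, mailbox):
    """Map a mailbox name to a two-level hashed spool directory, with
    128 (00..7f) subdirectories per level as in Matti's scheme, so
    128^2 = 16384 leaf directories share the load and no directory
    outgrows the inode's direct blocks.  The hash function here (md5
    of the name) is an arbitrary choice; tune it to your application."""
    digest = hashlib.md5(mailbox.encode()).digest()
    d1 = digest[0] & 0x7F          # first level: 00..7f
    d2 = digest[1] & 0x7F          # second level: 00..7f
    return posixpath.join(spool_root, f"{d1:02x}", f"{d2:02x}", mailbox)

p = spool_path("/var/spool/mail", "dax")   # e.g. /var/spool/mail/xx/yy/dax
```

With ~50 mailboxes per leaf directory this layout handles the roughly 820 000 users Matti estimates.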
[parent not found: <20020712162306$aa7d@traf.lcs.mit.edu>]
[parent not found: <mit.lcs.mail.linux-kernel/20020712162306$aa7d@traf.lcs.mit.edu>]
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks [not found] ` <mit.lcs.mail.linux-kernel/20020712162306$aa7d@traf.lcs.mit.edu> @ 2002-07-15 15:22 ` Patrick J. LoPresti 2002-07-15 17:31 ` Chris Mason ` (3 more replies) 0 siblings, 4 replies; 90+ messages in thread From: Patrick J. LoPresti @ 2002-07-15 15:22 UTC (permalink / raw) To: linux-kernel Consider this argument: Given: On ext3, fsync() of any file on a partition commits all outstanding transactions on that partition to the log. Given: data=ordered forces pending data writes for a file to happen before related transactions are committed to the log. Therefore: With data=ordered, fsync() of any file on a partition syncs the outstanding writes of EVERY file on that partition. Is this argument correct? If so, it suggests that data=ordered is actually the *worst* possible journalling mode for a mail spool. One other thing. I think this statement is misleading: IF your server is stable and not prone to crashing, and/or you have the write cache on your hard drives battery backed, you should strongly consider using the writeback journaling mode of Ext3 versus ordered. This makes it sound like data=writeback is somehow unsafe when machines crash. I do not think this is true. If your application (e.g., Postfix) is written correctly (which it is), so it calls fsync() when it is supposed to, then data=writeback is *exactly* as safe as any other journalling mode. "Battery backed caches" and the like have nothing to do with it. And if your application is written incorrectly, then other journalling modes will reduce but not eliminate the chances for things to break catastrophically on a crash. So if the partition is dedicated to correct applications, like a mail spool is, then data=writeback is perfectly safe. If it is faster, too, then it really is a no-brainer. - Pat ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 15:22 ` Patrick J. LoPresti @ 2002-07-15 17:31 ` Chris Mason 2002-07-15 18:33 ` Matthias Andree ` (2 subsequent siblings) 3 siblings, 0 replies; 90+ messages in thread From: Chris Mason @ 2002-07-15 17:31 UTC (permalink / raw) To: Patrick J. LoPresti; +Cc: linux-kernel On Mon, 2002-07-15 at 11:22, Patrick J. LoPresti wrote: > Consider this argument: > > Given: On ext3, fsync() of any file on a partition commits all > outstanding transactions on that partition to the log. > > Given: data=ordered forces pending data writes for a file to happen > before related transactions are committed to the log. > > Therefore: With data=ordered, fsync() of any file on a partition > syncs the outstanding writes of EVERY file on that > partition. > > Is this argument correct? If so, it suggests that data=ordered is > actually the *worst* possible journalling mode for a mail spool. > Yes. In practice this doesn't hurt as much as it could, because ext3 does a good job of letting more writers come in before forcing the commit. What hurts you is when a forced commit comes in the middle of creating the file. A data write that could have been contiguous gets broken into two or more writes instead. > One other thing. I think this statement is misleading: > > IF your server is stable and not prone to crashing, and/or you > have the write cache on your hard drives battery backed, you > should strongly consider using the writeback journaling mode of > Ext3 versus ordered. > > This makes it sound like data=writeback is somehow unsafe when > machines crash. I do not think this is true. If your application > (e.g., Postfix) is written correctly (which it is), so it calls > fsync() when it is supposed to, then data=writeback is *exactly* as > safe as any other journalling mode. Almost. data=writeback makes it possible for the old contents of a block to end up in a newly grown file. 
There are a few ways this can screw you up: 1) that newly grown file is someone's inbox, and the old contents of the new block include someone else's private message. 2) That newly grown file is a control file for the application, and the application expects it to contain valid data within (think sendmail). > "Battery backed caches" and the > like have nothing to do with it. Nope, battery backed caches don't make data=writeback more or less safe (with respect to the data anyway). They do make data=ordered and data=journal more safe. > And if your application is written > incorrectly, then other journalling modes will reduce but not > eliminate the chances for things to break catastrophically on a crash. > > So if the partition is dedicated to correct applications, like a mail > spool is, then data=writeback is perfectly safe. If it is faster, > too, then it really is a no-brainer. For mail servers, data=journal is your friend. ext3 sometimes needs a bigger log for it (reiserfs data=journal patches don't), but the performance increase can be significant. -chris ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 15:22 ` Patrick J. LoPresti 2002-07-15 17:31 ` Chris Mason @ 2002-07-15 18:33 ` Matthias Andree [not found] ` <20020715173337$acad@traf.lcs.mit.edu> 2002-07-16 7:07 ` Dax Kelson 3 siblings, 0 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-15 18:33 UTC (permalink / raw) To: linux-kernel On Mon, 15 Jul 2002, Patrick J. LoPresti wrote: > One other thing. I think this statement is misleading: > > IF your server is stable and not prone to crashing, and/or you > have the write cache on your hard drives battery backed, you > should strongly consider using the writeback journaling mode of > Ext3 versus ordered. > > This makes it sound like data=writeback is somehow unsafe when > machines crash. I do not think this is true. If your application Well, if your fsync() completes... > (e.g., Postfix) is written correctly (which it is), so it calls > fsync() when it is supposed to, then data=writeback is *exactly* as > safe as any other journalling mode. "Battery backed caches" and the > like have nothing to do with it. And if your application is written > incorrectly, then other journalling modes will reduce but not > eliminate the chances for things to break catastrophically on a crash. ...then you're right. If the machine crashes amidst the fsync() operation, but has scheduled meta data before file contents, then journal recovery can present you a file that contains bogus data which will confuse some applications. I believe Postfix will recover from this condition either way, see its file is hosed and ignore or discard it (depending on what it is), but software that blindly relies on a special format without checking will barf. All of this assumes two things: 1. the application actually calls fsync() 2. the application can detect if fsync() succeeded before the crash (like fsync -> fchmod -> fsync, structured file contents, whatever). 
> So if the partition is dedicated to correct applications, like a mail > spool is, then data=writeback is perfectly safe. If it is faster, > too, then it really is a no-brainer. These ordering promises also apply to applications that do not call fsync() or that cannot detect hosed files. Been there, seen that, with CVS on unpatched ReiserFS as of Linux-2.4.19-presomething: suddenly one ,v file contained NUL blocks. The server barfed, the (remote!) client segfaulted... yes, it's almost as bad as it can get. Not catastrophic, tape backup available, but it gave some time to restore the file and investigate this issue nonetheless. It boiled down to "nobody's fault, but missing feature". With data=ordered or data=journal, I would have either had my old ,v file around or a proper new one. I'm now using Chris Mason's data-logging patches to try and see how things work out, I had one crash with an old version, then updated to the -11 version and have yet to see something break again. I'd certainly appreciate if these patches were merged early in 2.4.20-pre so they get some testing and can be in 2.4.20 and Linux had two file systems with data=ordered to choose from. Disclaimer: I don't know anything except the bare existence, about XFS or JFS. Feel free to add comments. -- Matthias Andree ^ permalink raw reply [flat|nested] 90+ messages in thread
[parent not found: <20020715173337$acad@traf.lcs.mit.edu>]
[parent not found: <mit.lcs.mail.linux-kernel/20020715173337$acad@traf.lcs.mit.edu>]
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks [not found] ` <mit.lcs.mail.linux-kernel/20020715173337$acad@traf.lcs.mit.edu> @ 2002-07-15 19:13 ` Patrick J. LoPresti 2002-07-15 20:55 ` Matthias Andree 2002-07-15 21:14 ` Chris Mason 0 siblings, 2 replies; 90+ messages in thread From: Patrick J. LoPresti @ 2002-07-15 19:13 UTC (permalink / raw) To: linux-kernel Chris Mason <mason@suse.com> writes: > > One other thing. I think this statement is misleading: > > > > IF your server is stable and not prone to crashing, and/or you > > have the write cache on your hard drives battery backed, you > > should strongly consider using the writeback journaling mode of > > Ext3 versus ordered. > > > > This makes it sound like data=writeback is somehow unsafe when > > machines crash. I do not think this is true. If your application > > (e.g., Postfix) is written correctly (which it is), so it calls > > fsync() when it is supposed to, then data=writeback is *exactly* as > > safe as any other journalling mode. > > Almost. data=writeback makes it possible for the old contents of a > block to end up in a newly grown file. Only if the application is already broken. > There are a few ways this can screw you up: > > 1) that newly grown file is someone's inbox, and the old contents of the > new block include someone else's private message. > > 2) That newly grown file is a control file for the application, and the > application expects it to contain valid data within (think sendmail). In a correctly-written application, neither of these things can happen. (See my earlier message today on fsync() and MTAs.) To get a file onto disk reliably, the application must 1) flush the data, and then 2) flush a "validity" indicator. This could be a sequence like: create temp file flush data to temp file rename temp file flush rename operation In this sequence, the file's existence under a particular name is the indicator of its validity. 
If you skip either of these flush operations, you are not behaving reliably. Skipping the first flush means the validity indicator might hit the disk before the data; so after a crash, you might see invalid data in an allegedly valid file. Skipping the second flush means you do not know that the validity indicator has been set, so you cannot report success to whoever is waiting for this "reliable write" to happen. It is possible to make an application which relies on data=ordered semantics; for example, skipping the "flush data to temp file" step above. But such an application would be broken for every version of Unix *except* Linux in data=ordered mode. I would call that an incorrect application. > Nope, battery backed caches don't make data=writeback more or less safe > (with respect to the data anyway). They do make data=ordered and > data=journal more safe. A theorist would say that "more safe" is a sloppy concept. Either an operation is safe or it is not. As I said in my last message, data=ordered (and data=journal) can reduce the risk for poorly written apps. But they cannot eliminate that risk, and for a correctly written app, data=writeback is 100% as safe. - Pat ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 19:13 ` Patrick J. LoPresti @ 2002-07-15 20:55 ` Matthias Andree 2002-07-15 21:23 ` Patrick J. LoPresti 2002-07-15 22:55 ` Alan Cox 2002-07-15 21:14 ` Chris Mason 1 sibling, 2 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-15 20:55 UTC (permalink / raw) To: linux-kernel On Mon, 15 Jul 2002, Patrick J. LoPresti wrote: > In a correctly-written application, neither of these things can > happen. (See my earlier message today on fsync() and MTAs.) To get a > file onto disk reliably, the application must 1) flush the data, and > then 2) flush a "validity" indicator. This could be a sequence like: > > create temp file > flush data to temp file > rename temp file > flush rename operation > > In this sequence, the file's existence under a particular name is the > indicator of its validity. Assume that most applications are broken then. I assume that most will just call close() or fclose() and exit() right away. Does fclose() imply fsync()? Some applications will not even check the [f]close() return value... > It is possible to make an application which relies on data=ordered > semantics; for example, skipping the "flush data to temp file" step > above. But such an application would be broken for every version of > Unix *except* Linux in data=ordered mode. I would call that an > incorrect application. Or very specific, at least. > > Nope, battery backed caches don't make data=writeback more or less safe > > (with respect to the data anyway). They do make data=ordered and > > data=journal more safe. > > A theorist would say that "more safe" is a sloppy concept. Either an > operation is safe or it is not. As I said in my last message, > data=ordered (and data=journal) can reduce the risk for poorly written > apps. But they cannot eliminate that risk, and for a correctly > written app, data=writeback is 100% as safe. IF that application uses a marker to mark completion. 
If it does not, data=ordered will be the safe bet, regardless of fsync() or not. The machine can crash BEFORE the fsync() is called. -- Matthias Andree ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 20:55 ` Matthias Andree @ 2002-07-15 21:23 ` Patrick J. LoPresti 2002-07-15 21:38 ` Thunder from the hill 2002-07-15 21:59 ` Ketil Froyn 2002-07-15 22:55 ` Alan Cox 1 sibling, 2 replies; 90+ messages in thread From: Patrick J. LoPresti @ 2002-07-15 21:23 UTC (permalink / raw) To: linux-kernel Matthias Andree <matthias.andree@stud.uni-dortmund.de> writes: > I assume that most will just call close() or fclose() and exit() right > away. Does fclose() imply fsync()? Not according to my close(2) man page: A successful close does not guarantee that the data has been successfully saved to disk, as the kernel defers writes. It is not common for a filesystem to flush the buffers when the stream is closed. If you need to be sure that the data is physically stored use fsync(2). (It will depend on the disk hardware at this point.) Note that this means writing a truly reliable shell or Perl script is tricky. I suppose you can "use POSIX qw(fsync);" in Perl. But what do you do for a shell script? /bin/sync :-) ? > Some applications will not even check the [f]close() return value... Such applications are broken, of course. > > It is possible to make an application which relies on data=ordered > > semantics; for example, skipping the "flush data to temp file" step > > above. But such an application would be broken for every version of > > Unix *except* Linux in data=ordered mode. I would call that an > > incorrect application. > > Or very specific, at least. Hm. Does BSD with soft updates guarantee anything about write ordering on fsync()? In particular, does it promise to commit the data before the metadata? > > A theorist would say that "more safe" is a sloppy concept. Either an > > operation is safe or it is not. As I said in my last message, > > data=ordered (and data=journal) can reduce the risk for poorly written > > apps. 
But they cannot eliminate that risk, and for a correctly > > written app, data=writeback is 100% as safe. > > IF that application uses a marker to mark completion. If it does not, > data=ordered will be the safe bet, regardless of fsync() or not. The > machine can crash BEFORE the fsync() is called. Without marking completion, there is no safe bet. Without calling fsync(), you *never* know when the data will hit the disk. It is very hard to build a reliable system that way... For an MTA, for example, you can never safely inform the remote mailer that you have accepted the message. But this problem goes beyond MTAs; very few applications live in a vacuum. Reliable systems are tricky. I guess this is why Oracle and Sybase make all that money. - Pat ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 21:23 ` Patrick J. LoPresti @ 2002-07-15 21:38 ` Thunder from the hill 2002-07-16 12:31 ` Matthias Andree 2002-07-15 21:59 ` Ketil Froyn 1 sibling, 1 reply; 90+ messages in thread From: Thunder from the hill @ 2002-07-15 21:38 UTC (permalink / raw) To: Patrick J. LoPresti; +Cc: linux-kernel Hi, On 15 Jul 2002, Patrick J. LoPresti wrote: > Note that this means writing a truly reliable shell or Perl script is > tricky. I suppose you can "use POSIX qw(fsync);" in Perl. But what do > you do for a shell script? /bin/sync :-) ? Write a binary (/usr/bin/fsync) which opens a fd, fsync it, close it, be done with it. Regards, Thunder -- (Use http://www.ebb.org/ungeek if you can't decode) ------BEGIN GEEK CODE BLOCK------ Version: 3.12 GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$ N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G e++++ h* r--- y- ------END GEEK CODE BLOCK------ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 21:38 ` Thunder from the hill @ 2002-07-16 12:31 ` Matthias Andree 2002-07-16 15:53 ` Thunder from the hill 0 siblings, 1 reply; 90+ messages in thread From: Matthias Andree @ 2002-07-16 12:31 UTC (permalink / raw) To: linux-kernel On Mon, 15 Jul 2002, Thunder from the hill wrote: > Hi, > > On 15 Jul 2002, Patrick J. LoPresti wrote: > > Note that this means writing a truly reliable shell or Perl script is > > tricky. I suppose you can "use POSIX qw(fsync);" in Perl. But what do > > you do for a shell script? /bin/sync :-) ? > > Write a binary (/usr/bin/fsync) which opens a fd, fsync it, close it, be > done with it. Or steal one from FreeBSD (written by Paul Saab), fix the err() function and be done with it. .../usr.bin/fsync/fsync.{1,c} Interesting side note -- mind the O_RDONLY: for (i = 1; i < argc; ++i) { if ((fd = open(argv[i], O_RDONLY)) < 0) err(1, "open %s", argv[i]); if (fsync(fd) != 0) err(1, "fsync %s", argv[i]); close(fd); } ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 12:31 ` Matthias Andree @ 2002-07-16 15:53 ` Thunder from the hill 2002-07-16 19:26 ` Matthias Andree 0 siblings, 1 reply; 90+ messages in thread From: Thunder from the hill @ 2002-07-16 15:53 UTC (permalink / raw) To: Matthias Andree; +Cc: linux-kernel Hi, On Tue, 16 Jul 2002, Matthias Andree wrote: > > Write a binary (/usr/bin/fsync) which opens a fd, fsync it, close it, be > > done with it. > > Or steal one from FreeBSD (written by Paul Saab), fix the err() function > and be done with it. > > .../usr.bin/fsync/fsync.{1,c} > > Interesting side note -- mind the O_RDONLY: > > for (i = 1; i < argc; ++i) { > if ((fd = open(argv[i], O_RDONLY)) < 0) > err(1, "open %s", argv[i]); > > if (fsync(fd) != 0) > err(1, "fsync %s", argv[1]); > close(fd); > } Pretty much the thing I had in mind, except that the close return code is disregarded here... Regards, Thunder -- (Use http://www.ebb.org/ungeek if you can't decode) ------BEGIN GEEK CODE BLOCK------ Version: 3.12 GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$ N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G e++++ h* r--- y- ------END GEEK CODE BLOCK------ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 15:53 ` Thunder from the hill @ 2002-07-16 19:26 ` Matthias Andree 2002-07-16 19:38 ` Thunder from the hill 0 siblings, 1 reply; 90+ messages in thread From: Matthias Andree @ 2002-07-16 19:26 UTC (permalink / raw) To: linux-kernel On Tue, 16 Jul 2002, Thunder from the hill wrote: > > if (fsync(fd) != 0) > > err(1, "fsync %s", argv[1]); > > close(fd); > > } > > Pretty much the thing I had in mind, except that the close return code is > disregarded here... Indeed, but OTOH, what error is close to report when the file is opened read-only? ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 19:26 ` Matthias Andree @ 2002-07-16 19:38 ` Thunder from the hill 0 siblings, 0 replies; 90+ messages in thread From: Thunder from the hill @ 2002-07-16 19:38 UTC (permalink / raw) To: Matthias Andree; +Cc: linux-kernel Hi, On Tue, 16 Jul 2002, Matthias Andree wrote: > Indeed, but OTOH, what error is close to report when the file is opened > read-only? Well, you can still get EIO, EINTR, EBADF. Whatever you say, disregarding the close return code is never any good. Regards, Thunder -- (Use http://www.ebb.org/ungeek if you can't decode) ------BEGIN GEEK CODE BLOCK------ Version: 3.12 GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$ N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G e++++ h* r--- y- ------END GEEK CODE BLOCK------ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 21:23 ` Patrick J. LoPresti 2002-07-15 21:38 ` Thunder from the hill @ 2002-07-15 21:59 ` Ketil Froyn 2002-07-15 23:08 ` Matti Aarnio 1 sibling, 1 reply; 90+ messages in thread From: Ketil Froyn @ 2002-07-15 21:59 UTC (permalink / raw) To: Patrick J. LoPresti; +Cc: linux-kernel On 15 Jul 2002, Patrick J. LoPresti wrote: > Without calling fsync(), you *never* know when the data will hit the > disk. Doesn't bdflush ensure that data is written to disk within 30 seconds or some tunable number of seconds? Ketil ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 21:59 ` Ketil Froyn @ 2002-07-15 23:08 ` Matti Aarnio 2002-07-16 12:33 ` Matthias Andree 0 siblings, 1 reply; 90+ messages in thread From: Matti Aarnio @ 2002-07-15 23:08 UTC (permalink / raw) To: Ketil Froyn; +Cc: linux-kernel On Mon, Jul 15, 2002 at 11:59:48PM +0200, Ketil Froyn wrote: > On 15 Jul 2002, Patrick J. LoPresti wrote: > > Without calling fsync(), you *never* know when the data will hit the > > disk. > > Doesn't bdflush ensure that data is written to disk within 30 seconds or > some tunable number of seconds? It TRIES TO, it does not guarantee anything. The MTA systems are an example of software suites which have transaction requirements. The goal is usually stated as: must not fail to deliver. Practical implementations without full-blown, all-encompassing transactions will usually mean that the message "will be delivered at least once", i.e. double delivery can happen. One view of MTA behaviour is moving the message from one substate to another during its processing. These days, usually, the transaction database for MTAs is the UNIX filesystem. For ZMailer I have considered (although not actually done - yet) using SleepyCat DB files for the transaction subsystem. There are great challenges in failure compartmentalisation and integrity when using that kind of integrated database mechanism. Getting SEGV is potentially a _very_ bad thing! > Ketil /Matti Aarnio ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 23:08 ` Matti Aarnio @ 2002-07-16 12:33 ` Matthias Andree 0 siblings, 0 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-16 12:33 UTC (permalink / raw) To: linux-kernel On Tue, 16 Jul 2002, Matti Aarnio wrote: > These days, usually, the transaction database for MTAs is UNIX > filesystem. For ZMailer I have considered (although not actually > done - yet) using SleepyCat DB files for the transaction subsystem. > There are great challenges in failure compartementalisation, and > integrity, when using that kind of integrated database mechanisms. > Getting SEGV is potentially _very_ bad thing! Read: lethal to the spool. Has SleepyCat DB learned to recover from ENOSPC in the meanwhile? I had a db1.85 file corrupt after ENOSPC once... -- Matthias Andree ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 20:55 ` Matthias Andree 2002-07-15 21:23 ` Patrick J. LoPresti @ 2002-07-15 22:55 ` Alan Cox 2002-07-15 21:58 ` Matthias Andree 1 sibling, 1 reply; 90+ messages in thread From: Alan Cox @ 2002-07-15 22:55 UTC (permalink / raw) To: Matthias Andree; +Cc: linux-kernel On Mon, 2002-07-15 at 21:55, Matthias Andree wrote: > I assume that most will just call close() or fclose() and exit() right > away. Does fclose() imply fsync()? It doesn't. > Some applications will not even check the [f]close() return value... We are only interested in reliable code. Anything else is already fatally broken. -- quote -- Not checking the return value of close is a common but nevertheless serious programming error. File system implementations which use techniques as ``write-behind'' to increase performance may lead to write(2) succeeding, although the data has not been written yet. The error status may be reported at a later write operation, but it is guaranteed to be reported on closing the file. Not checking the return value when closing the file may lead to silent loss of data. This can especially be observed with NFS and disk quotas. ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 22:55 ` Alan Cox @ 2002-07-15 21:58 ` Matthias Andree 0 siblings, 0 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-15 21:58 UTC (permalink / raw) To: linux-kernel On Mon, 15 Jul 2002, Alan Cox wrote: > We are only interested in reliable code. Anything else is already > fatally broken. > > -- quote -- > Not checking the return value of close is a common but > nevertheless serious programming error. File system As in 6. on http://www.apocalypse.org/pub/u/paul/docs/commandments.html (The Ten Commandments for C Programmers, by Henry Spencer). ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 19:13 ` Patrick J. LoPresti 2002-07-15 20:55 ` Matthias Andree @ 2002-07-15 21:14 ` Chris Mason 2002-07-15 21:31 ` Patrick J. LoPresti 2002-07-16 12:35 ` Matthias Andree 1 sibling, 2 replies; 90+ messages in thread From: Chris Mason @ 2002-07-15 21:14 UTC (permalink / raw) To: Patrick J. LoPresti; +Cc: linux-kernel On Mon, 2002-07-15 at 15:13, Patrick J. LoPresti wrote: > > 1) that newly grown file is someone's inbox, and the old contents of the > > new block include someone else's private message. > > > > 2) That newly grown file is a control file for the application, and the > > application expects it to contain valid data within (think sendmail). > > In a correctly-written application, neither of these things can > happen. (See my earlier message today on fsync() and MTAs.) To get a > file onto disk reliably, the application must 1) flush the data, and > then 2) flush a "validity" indicator. This could be a sequence like: > > create temp file > flush data to temp file > rename temp file > flush rename operation Yes, most mtas do this for queue files, I'm not sure how many do it for the actual spool file. mail server authors are more than welcome to recommend the best safety/performance combo for their product, and to ask the FS guys which combinations are safe. -chris ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 21:14 ` Chris Mason @ 2002-07-15 21:31 ` Patrick J. LoPresti 2002-07-15 22:12 ` Richard A Nelson 2002-07-16 1:02 ` Lawrence Greenfield 2002-07-16 12:35 ` Matthias Andree 1 sibling, 2 replies; 90+ messages in thread From: Patrick J. LoPresti @ 2002-07-15 21:31 UTC (permalink / raw) To: Chris Mason; +Cc: linux-kernel Chris Mason <mason@suse.com> writes: > Yes, most mtas do this for queue files, I'm not sure how many do it for > the actual spool file. Maybe the control files are small enough to fit in one disk block, making the operations atomic in practice. Or something. > mail server authors are more than welcome to recommend the best > safety/performance combo for their product, and to ask the FS guys > which combinations are safe. Yeah, but it's a shame if those combinations require performance hits like "synchronous directory updates" or, worse, "fsync() == sync()". I really wish MTA authors would just support Linux's "fsync the directory" approach. It is simple, reliable, and fast. Yes, it does require Linux-specific support in the application, but that's what application authors should expect when there is a gap in the standards. - Pat ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 21:31 ` Patrick J. LoPresti @ 2002-07-15 22:12 ` Richard A Nelson 2002-07-16 1:02 ` Lawrence Greenfield 1 sibling, 0 replies; 90+ messages in thread From: Richard A Nelson @ 2002-07-15 22:12 UTC (permalink / raw) To: Patrick J. LoPresti; +Cc: Chris Mason, linux-kernel On 15 Jul 2002, Patrick J. LoPresti wrote: > I really wish MTA authors would just support Linux's "fsync the > directory" approach. It is simple, reliable, and fast. Yes, it does > require Linux-specific support in the application, but that's what > application authors should expect when there is a gap in the > standards. This is exactly what sendmail did in its 8.12.0 release (2001/09/08) -- Rick Nelson "...very few phenomena can pull someone out of Deep Hack Mode, with two noted exceptions: being struck by lightning, or worse, your *computer* being struck by lightning." (By Matt Welsh) ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 21:31 ` Patrick J. LoPresti 2002-07-15 22:12 ` Richard A Nelson @ 2002-07-16 1:02 ` Lawrence Greenfield [not found] ` <mit.lcs.mail.linux-kernel/200207160102.g6G12BiH022986@lin2.andrew.cmu.edu> 1 sibling, 1 reply; 90+ messages in thread From: Lawrence Greenfield @ 2002-07-16 1:02 UTC (permalink / raw) To: Patrick J. LoPresti; +Cc: linux-kernel From: "Patrick J. LoPresti" <patl@curl.com> Date: 15 Jul 2002 17:31:07 -0400 [...] I really wish MTA authors would just support Linux's "fsync the directory" approach. It is simple, reliable, and fast. Yes, it does require Linux-specific support in the application, but that's what application authors should expect when there is a gap in the standards. Actually, it's not all that simple (you have to find the enclosing directories of any files you're modifying, which might require string manipulation) or necessarily all that fast (you're doubling the number of system calls and now the application is imposing an ordering on the filesystem that didn't exist before). It's only necessary for ext2. Modern Linux filesystems (such as ext3 or reiserfs) don't require it. Finally: ext2 isn't safe even if you do call fsync() on the directory! Let's consider: some filesystem operation modifies two different blocks. This operation is safe if block A is written before block B. . FFS guarantees this by performing the writes synchronously: block A is written when it is changed, followed by block B when it is changed. . Journalling filesystems (ext3, reiserfs) guarantee this by journalling the operation and forcing that journal entry to disk before either A or B can be modified. . What does ext2 do (in the default mode)? It modifies A, it modifies B, and then leaves it up to the buffer cache to write them back---and the buffer cache might decide to write B before A. We're finally getting to some decent shared semantics on filesystems. 
Reiserfs, ext3, FFS w/ softupdates, vxfs, etc., all work with just fsync()ing the file (though an fsync() is required after a link() or rename() operation). Let's encourage all filesystems to provide these semantics and make it slightly easier on us stupid application programmers. Larry ^ permalink raw reply [flat|nested] 90+ messages in thread
[parent not found: <mit.lcs.mail.linux-kernel/200207160102.g6G12BiH022986@lin2.andrew.cmu.edu>]
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks [not found] ` <mit.lcs.mail.linux-kernel/200207160102.g6G12BiH022986@lin2.andrew.cmu.edu> @ 2002-07-16 1:43 ` Patrick J. LoPresti 2002-07-16 1:56 ` Thunder from the hill ` (2 more replies) 0 siblings, 3 replies; 90+ messages in thread From: Patrick J. LoPresti @ 2002-07-16 1:43 UTC (permalink / raw) To: linux-kernel Lawrence Greenfield <leg+@andrew.cmu.edu> writes: > Actually, it's not all that simple (you have to find the enclosing > directories of any files you're modifying, which might require string > manipulation) No, you have to find the directories you are modifying. And the application knows darn well which directories it is modifying. Don't speculate. Show some sample code, and let's see how hard it would be to use the "Linux way". I am betting on "not hard at all". > or necessarily all that fast (you're doubling the number of system > calls and now the application is imposing an ordering on the > filesystem that didn't exist before). No, you are not doubling the number of system calls. As I have tried to point out repeatedly, doing this stuff reliably and portably already requires a sequence like this: write data flush data write "validity" indicator (e.g., rename() or fchmod()) flush validity indicator On Linux, flushing a rename() means calling fsync() on the directory instead of the file. That's it. Doing that instead of fsync'ing the file adds at most two system calls (to open and close the directory), and those can be amortized over many operations on that directory (think "mail spool"). So the system call overhead is non-existent. As for "imposing an ordering on the filesystem that didn't exist before", that is complete nonsense. This is imposing *precisely* the ordering required for reliable operation; no more, no less. 
Relying on mount options, "chattr +S", or journaling artifacts for your ordering is the inefficient approach; since they impose extra ordering, they can never be faster and will usually be slower. > It's only necessary for ext2. Modern Linux filesystems (such as ext3 > or reiserfs) don't require it. Only because they take the performance hit of flushing the whole log to disk on every fsync(). Combine that with "data=ordered" and see what happens to your performance. (Perhaps "data=ordered" should be called "fsync=sync".) I would rather get back the performance and convince application authors to understand what they are doing. > Finally: ext2 isn't safe even if you do call fsync() on the directory! Wrong. write temp file fsync() temp file rename() temp file to actual file fsync() directory No matter where this crashes, it is perfectly safe on ext2. (If not, ext2 is badly broken.) The worst that can happen after a crash is that the file might exist with both the old name and the new name. But an application can detect this case on startup and clean it up. - Pat ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 1:43 ` Patrick J. LoPresti @ 2002-07-16 1:56 ` Thunder from the hill 2002-07-16 12:47 ` Matthias Andree 2002-07-16 21:09 ` James Antill 2 siblings, 0 replies; 90+ messages in thread From: Thunder from the hill @ 2002-07-16 1:56 UTC (permalink / raw) To: Patrick J. LoPresti; +Cc: linux-kernel Hi, On 15 Jul 2002, Patrick J. LoPresti wrote: > Doing that instead of fsync'ing the > file adds at most two system calls (to open and close the directory), Keep the directory fd open all the time, and flush it when needed. This gets rid of the repeated dd = open(dir, ...); fsync(dd); close(dd); sequence: you just have one dd = open(dir, ...); up front, then fsync(dd); fsync(dd); ... and finally a single close(dd); Not too much of an overhead, is it? Regards, Thunder -- (Use http://www.ebb.org/ungeek if you can't decode) ------BEGIN GEEK CODE BLOCK------ Version: 3.12 GCS/E/G/S/AT d- s++:-- a? C++$ ULAVHI++++$ P++$ L++++(+++++)$ E W-$ N--- o? K? w-- O- M V$ PS+ PE- Y- PGP+ t+ 5+ X+ R- !tv b++ DI? !D G e++++ h* r--- y- ------END GEEK CODE BLOCK------ ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 1:43 ` Patrick J. LoPresti 2002-07-16 1:56 ` Thunder from the hill @ 2002-07-16 12:47 ` Matthias Andree 2002-07-16 21:09 ` James Antill 2 siblings, 0 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-16 12:47 UTC (permalink / raw) To: linux-kernel On Mon, 15 Jul 2002, Patrick J. LoPresti wrote: > On Linux, flushing a rename() means calling fsync() on the directory > instead of the file. That's it. Doing that instead of fsync'ing the > file adds at most two system calls (to open and close the directory), > and those can be amortized over many operations on that directory > (think "mail spool"). So the system call overhead is non-existent. Indeed, but I can also leave the file descriptor open on any file system on any system except SOME of Linux'. (Ok, this precludes systems that don't offer POSIX synchronous completion semantics, but these systems don't necessarily have fsync() either). > ordering required for reliable operation; no more, no less. Relying > on mount options, "chattr +S", or journaling artifacts for your > ordering is the inefficient approach; since they impose extra > ordering, they can never be faster and will usually be slower. It is sometimes the only way, if the application is unaware. I hope I'm not starting a flame war if I mention qmail now, which is not even softupdates aware. Without chattr +S or mount -o sync, nothing is to be gained. OTOH, where mount -o sync only makes directory updates synchronous, it's not too expensive, which is why the +D approach is still useful there. > > It's only necessary for ext2. Modern Linux filesystems (such as ext3 > > or reiserfs) don't require it. > > Only because they take the performance hit of flushing the whole log > to disk on every fsync(). Combine that with "data=ordered" and see > what happens to your performance. (Perhaps "data=ordered" should be > called "fsync=sync".) 
> I would rather get back the performance and > convince application authors to understand what they are doing. 1. data=ordered is more than fsync=sync. It guarantees that data blocks are flushed before flushing the meta data blocks that reference the data blocks. Try this on ext2fs and lose. 2. sync() is unreliable, it can return control to the caller earlier than what is sound. It can "complete" at any time it desires without having completed. (Probably so it can ever return as new blocks are written by another process, but at least SUS v2 did not detail on this). 3. Application authors do not desire fsync=sync semantics, but they want to rely on "fsync(fd) also syncs recent renames". It comes as a now-guaranteed side effect of how ext3fs works, so I am told. I'm not sure how the ext3fs journal works internally, but it'd be fine with all applications if only that part of a file system be synched that is really relevant to the current fsync(fd). No more. It seems as though fsync==sync is an artifact that ext2 also suffers from. -- Matthias Andree ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-16 1:43 ` Patrick J. LoPresti 2002-07-16 1:56 ` Thunder from the hill 2002-07-16 12:47 ` Matthias Andree @ 2002-07-16 21:09 ` James Antill 2 siblings, 0 replies; 90+ messages in thread From: James Antill @ 2002-07-16 21:09 UTC (permalink / raw) To: Lawrence Greenfield, Patrick J. LoPresti; +Cc: linux-kernel "Patrick J. LoPresti" <patl@curl.com> writes: > Lawrence Greenfield <leg+@andrew.cmu.edu> writes: > > > Actually, it's not all that simple (you have to find the enclosing > > directories of any files you're modifying, which might require string > > manipulation) > > No, you have to find the directories you are modifying. And the > application knows darn well which directories it is modifying. > > Don't speculate. Show some sample code, and let's see how hard it > would be to use the "Linux way". I am betting on "not hard at all". I added fsync() on directories to exim-3.31, it took about 2hrs of coding and another hour testing it (with strace) to make sure it was doing the right thing. That was from almost never having seen the source before. The only reason it took that long was because that version of exim altered the spool in a couple of different places. Forward porting to 3.951 took about 20 minutes IIRC (that version only plays with the spool in one place). -- # James Antill -- james@and.org :0: * ^From: .*james@and\.org /dev/null ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 21:14 ` Chris Mason 2002-07-15 21:31 ` Patrick J. LoPresti @ 2002-07-16 12:35 ` Matthias Andree 1 sibling, 0 replies; 90+ messages in thread From: Matthias Andree @ 2002-07-16 12:35 UTC (permalink / raw) To: linux-kernel On Mon, 15 Jul 2002, Chris Mason wrote: > On Mon, 2002-07-15 at 15:13, Patrick J. LoPresti wrote: > > > > 1) that newly grown file is someone's inbox, and the old contents of the > > > new block include someone else's private message. > > > > > > 2) That newly grown file is a control file for the application, and the > > > application expects it to contain valid data within (think sendmail). > > > > In a correctly-written application, neither of these things can > > happen. (See my earlier message today on fsync() and MTAs.) To get a > > file onto disk reliably, the application must 1) flush the data, and > > then 2) flush a "validity" indicator. This could be a sequence like: > > > > create temp file > > flush data to temp file > > rename temp file > > flush rename operation > > Yes, most mtas do this for queue files, I'm not sure how many do it for > the actual spool file. > mail server authors are more than welcome to Less. For one, Postfix' local(8) daemon relies on synchronous directory update for Maildir spools. For mbox spool, the problem is less prevalent, because spool files usually exist already and fsync() is sufficient (and fsync() is done before local(8) reports success to the queue manager). -- Matthias Andree ^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [ANNOUNCE] Ext3 vs Reiserfs benchmarks 2002-07-15 15:22 ` Patrick J. LoPresti ` (2 preceding siblings ...) [not found] ` <20020715173337$acad@traf.lcs.mit.edu> @ 2002-07-16 7:07 ` Dax Kelson 3 siblings, 0 replies; 90+ messages in thread From: Dax Kelson @ 2002-07-16 7:07 UTC (permalink / raw) To: Patrick J. LoPresti; +Cc: linux-kernel On Mon, 2002-07-15 at 09:22, Patrick J. LoPresti wrote: > One other thing. I think this statement is misleading: > > IF your server is stable and not prone to crashing, and/or you > have the write cache on your hard drives battery backed, you > should strongly consider using the writeback journaling mode of > Ext3 versus ordered. I rewrote that statement on the website. Dax Kelson Guru Labs ^ permalink raw reply [flat|nested] 90+ messages in thread