* ext3-2.4-0.9.0
From: Andrew Morton @ 2001-07-06 15:18 UTC
To: lkml; +Cc: Stephen C. Tweedie, Andreas Dilger, Peter J. Braam, ext3-users

An update of the ext3 journalling filesystem for 2.4 kernels
is available at

	http://www.uow.edu.au/~andrewm/linux/ext3/

Patches are against 2.4.6-ac1 and 2.4.6.

Changes since 0.0.8 include:

- Multiplied the version numbering by ten to cater for bugfix
  releases against the 0.9.0 stream.

- The main thrust has been the removal of a number of changes in
  the core kernel which were required to support the journalling
  of data. This has caused some duplication of core code within
  ext3, but it's not too bad.

- A number of cleanups and resyncs with latest ext2. (Thanks, Al).

- Reorganised and optimised ext3_write_inode() and the handling of
  files which were opened O_SYNC.

- Move quota operations outside lock_super() - fixes last known
  source of quota deadlocks in -ac kernels.

- Deleted large chunks of debug/development support code.

- Improved handling of corner-case errors.

- Improved robustness in out-of-memory situations.

The last change is probably the most significant - it prevents
possible crashes and fs corruption under extreme workloads.

-

* Re: ext3-2.4-0.9.0
From: Neil Brown @ 2001-07-07 22:15 UTC
To: ext3-users; +Cc: lkml, Stephen C. Tweedie, Andreas Dilger, Peter J. Braam

On Saturday July 7, andrewm@uow.edu.au wrote:
> An update of the ext3 journalling filesystem for 2.4 kernels
> is available at
>
> 	http://www.uow.edu.au/~andrewm/linux/ext3/
>
> Patches are against 2.4.6-ac1 and 2.4.6.

I thought it was time to try out ext3 between nfsd and raid5, so I
built 2.4.6 plus this patch, and an ext3 filesystem on a largish
raid5 volume, exported it (with the "sync" flag), mounted it from
another machine with NFSv2, and ran "dbench 4".

This produces a live-lock (I think that is the right term).
Throughput would drop to zero (determined by watching the counts in
/proc/nfs/rpc/nfsd), but could be coaxed along by generating other
filesystem activity.

I tried nfs over ext3 on a plain ide disc and it worked fine.
I tried dbench directly on ext3/raid5 and it worked fine.
I tried dbench/nfs/ext2/raid5 and it worked fine.

So I think it is some interaction between ext3fs and raid5 triggered
by the high rate of "fsync" calls made by nfsd. Naturally I blame
ext3 because I know more about raid5 and nfsd :-)

One particular aspect of raid5 that *could* be related is that it is
very reticent to schedule write requests. It tries to hang on to them
as long as possible in the hope of getting more write requests in the
same stripe. My guess as to what is happening is that a write
request is submitted and then waited-for without an intervening
	run_task_queue(&tq_disk);

When the system is livelocked, all I can tell at the moment (I am at
home and the console is at work so I cannot use alt-sysrq) is that
kjournal is waiting in wait_on_buffer and an nfsd thread is waiting on
the journal.

I will try to explore it more deeply next time I am at work, but if
there are any suggestions as to what it might be, or how I might more
easily find out what is going on, I am all ears.

NeilBrown

* Re: ext3-2.4-0.9.0
From: Andrew Morton @ 2001-07-08 1:05 UTC
To: ext3-users; +Cc: lkml, Stephen C. Tweedie, Andreas Dilger, Peter J. Braam

Neil Brown wrote:
>
> On Saturday July 7, andrewm@uow.edu.au wrote:
> > An update of the ext3 journalling filesystem for 2.4 kernels
> > is available at
> >
> > 	http://www.uow.edu.au/~andrewm/linux/ext3/
> >
> > Patches are against 2.4.6-ac1 and 2.4.6.
>
> I thought it was time to try out ext3 between nfsd and raid5, so I
> built 2.4.6 plus this patch, and an ext3 filesystem on a largish
> raid5 volume, exported it (with the "sync" flag), mounted it from
> another machine with NFSv2, and ran "dbench 4".
>
> This produces a live-lock (I think that is the right term).
> Throughput would drop to zero (determined by watching the counts in
> /proc/nfs/rpc/nfsd), but could be coaxed along by generating other
> filesystem activity.
>
> I tried nfs over ext3 on a plain ide disc and it worked fine.
> I tried dbench directly on ext3/raid5 and it worked fine.
> I tried dbench/nfs/ext2/raid5 and it worked fine.
>
> So I think it is some interaction between ext3fs and raid5 triggered
> by the high rate of "fsync" calls made by nfsd. Naturally I blame
> ext3 because I know more about raid5 and nfsd :-)

fsync will cause ext3 to commit the current transaction once
all handles against it close - so that will produce rapid bursts
of small numbers of writes.

> One particular aspect of raid5 that *could* be related is that it is
> very reticent to schedule write requests. It tries to hang on to them
> as long as possible in the hope of getting more write requests in the
> same stripe. My guess as to what is happening is that a write
> request is submitted and then waited-for without an intervening
> 	run_task_queue(&tq_disk);

Could well be. ext3 will happily feed 2,000 buffers into submit_bh()
prior to running tq_disk. Everything else is happy with this, so I blame
nfsd and raid5 :) Rapid fsyncs will break this up, however.

Does this patch help?

--- fs/jbd/commit.c	2001/07/01 04:24:42	1.40
+++ fs/jbd/commit.c	2001/07/08 00:53:42
@@ -202,6 +202,7 @@
 			spin_unlock(&journal_datalist_lock);
 			unlock_journal(journal);
 			ll_rw_block(WRITE, bufs, wbuf);
+			run_task_queue(&tq_disk);
 			lock_journal(journal);
 			journal_brelse_array(wbuf, bufs);
 			goto write_out_data;
@@ -410,6 +411,7 @@
 				bh->b_end_io = end_buffer_io_sync;
 				submit_bh(WRITE, bh);
 			}
+			run_task_queue(&tq_disk);
 			lock_journal(journal);
 
 			/* Force a new descriptor to be generated next

> When the system is livelocked, all I can tell at the moment (I am at
> home and the console is at work so I cannot use alt-sysrq) is that
> kjournal is waiting in wait_on_buffer and an nfsd thread is waiting on
> the journal.

That sounds like Something Weird is going on. wait_on_buffer will
unplug and the disks should be going hell-for-leather.

> I will try to explore it more deeply next time I am at work, but if
> there are any suggestions as to what it might be, or how I might more
> easily find out what is going on, I am all ears.
>

I'll see if I can get it to happen here. Thanks.

-
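
Editor's sketch: the exchange above turns on request-queue plugging,
where submitted writes are held back until the tq_disk task queue is
run. The user-space C model below is a minimal illustration under
stated assumptions, not kernel code. The names toy_queue, toy_submit,
toy_unplug and toy_commit are hypothetical, and toy_unplug only stands
in for run_task_queue(&tq_disk). It shows why a submitter that never
triggers an unplug sees nothing dispatched; note that, as Andrew points
out, the real wait_on_buffer unplugs on its own, so this models Neil's
hypothesis rather than confirmed behaviour.

```c
#include <assert.h>

/* Toy model of a plugged request queue. All names are hypothetical. */
struct toy_queue {
	int plugged;    /* 1 while requests are held back for merging */
	int pending;    /* requests queued but not yet sent to the device */
	int dispatched; /* requests handed to the device */
};

/* Queue one write. The first request on an idle queue plugs it, so
 * nothing reaches the device until somebody unplugs. */
void toy_submit(struct toy_queue *q)
{
	if (q->pending == 0)
		q->plugged = 1;
	q->pending++;
}

/* Stand-in for run_task_queue(&tq_disk): unplug and dispatch everything. */
void toy_unplug(struct toy_queue *q)
{
	q->plugged = 0;
	q->dispatched += q->pending;
	q->pending = 0;
}

/* Submit nr_writes requests, optionally unplug afterwards, and report
 * how many requests actually reached the device. */
int toy_commit(int nr_writes, int unplug_after_submit)
{
	struct toy_queue q = { 0, 0, 0 };
	int i;

	assert(nr_writes >= 0);
	for (i = 0; i < nr_writes; i++)
		toy_submit(&q);
	if (unplug_after_submit)
		toy_unplug(&q);
	return q.dispatched;
}
```

In this model toy_commit(3, 0) leaves every request sitting on the
plugged queue, while toy_commit(3, 1) dispatches all three, which is
the effect the two added run_task_queue(&tq_disk) calls are meant to
guarantee after each batch of submissions.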

* Re: ext3-2.4-0.9.0
From: Neil Brown @ 2001-07-08 6:02 UTC
To: ext3-users; +Cc: lkml, Stephen C. Tweedie, Andreas Dilger, Peter J. Braam

On Sunday July 8, andrewm@uow.edu.au wrote:
>
> Could well be. ext3 will happily feed 2,000 buffers into submit_bh()
> prior to running tq_disk. Everything else is happy with this, so I blame
> nfsd and raid5 :) Rapid fsyncs will break this up, however.
>

raid5 is definitely happy with large sequences of requests between
tq_disk (in fact, that is best), but I think I have found a situation
where lots of small requests can confuse it. It seems that your
intuition about the direction of blame is better than mine :-)

When a write request happens to raid5, the queue is (potentially)
plugged, and then the request is (potentially) queued, and there is a
window between the two where the queue can be unplugged by another
process. If this happens, then the tq_disk run that follows the write
request will not wake up raid5d, so the raid5 queue will not be run,
and the request will just sit there until something else causes
raid5d to run.

I'm guessing that ext3 imposes more sequencing on requests than ext2
does, and so it is easier for one request being stalled to stall the
whole filesystem.

In any case, the following patch against raid5 seems to have relieved
the situation, but more testing is underway.

So ThankYou to ext3 for helping to find a bug in raid5 :-)

NeilBrown

--- drivers/md/raid5.c	2001/07/07 06:23:02	1.1
+++ drivers/md/raid5.c	2001/07/08 00:22:52
@@ -66,9 +66,10 @@
 				BUG();
 			if (atomic_read(&conf->active_stripes)==0)
 				BUG();
-			if (test_bit(STRIPE_DELAYED, &sh->state))
+			if (test_bit(STRIPE_DELAYED, &sh->state)) {
 				list_add_tail(&sh->lru, &conf->delayed_list);
-			else if (test_bit(STRIPE_HANDLE, &sh->state)) {
+				md_wakeup_thread(conf->thread);
+			} else if (test_bit(STRIPE_HANDLE, &sh->state)) {
 				list_add_tail(&sh->lru, &conf->handle_list);
 				md_wakeup_thread(conf->thread);
 			} else {
@@ -1167,10 +1168,9 @@
 
 	raid5_activate_delayed(conf);
 
-	if (conf->plugged) {
+	if (conf->plugged)
 		conf->plugged = 0;
-		md_wakeup_thread(conf->thread);
-	}
+	md_wakeup_thread(conf->thread);
 
 	spin_unlock_irqrestore(&conf->device_lock, flags);
 }
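
Editor's sketch: the window Neil describes can be replayed
deterministically in user space. The model below is a deliberately
simplified illustration, not the raid5 driver: toy_conf, toy_plug,
toy_wakeup_thread, toy_release_delayed, toy_unplug and toy_replay are
hypothetical names, the raid5d thread is reduced to a function that
drains the delayed list when woken while unplugged, and the "patched"
flag is an assumption standing in for the two changes in the patch
(wake the thread when a stripe is delayed, and wake it unconditionally
on unplug).

```c
#include <assert.h>

/* Toy stand-in for the raid5 configuration. All names are hypothetical. */
struct toy_conf {
	int plugged; /* mirrors the role of conf->plugged */
	int delayed; /* stripes parked on the delayed list */
	int handled; /* stripes the raid5d stand-in has processed */
};

/* Writer step 1: plug the queue. */
void toy_plug(struct toy_conf *c)
{
	c->plugged = 1;
}

/* raid5d stand-in: when woken while unplugged, drain the delayed list. */
void toy_wakeup_thread(struct toy_conf *c)
{
	if (!c->plugged) {
		c->handled += c->delayed;
		c->delayed = 0;
	}
}

/* Writer step 2: park a stripe on the delayed list.
 * Unpatched: no wakeup. Patched: wake the thread as well. */
void toy_release_delayed(struct toy_conf *c, int patched)
{
	c->delayed++;
	if (patched)
		toy_wakeup_thread(c);
}

/* Unplug callback reached via a tq_disk run.
 * Unpatched: wake the thread only if the queue was still plugged.
 * Patched: always wake it. */
void toy_unplug(struct toy_conf *c, int patched)
{
	int was_plugged = c->plugged;

	c->plugged = 0;
	if (patched || was_plugged)
		toy_wakeup_thread(c);
}

/* Replay one write. If raced is set, another process runs tq_disk in
 * the window between plugging and queueing. Returns the number of
 * stripes still stuck on the delayed list afterwards. */
int toy_replay(int patched, int raced)
{
	struct toy_conf c = { 0, 0, 0 };

	assert(patched == 0 || patched == 1);
	toy_plug(&c);
	if (raced)
		toy_unplug(&c, patched);     /* someone else unplugs first */
	toy_release_delayed(&c, patched);    /* the stripe is queued late */
	toy_unplug(&c, patched);             /* the writer's own tq_disk run */
	return c.delayed;
}
```

With these assumptions toy_replay(0, 1) leaves one stripe stranded,
matching the stall described above, while toy_replay(0, 0) and both
patched variants drain the list. The model says nothing about
performance; waking the thread more often is simply the price of
closing the window.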