linux-kernel.vger.kernel.org archive mirror
* Re: ext3 badness in 2.6.0-test2
@ 2003-08-06  6:40 rwhron
  0 siblings, 0 replies; 10+ messages in thread
From: rwhron @ 2003-08-06  6:40 UTC (permalink / raw)
  To: linux-kernel

>>   EXT3-fs error (device md0) in start_transaction: Journal has aborted

> Without the initial message we do not know.

During a dbench 64 run with 2.6.0-test2-mm4 on ext3 /var/log/messages said:

kernel: attempt to access beyond end of device
kernel: hdc1: rw=0, want=1212696656, limit=4096449

fdisk /dev/hdc using sectors for units shows:
Disk /dev/hdc: 16 heads, 63 sectors, 39703 cylinders
Units = sectors of 1 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1            63   4096511   2048224+  83  Linux
/dev/hdc2       4096512  23634575   9769032   83  Linux
/dev/hdc3   *  23634576  40020623   8193024   83  Linux


The console displayed:

Buffer I/O error on device hdc1, logical block 298266
lost page write due to I/O error on hdc1
Buffer I/O error on device hdc1, logical block 298112
lost page write due to I/O error on hdc1
Buffer I/O error on device hdc1, logical block 296626
lost page write due to I/O error on hdc1
Buffer I/O error on device hdc1, logical block 294743
lost page write due to I/O error on hdc1
EXT3-fs error (device hdc1): ext3_free_blocks: Freeing blocks not in datazone - block = 151587081, count = 1
Aborting journal on device hdc1.
ext3_abort called.
EXT3-fs abort (device hdc1): ext3_journal_start: Detected aborted journal
Remounting filesystem read-only
ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device hdc1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hdc1) in ext3_truncate: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device hdc1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hdc1) in ext3_orphan_del: Journal has aborted
ext3_reserve_inode_write: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device hdc1) in ext3_reserve_inode_write: Journal has aborted
EXT3-fs error (device hdc1) in ext3_delete_inode: Journal has aborted

The console did not respond to <Enter>.
The machine was pingable, but would not give an ssh prompt.

Additional /var/log/messages:

Aug  5 20:29:24 mountain kernel: Buffer I/O error on device hdc1, logical block 298266
Aug  5 20:29:24 mountain kernel: lost page write due to I/O error on hdc1
Aug  5 20:29:24 mountain kernel: Buffer I/O error on device hdc1, logical block 298112
Aug  5 20:29:24 mountain kernel: lost page write due to I/O error on hdc1
Aug  5 20:29:24 mountain kernel: Buffer I/O error on device hdc1, logical block 296626
Aug  5 20:29:24 mountain kernel: lost page write due to I/O error on hdc1
Aug  5 20:29:24 mountain kernel: Buffer I/O error on device hdc1, logical block 294743
Aug  5 20:29:24 mountain kernel: lost page write due to I/O error on hdc1
Aug  5 20:29:24 mountain kernel: attempt to access beyond end of device
Aug  5 20:29:24 mountain kernel: hdc1: rw=0, want=1212696656, limit=4096449
Aug  5 20:29:24 mountain kernel: attempt to access beyond end of device
Aug  5 20:29:24 mountain kernel: hdc1: rw=0, want=1212696656, limit=4096449
Aug  5 20:29:24 mountain kernel: attempt to access beyond end of device
Aug  5 20:29:36 mountain kernel: hdc1: rw=0, want=1212696656, limit=4096449
..

Uniprocessor K6/2 with IDE disks.
It did not have a problem with dbench 32 on ext3.
dbench on ext2 ran fine too.
e2fsprogs-1.33.  After e2fsck, filesystem seems okay.


--
Randy Hron
http://home.earthlink.net/~rwhron/kernel/bigbox.html



* Re: ext3 badness in 2.6.0-test2
  2003-08-09  1:05             ` Mike Fedyk
@ 2003-08-10 23:44               ` Neil Brown
  0 siblings, 0 replies; 10+ messages in thread
From: Neil Brown @ 2003-08-10 23:44 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel, ext3-users

On Friday August 8, mfedyk@matchmail.com wrote:
> On Sat, Aug 09, 2003 at 10:39:43AM +1000, Neil Brown wrote:
> > -		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
> > +		sh = get_active_stripe(conf, new_sector, pd_idx, 0/*(bi->bi_rw&RWA_MASK)*/);
> 
> Wouldn't it be better to remove instead of just commenting out that
> part?

The ugliness (hopefully) reminds me to fix it properly.
I think I can come up with a sensible use for the read-ahead flag, but
I would want to think carefully about it first, and test it somewhat.

NeilBrown


* Re: ext3 badness in 2.6.0-test2
  2003-08-09  0:39           ` Neil Brown
@ 2003-08-09  1:05             ` Mike Fedyk
  2003-08-10 23:44               ` Neil Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Mike Fedyk @ 2003-08-09  1:05 UTC (permalink / raw)
  To: Neil Brown; +Cc: Andrew Morton, dan, linux-kernel, ext3-users

On Sat, Aug 09, 2003 at 10:39:43AM +1000, Neil Brown wrote:
> -		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
> +		sh = get_active_stripe(conf, new_sector, pd_idx, 0/*(bi->bi_rw&RWA_MASK)*/);

Wouldn't it be better to remove instead of just commenting out that part?

At first glance it looked like a divide by zero error... :-/


* Re: ext3 badness in 2.6.0-test2
  2003-08-08  1:16         ` Andrew Morton
@ 2003-08-09  0:39           ` Neil Brown
  2003-08-09  1:05             ` Mike Fedyk
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2003-08-09  0:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dan, linux-kernel, ext3-users

On Thursday August 7, akpm@osdl.org wrote:
> Neil Brown <neilb@cse.unsw.edu.au> wrote:
> 
> > So I guess the finger points generally in the direction of raid5.
> > Now I've just got to figure if it is a bug in r5, or some assumption
> > that it makes that is no longer valid (I was briefly suspicious of
> > PF_READAHEAD which could have made a real mess of raid5, but that
> > wouldn't have this symptom)
> 
> The PF_READAHEAD thing was a huge bug.  Make sure that it is fixed before
> proceeding.  Linus's tree has the fix.

I found it. It was read-ahead related, but nothing to do with
PF_READAHEAD.

With this patch, my test ran to completion instead of dying at about
the 20% mark.

NeilBrown

=================================================================
Disable raid5 handling of read-ahead

raid5 tries to honour RWA_MASK, but messes it up and can return bad data.
Just ignore RWA_MASK for now.


 ----------- Diffstat output ------------
 ./drivers/md/raid5.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
--- ./drivers/md/raid5.c~current~	2003-08-08 14:37:00.000000000 +1000
+++ ./drivers/md/raid5.c	2003-08-08 14:37:19.000000000 +1000
@@ -1326,7 +1326,7 @@ static int make_request (request_queue_t
 			(unsigned long long)new_sector, 
 			(unsigned long long)logical_sector);
 
-		sh = get_active_stripe(conf, new_sector, pd_idx, (bi->bi_rw&RWA_MASK));
+		sh = get_active_stripe(conf, new_sector, pd_idx, 0/*(bi->bi_rw&RWA_MASK)*/);
 		if (sh) {
 
 			add_stripe_bio(sh, bi, dd_idx, (bi->bi_rw&RW_MASK));




* Re: ext3 badness in 2.6.0-test2
  2003-08-08  1:00       ` Neil Brown
@ 2003-08-08  1:16         ` Andrew Morton
  2003-08-09  0:39           ` Neil Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-08-08  1:16 UTC (permalink / raw)
  To: Neil Brown; +Cc: dan, linux-kernel, ext3-users

Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> On Tuesday August 5, akpm@osdl.org wrote:
> > Neil Brown <neilb@cse.unsw.edu.au> wrote:
> > > ...
> > > Aug  6 15:22:05 adams kernel: EXT3-fs error (device md1): ext3_add_entry: bad entry in directory #41009295: rec_len is smaller than minimal - offset=0, inode=3265411686, rec_len=0, name_len=0
> > 
> > It looks like we had a block full of zeroes come back from the device
> > driver.  I find it distinctly fishy how this happens so much with
> > ext3-on-md, and so little with ext3-on-just-a-disk.
> 
> Well, they're not *all* zero.....
> 
> I can reproduce this easily with various configurations of ext3 over
> raid5, and get a similar problem with ext2 over raid5 (corrupt inodes
> rather than directory entries) but ext3 over raid0 is rock-solid.

Good news that it is reproducible.

Have you tried running fsx-linux?  It is good at picking up data loss.

> So I guess the finger points generally in the direction of raid5.
> Now I've just got to figure if it is a bug in r5, or some assumption
> that it makes that is no longer valid (I was briefly suspicious of
> PF_READAHEAD which could have made a real mess of raid5, but that
> wouldn't have this symptom)

The PF_READAHEAD thing was a huge bug.  Make sure that it is fixed before
proceeding.  Linus's tree has the fix.  This is the relevant patch:

 drivers/block/ll_rw_blk.c |    2 +-
 fs/buffer.c               |    3 +--
 include/linux/sched.h     |    1 -
 mm/readahead.c            |   11 +++--------
 4 files changed, 5 insertions(+), 12 deletions(-)

diff -puN mm/readahead.c~remove-PF_READAHEAD mm/readahead.c
--- 25/mm/readahead.c~remove-PF_READAHEAD	2003-08-06 19:53:14.000000000 -0700
+++ 25-akpm/mm/readahead.c	2003-08-06 19:53:14.000000000 -0700
@@ -298,15 +298,10 @@ int force_page_cache_readahead(struct ad
 int do_page_cache_readahead(struct address_space *mapping, struct file *filp,
 			unsigned long offset, unsigned long nr_to_read)
 {
-	int ret = 0;
-
-	if (!bdi_read_congested(mapping->backing_dev_info)) {
-		current->flags |= PF_READAHEAD;
-		ret = __do_page_cache_readahead(mapping, filp,
+	if (!bdi_read_congested(mapping->backing_dev_info))
+		return __do_page_cache_readahead(mapping, filp,
 						offset, nr_to_read);
-		current->flags &= ~PF_READAHEAD;
-	}
-	return ret;
+	return 0;
 }
 
 /*
diff -puN fs/buffer.c~remove-PF_READAHEAD fs/buffer.c
--- 25/fs/buffer.c~remove-PF_READAHEAD	2003-08-06 19:53:14.000000000 -0700
+++ 25-akpm/fs/buffer.c	2003-08-06 19:53:14.000000000 -0700
@@ -506,8 +506,7 @@ static void end_buffer_async_read(struct
 		set_buffer_uptodate(bh);
 	} else {
 		clear_buffer_uptodate(bh);
-		if (!(current->flags & PF_READAHEAD))
-			buffer_io_error(bh);
+		buffer_io_error(bh);
 		SetPageError(page);
 	}
 
diff -puN drivers/block/ll_rw_blk.c~remove-PF_READAHEAD drivers/block/ll_rw_blk.c
--- 25/drivers/block/ll_rw_blk.c~remove-PF_READAHEAD	2003-08-06 19:53:14.000000000 -0700
+++ 25-akpm/drivers/block/ll_rw_blk.c	2003-08-06 19:53:14.000000000 -0700
@@ -1833,7 +1833,7 @@ static int __make_request(request_queue_
 
 	barrier = test_bit(BIO_RW_BARRIER, &bio->bi_rw);
 
-	ra = bio_flagged(bio, BIO_RW_AHEAD) || current->flags & PF_READAHEAD;
+	ra = bio_flagged(bio, BIO_RW_AHEAD);
 
 again:
 	insert_here = NULL;
diff -puN include/linux/sched.h~remove-PF_READAHEAD include/linux/sched.h
--- 25/include/linux/sched.h~remove-PF_READAHEAD	2003-08-06 19:53:14.000000000 -0700
+++ 25-akpm/include/linux/sched.h	2003-08-06 19:53:14.000000000 -0700
@@ -487,7 +487,6 @@ do { if (atomic_dec_and_test(&(tsk)->usa
 #define PF_SWAPOFF	0x00080000	/* I am in swapoff */
 #define PF_LESS_THROTTLE 0x00100000	/* Throttle me less: I clean memory */
 #define PF_SYNCWRITE	0x00200000	/* I am doing a sync write */
-#define PF_READAHEAD	0x00400000	/* I am doing read-ahead */
 
 #ifdef CONFIG_SMP
 extern int set_cpus_allowed(task_t *p, unsigned long new_mask);

_



* Re: ext3 badness in 2.6.0-test2
  2003-08-06  6:57     ` Andrew Morton
@ 2003-08-08  1:00       ` Neil Brown
  2003-08-08  1:16         ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2003-08-08  1:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dan, linux-kernel, ext3-users

On Tuesday August 5, akpm@osdl.org wrote:
> Neil Brown <neilb@cse.unsw.edu.au> wrote:
> > ...
> > Aug  6 15:22:05 adams kernel: EXT3-fs error (device md1): ext3_add_entry: bad entry in directory #41009295: rec_len is smaller than minimal - offset=0, inode=3265411686, rec_len=0, name_len=0
> 
> It looks like we had a block full of zeroes come back from the device
> driver.  I find it distinctly fishy how this happens so much with
> ext3-on-md, and so little with ext3-on-just-a-disk.

Well, they're not *all* zero.....

I can reproduce this easily with various configurations of ext3 over
raid5, and get a similar problem with ext2 over raid5 (corrupt inodes
rather than directory entries) but ext3 over raid0 is rock-solid.

So I guess the finger points generally in the direction of raid5.
Now I've just got to figure if it is a bug in r5, or some assumption
that it makes that is no longer valid (I was briefly suspicious of
PF_READAHEAD which could have made a real mess of raid5, but that
wouldn't have this symptom)

NeilBrown


* Re: ext3 badness in 2.6.0-test2
  2003-08-06  6:36   ` Neil Brown
@ 2003-08-06  6:57     ` Andrew Morton
  2003-08-08  1:00       ` Neil Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-08-06  6:57 UTC (permalink / raw)
  To: Neil Brown; +Cc: dan, linux-kernel, ext3-users

Neil Brown <neilb@cse.unsw.edu.au> wrote:
>
> > Could have been an IO error, or the block/MD/device layer returned
> > incorrect data.  ext3 used to go BUG a lot in the latter case, but nowadays
> > we try to abort the journal and go read-only.
> > 
> > Without the initial message we do not know.
> 
> Can I add a "me too".....

No.  Go away.

> First, I'm using data=journal - is that supposed to work in 2.6 yet?
> 

I think so.  It's much less tested than ordered mode, but some people have
beat upon it.

> I have a raid5 array across a bunch of SCSI drives and a separate scsi
> drive with boot, swap, and a journal partition.
> I have an ext3 filesystem on the raid5 array with an external journal
> on the journal partition.

oh.  Good to hear that external journals still work.

> The raid5 was rebuilding a spare and I was pounding the filesystem
> over NFS using the SPEC SFS benchmark program (of course the raid5
> rebuild killed the performance reported by SFS, but I expected that).
> 
> Shortly after the rebuild finished, I got an ext3 error (see log
> below) and the journal aborted, and then nfsd Oopsed inside ext3.

> ...
> Aug  6 15:22:05 adams kernel: EXT3-fs error (device md1): ext3_add_entry: bad entry in directory #41009295: rec_len is smaller than minimal - offset=0, inode=3265411686, rec_len=0, name_len=0

It looks like we had a block full of zeroes come back from the device
driver.  I find it distinctly fishy how this happens so much with
ext3-on-md, and so little with ext3-on-just-a-disk.


> Aug  6 15:22:05 adams kernel: Remounting filesystem read-only
> Aug  6 15:22:05 adams kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000

Now that's an ext3 bug. Something like this...

 fs/jbd/transaction.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff -puN fs/jbd/transaction.c~ext3-aborted-journal-fix fs/jbd/transaction.c
--- 25/fs/jbd/transaction.c~ext3-aborted-journal-fix	2003-08-05 23:53:16.000000000 -0700
+++ 25-akpm/fs/jbd/transaction.c	2003-08-05 23:56:47.000000000 -0700
@@ -525,12 +525,18 @@ do_get_write_access(handle_t *handle, st
 			int force_copy, int *credits) 
 {
 	struct buffer_head *bh;
-	transaction_t *transaction = handle->h_transaction;
-	journal_t *journal = transaction->t_journal;
+	transaction_t *transaction;
+	journal_t *journal;
 	int error;
 	char *frozen_buffer = NULL;
 	int need_copy = 0;
 
+	if (is_handle_aborted(handle))
+		return -EROFS;
+
+	transaction = handle->h_transaction;
+	journal = transaction->t_journal;
+
 	jbd_debug(5, "buffer_head %p, force_copy %d\n", jh, force_copy);
 
 	JBUFFER_TRACE(jh, "entry");

_



* Re: ext3 badness in 2.6.0-test2
  2003-08-04 20:22 ` Andrew Morton
@ 2003-08-06  6:36   ` Neil Brown
  2003-08-06  6:57     ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Neil Brown @ 2003-08-06  6:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Daniel Jacobowitz, linux-kernel, ext3-users

On Monday August 4, akpm@osdl.org wrote:
> Daniel Jacobowitz <dan@debian.org> wrote:
> >
> > I came back this morning and found:
> >   EXT3-fs error (device md0) in start_transaction: Journal has aborted
> >   EXT3-fs error (device md0) in start_transaction: Journal has aborted
> >   EXT3-fs error (device md0) in start_transaction: Journal has aborted
> > 
> > Unfortunately, from the very first one, all writes failed; including all
> > writes to syslog.  So I don't know what happened at the beginning.  Is this
> > more likely to be something internal to ext3, or a problem with the RAID
> > layer?
> 
> Could have been an IO error, or the block/MD/device layer returned
> incorrect data.  ext3 used to go BUG a lot in the latter case, but nowadays
> we try to abort the journal and go read-only.
> 
> Without the initial message we do not know.

Can I add a "me too".....

First, I'm using data=journal - is that supposed to work in 2.6 yet?


I have a raid5 array across a bunch of SCSI drives and a separate scsi
drive with boot, swap, and a journal partition.
I have an ext3 filesystem on the raid5 array with an external journal
on the journal partition.

The raid5 was rebuilding a spare and I was pounding the filesystem
over NFS using the SPEC SFS benchmark program (of course the raid5
rebuild killed the performance reported by SFS, but I expected that).

Shortly after the rebuild finished, I got an ext3 error (see log
below) and the journal aborted, and then nfsd Oopsed inside ext3.

I rebooted and fscked the filesystem and it found nothing interesting
- see output below.

So I suspect ext3 has a problem somewhere.  
I'll see if I can break it again :-)

NeilBrown



Aug  6 15:22:05 adams kernel: EXT3-fs error (device md1): ext3_add_entry: bad entry in directory #41009295: rec_len is smaller than minimal - offset=0, inode=3265411686, rec_len=0, name_len=0
Aug  6 15:22:05 adams kernel: Aborting journal on device sda4.
Aug  6 15:22:05 adams kernel: ext3_abort called.
Aug  6 15:22:05 adams kernel: EXT3-fs abort (device md1): ext3_journal_start: Detected aborted journal
Aug  6 15:22:05 adams kernel: Remounting filesystem read-only
Aug  6 15:22:05 adams kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
Aug  6 15:22:05 adams kernel:  printing eip:
Aug  6 15:22:05 adams kernel: c01b1e61
Aug  6 15:22:05 adams kernel: *pde = 00000000
Aug  6 15:22:05 adams kernel: Oops: 0000 [#1]
Aug  6 15:22:05 adams kernel: CPU:    1
Aug  6 15:22:05 adams kernel: EIP:    0060:[<c01b1e61>]    Not tainted
Aug  6 15:22:05 adams kernel: EFLAGS: 00010286
Aug  6 15:22:05 adams kernel: do_journal_get_write_access: aborting transaction: Journal has aborted in __ext3_journal_get_write_access<2>EXT3-fs error (device md1) in ext3_prepare_write: Journal has aborted
Aug  6 15:22:05 adams kernel: EXT3-fs error (device md1) in start_transaction: Journal has aborted
Aug  6 15:22:05 adams kernel: EIP is at do_get_write_access+0x11/0x770
Aug  6 15:22:05 adams kernel: eax: e8888a64   ebx: f066f8a4   ecx: 00000004   edx: 00000dab
Aug  6 15:22:05 adams kernel: esi: f2eae000   edi: 00000000   ebp: c46208a4   esp: f19378d4
Aug  6 15:22:05 adams kernel: ds: 007b   es: 007b   ss: 0068
Aug  6 15:22:05 adams kernel: Process nfsd (pid: 732, threadinfo=f1936000 task=f1969000)
Aug  6 15:22:05 adams kernel: Stack: e1f155e4 e1f15d64 e1f15d24 e1f15624 e1f159a4 c95d71a4 e171a364 f066f8a4 
Aug  6 15:22:05 adams kernel:        00000008 c371d780 f066f8a4 c01634e3 f066f8a4 0000001b 00000000 00001000 
Aug  6 15:22:05 adams kernel:        00000000 0000001b 00000000 0000001b 00000000 f066f8a4 f2eae000 f066f8a4 
Aug  6 15:22:05 adams kernel: Call Trace:
Aug  6 15:22:05 adams kernel:  [<c01634e3>] __find_get_block+0x73/0x100
Aug  6 15:22:05 adams kernel:  [<c01b290d>] journal_get_undo_access+0x3d/0x170
Aug  6 15:22:05 adams kernel:  [<c01a2a34>] ext3_try_to_allocate+0xc4/0x240
Aug  6 15:22:05 adams kernel:  [<c01a2db4>] ext3_new_block+0x204/0x740
Aug  6 15:22:05 adams kernel:  [<c0163439>] bh_lru_install+0xb9/0xf0
Aug  6 15:22:05 adams kernel:  [<c01a5a57>] ext3_alloc_block+0x37/0x40
Aug  6 15:22:05 adams kernel:  [<c01a5dfa>] ext3_alloc_branch+0x4a/0x2c0
Aug  6 15:22:05 adams kernel:  [<c0119eb5>] __change_page_attr+0x25/0x1e0
Aug  6 15:22:05 adams kernel:  [<c01a63fc>] ext3_get_block_handle+0x18c/0x340
Aug  6 15:22:05 adams kernel:  [<c0165d3c>] alloc_buffer_head+0x1c/0x50
Aug  6 15:22:05 adams kernel:  [<c0165d61>] alloc_buffer_head+0x41/0x50
Aug  6 15:22:05 adams kernel:  [<c0162e0a>] create_buffers+0x6a/0xc0
Aug  6 15:22:05 adams kernel:  [<c01a6614>] ext3_get_block+0x64/0xb0
Aug  6 15:22:05 adams kernel:  [<c016404b>] __block_prepare_write+0x20b/0x490
Aug  6 15:22:05 adams kernel:  [<c011bc70>] default_wake_function+0x0/0x30
Aug  6 15:22:05 adams kernel:  [<c0164bb4>] block_prepare_write+0x34/0x50
Aug  6 15:22:05 adams kernel:  [<c01a65b0>] ext3_get_block+0x0/0xb0
Aug  6 15:22:05 adams kernel:  [<c01a6bcf>] ext3_prepare_write+0x5f/0x110
Aug  6 15:22:05 adams kernel:  [<c01a65b0>] ext3_get_block+0x0/0xb0
Aug  6 15:22:05 adams kernel:  [<c013f0e2>] generic_file_aio_write_nolock+0x412/0xbd0
Aug  6 15:22:05 adams kernel:  [<c017c0ed>] d_alloc_anon+0x2d/0x240
Aug  6 15:22:05 adams kernel:  [<c034bf0f>] sock_alloc_send_skb+0x2f/0x40
Aug  6 15:22:05 adams kernel:  [<c0366ceb>] ip_append_data+0x6db/0x780
Aug  6 15:22:05 adams kernel:  [<c013f91e>] generic_file_write_nolock+0x7e/0xa0
Aug  6 15:22:05 adams kernel:  [<c038867a>] udp_sendmsg+0x41a/0xb40
Aug  6 15:22:05 adams kernel:  [<c02a4136>] e1000_xmit_frame+0x516/0x680
Aug  6 15:22:05 adams kernel:  [<c01df6b0>] exp_find_key+0x60/0x70
Aug  6 15:22:05 adams kernel:  [<c013fb7c>] generic_file_writev+0x5c/0x80
Aug  6 15:22:05 adams kernel:  [<c016068f>] do_readv_writev+0x23f/0x2d0
Aug  6 15:22:05 adams kernel:  [<c0160040>] do_sync_write+0x0/0xc0
Aug  6 15:22:05 adams kernel:  [<c016108d>] open_private_file+0x9d/0xa0
Aug  6 15:22:05 adams kernel:  [<c01607e8>] vfs_writev+0x58/0x70
Aug  6 15:22:05 adams kernel:  [<c01dbdcf>] nfsd_write+0x11f/0x380
Aug  6 15:22:05 adams kernel:  [<c011bc9a>] default_wake_function+0x2a/0x30
Aug  6 15:22:05 adams kernel:  [<c011bcda>] __wake_up_common+0x3a/0x70
Aug  6 15:22:05 adams kernel:  [<c01d8808>] nfsd_proc_write+0xa8/0x130
Aug  6 15:22:05 adams kernel:  [<c01d7818>] nfsd_dispatch+0xe8/0x1f5
Aug  6 15:22:05 adams kernel:  [<c01d7730>] nfsd_dispatch+0x0/0x1f5
Aug  6 15:22:05 adams kernel:  [<c03b8120>] svc_process+0x480/0x64c
Aug  6 15:22:05 adams kernel:  [<c01d747b>] nfsd+0x26b/0x520
Aug  6 15:22:05 adams kernel:  [<c010b356>] work_resched+0x5/0x16
Aug  6 15:22:05 adams kernel:  [<c01d7210>] nfsd+0x0/0x520
Aug  6 15:22:05 adams kernel:  [<c01d7210>] nfsd+0x0/0x520
Aug  6 15:22:05 adams kernel:  [<c0108e35>] kernel_thread_helper+0x5/0x10
Aug  6 15:22:05 adams kernel: 
Aug  6 15:22:05 adams kernel: Code: 8b 37 c7 44 24 20 00 00 00 00 c7 44 24 1c 00 00 00 00 8d 96 
Aug  6 15:22:05 adams kernel:  <1>Unable to handle kernel NULL pointer dereference at virtual address 00000000
Aug  6 15:22:05 adams kernel:  printing eip:
Aug  6 15:22:05 adams kernel: journal commit I/O error
Aug  6 15:22:05 adams kernel: c01b1e61
Aug  6 15:22:05 adams kernel: journal commit I/O error


-----------------------------------------------------
adams # fsck -n /dev/md1
fsck 1.34-WIP (21-May-2003)
e2fsck 1.34-WIP (21-May-2003)
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/md1 contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Unattached zero-length inode 47710329.  Clear? no

Unattached inode 47710329
Connect to /lost+found? no

Pass 5: Checking group summary information

/dev/md1: ********** WARNING: Filesystem still has errors **********

/dev/md1: 235617/53362688 files (11.7% non-contiguous), 3139703/106699200 blocks
adams # fsck /dev/md1
fsck 1.34-WIP (21-May-2003)
e2fsck 1.34-WIP (21-May-2003)
/dev/md1: recovering journal
/dev/md1: clean, 235617/53362688 files, 3139703/106699200 blocks
adams # fsck /dev/md1
fsck 1.34-WIP (21-May-2003)
e2fsck 1.34-WIP (21-May-2003)
/dev/md1: clean, 235617/53362688 files, 3139703/106699200 blocks
adams # fsck -f /dev/md1
fsck 1.34-WIP (21-May-2003)
e2fsck 1.34-WIP (21-May-2003)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #2532 (8073, counted=8075).
Fix<y>? yes

Free blocks count wrong (103559497, counted=103559499).
Fix<y>? yes

Free inodes count wrong for group #2912 (16263, counted=16264).
Fix<y>? yes

Free inodes count wrong (53127071, counted=53127072).
Fix<y>? yes


/dev/md1: ***** FILE SYSTEM WAS MODIFIED *****
/dev/md1: 235616/53362688 files (11.7% non-contiguous), 3139701/106699200 blocks


* Re: ext3 badness in 2.6.0-test2
  2003-08-04 14:22 Daniel Jacobowitz
@ 2003-08-04 20:22 ` Andrew Morton
  2003-08-06  6:36   ` Neil Brown
  0 siblings, 1 reply; 10+ messages in thread
From: Andrew Morton @ 2003-08-04 20:22 UTC (permalink / raw)
  To: Daniel Jacobowitz; +Cc: linux-kernel

Daniel Jacobowitz <dan@debian.org> wrote:
>
> I came back this morning and found:
>   EXT3-fs error (device md0) in start_transaction: Journal has aborted
>   EXT3-fs error (device md0) in start_transaction: Journal has aborted
>   EXT3-fs error (device md0) in start_transaction: Journal has aborted
> 
> Unfortunately, from the very first one, all writes failed; including all
> writes to syslog.  So I don't know what happened at the beginning.  Is this
> more likely to be something internal to ext3, or a problem with the RAID
> layer?

Could have been an IO error, or the block/MD/device layer returned
incorrect data.  ext3 used to go BUG a lot in the latter case, but nowadays
we try to abort the journal and go read-only.

Without the initial message we do not know.


* ext3 badness in 2.6.0-test2
@ 2003-08-04 14:22 Daniel Jacobowitz
  2003-08-04 20:22 ` Andrew Morton
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Jacobowitz @ 2003-08-04 14:22 UTC (permalink / raw)
  To: linux-kernel

I came back this morning and found:
  EXT3-fs error (device md0) in start_transaction: Journal has aborted
  EXT3-fs error (device md0) in start_transaction: Journal has aborted
  EXT3-fs error (device md0) in start_transaction: Journal has aborted

Unfortunately, from the very first one, all writes failed; including all
writes to syslog.  So I don't know what happened at the beginning.  Is this
more likely to be something internal to ext3, or a problem with the RAID
layer?

The RAID was able to shut down cleanly and came back up with no errors, and
the ext3 filesystem was tagged as having (just a few) errors on next boot,
so I'm guessing an ext3 problem.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

