* testing 2.6.16-rc4: unmountable fs after xfs_force_shutdown
@ 2006-02-25 21:16 David Greaves
  2006-02-26  1:23 ` Eric Sandeen
From: David Greaves @ 2006-02-25 21:16 UTC (permalink / raw)
  To: linux-xfs, 'linux-kernel@vger.kernel.org', nathans

Hi

Please help.

I was testing 2.6.16-rc4 and seem to have lost my data filesystem :(
I know it's called scratch - but my wife went and stored data on there -
grrr!
Anyway, I'm now in trouble.

I am running XFS over LVM over a 3-disk RAID5.

Something (libata?) caused some bogus read errors (see the linux-ide
list for details) which threw 2 drives out of my raid array:

sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 390716735
raid5: Disk failure on sda1, disabling device. Operation continuing on 2
devices
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 390716735
raid5: Disk failure on sdb1, disabling device. Operation continuing on 1
devices
RAID5 conf printout:
 --- rd:3 wd:1 fd:2
 disk 0, o:1, dev:sdd1
 disk 1, o:0, dev:sdb1
 disk 2, o:0, dev:sda1
xfs_force_shutdown(dm-0,0x1) called from line 338 of file
fs/xfs/xfs_rw.c.  Return address = 0xc020c0e9
Filesystem "dm-0": I/O Error Detected.  Shutting down filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)
I/O error in filesystem ("dm-0") meta-data dev dm-0 block
0x640884a       ("xlog_bwrite") error 5 buf count 262144
XFS: failed to locate log tail
XFS: log mount/recovery failed: error 5
XFS: log mount failed

After checking with mdadm --examine I was able to reconstruct the array
using the 2 ejected devices, as they both had the same event count:
raid5: device sdb1 operational as raid disk 1
raid5: device sda1 operational as raid disk 2
raid5: allocated 3155kB for md1
raid5: raid level 5 set md1 active with 2 out of 3 devices, algorithm 2
RAID5 conf printout:
 --- rd:3 wd:2 fd:1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:sda1
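
For the record, the event-count check and re-assembly were along these
lines (retyped from memory, not cut and pasted - the grep just pulls
the Events line out of the superblock dump):

  mdadm --examine /dev/sda1 | grep -i events
  mdadm --examine /dev/sdb1 | grep -i events
  mdadm --assemble --run /dev/md1 /dev/sda1 /dev/sdb1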


I restarted lvm and tried to remount the filesystem, but it failed like
this:
XFS mounting filesystem dm-0
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
ata2: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata2: status=0x51 { DriveReady SeekComplete Error }
sd 1:0:0:0: SCSI error: return code = 0x8000002
sdb: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 390716735
raid5: Disk failure on sdb1, disabling device. Operation continuing on 1
devices
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
ata1: no sense translation for op=0x2a cmd=0x3d status: 0x51
ata1: status=0x51 { DriveReady SeekComplete Error }
sd 0:0:0:0: SCSI error: return code = 0x8000002
sda: Current: sense key: Medium Error
    Additional sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sda, sector 390716735
raid5: Disk failure on sda1, disabling device. Operation continuing on 0
devices
RAID5 conf printout:
 --- rd:3 wd:0 fd:3
 disk 1, o:0, dev:sdb1
 disk 2, o:0, dev:sda1
xfs_force_shutdown(dm-0,0x1) called from line 338 of file
fs/xfs/xfs_rw.c.  Return address = 0xc020c0e9
Filesystem "dm-0": I/O Error Detected.  Shutting down filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)
I/O error in filesystem ("dm-0") meta-data dev dm-0 block
0x640884a       ("xlog_bwrite") error 5 buf count 262144
XFS: failed to locate log tail
XFS: log mount/recovery failed: error 5
XFS: log mount failed

I then rebooted into 2.6.15 (where it's been happy for weeks and
weeks!)
raid5: device sdb1 operational as raid disk 1
raid5: device sda1 operational as raid disk 2
raid5: allocated 3155kB for md1
raid5: raid level 5 set md1 active with 2 out of 3 devices, algorithm 2
RAID5 conf printout:
 --- rd:3 wd:2 fd:1
 disk 1, o:1, dev:sdb1
 disk 2, o:1, dev:sda1

Setting up LVM Volume Groups...
  Reading all physical volumes.  This may take a while...
  /dev/hda: open failed: No medium found
  Found volume group "video_vg" using metadata type lvm2
  1 logical volume(s) in volume group "video_vg" now active

haze:~# mount /scratch
mount: /dev/video_vg/video_lv: can't read superblock

haze:~# xfs_repair -n /dev/video_vg/video_lv
Phase 1 - find and verify superblock...
superblock read failed, offset 0, size 524288, ag 0, rval 0

fatal error -- Invalid argument

haze:~# lvdisplay /dev/video_vg/video_lv
  --- Logical volume ---
  LV Name                /dev/video_vg/video_lv
  VG Name                video_vg
  LV UUID                cg2yc1-dO6V-3GlY-JnhY-4nXi-V85s-WTIFml
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                372.61 GB
  Current LE             95389
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0


Any suggestions?

David

-- 



* Re: testing 2.6.16-rc4: unmountable fs after xfs_force_shutdown
  2006-02-25 21:16 testing 2.6.16-rc4: unmountable fs after xfs_force_shutdown David Greaves
@ 2006-02-26  1:23 ` Eric Sandeen
  2006-02-26  9:44   ` Odd LVM behaviour - was " David Greaves
From: Eric Sandeen @ 2006-02-26  1:23 UTC (permalink / raw)
  To: David Greaves; +Cc: linux-xfs, 'linux-kernel@vger.kernel.org', nathans

David Greaves wrote:
> haze:~# mount /scratch
> mount: /dev/video_vg/video_lv: can't read superblock
> 
> haze:~# xfs_repair -n /dev/video_vg/video_lv
> Phase 1 - find and verify superblock...
> superblock read failed, offset 0, size 524288, ag 0, rval 0

First, why can't it be read?  Any kernel messages as a result?  Can you 
do a simple dd -from- this block device?
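
Something along these lines would do (just a sketch - adjust bs and
count to taste):

  dd if=/dev/video_vg/video_lv of=/dev/null bs=4k count=16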

-Eric


* Odd LVM behaviour - was Re: testing 2.6.16-rc4: unmountable fs after xfs_force_shutdown
  2006-02-26  1:23 ` Eric Sandeen
@ 2006-02-26  9:44   ` David Greaves
From: David Greaves @ 2006-02-26  9:44 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: linux-xfs, 'linux-kernel@vger.kernel.org',
	nathans, LVM general discussion and development

If you reply to this, please remove xfs/nathan/eric from cc - it's not
xfs at all, it was (if anything) an lvm bug.

Eric Sandeen wrote:

> David Greaves wrote:
>
>> haze:~# mount /scratch
>> mount: /dev/video_vg/video_lv: can't read superblock
>>
>> haze:~# xfs_repair -n /dev/video_vg/video_lv
>> Phase 1 - find and verify superblock...
>> superblock read failed, offset 0, size 524288, ag 0, rval 0
>
>
> First, why can't it be read?  Any kernel messages as a result?  Can
> you do a simple dd -from- this block device?

Eric, thanks. A good place to look. The dd just produced 0 bytes.
.....OK - thanks, it's fixed now.

It turns out that the md device had stopped before lvm started. So when
lvm did start it must have seen md1 as 0 bytes and built the
device-mapper table against that.
Then when I re-assembled and ran md1, the kernel's mapping didn't get
refreshed.
Confusingly, the lvm userspace tools *did* notice and gave me 'healthy'
reports (I checked that before mailing - I just didn't actually dd the
lvm device - sorry).
The only clue was ages back in the log:

Adding 1004020k swap on /dev/sdc4.  Priority:-1 extents:1 across:1004020k
md: md1 stopped.
md: bind<sdb1>
md: bind<sda1>
md: bind<sdd1>
device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
cdrom: open failed.
device-mapper: dm-linear: Device lookup failed
device-mapper: error adding target to table
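
With hindsight, the kernel's idea of the mapping could have been
inspected directly at this point - a sketch, assuming the usual vg-lv
naming for the dm device:

  dmsetup table video_vg-video_lv

which would presumably have shown an empty or missing table.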

So at this point lvm is sick - but why do the userspace tools report as
below?

haze:/usr/src/linux-2.6.15# lvm version
  LVM version:     2.01.04 (2005-02-09)
  Library version: 1.01.04 (2005-08-02)
  Driver version:  4.4.0



These commands were issued chronologically and represent my basic diagnosis.

haze:~# mdadm --assemble --run /dev/md1 /dev/sda1 /dev/sdb1
mdadm: /dev/md1 has been started with 2 drives (out of 3).

haze:~# /etc/init.d/lvm start
Setting up LVM Volume Groups...
  Reading all physical volumes.  This may take a while...
  /dev/hda: open failed: No medium found
  Found volume group "video_vg" using metadata type lvm2
  1 logical volume(s) in volume group "video_vg" now active

(/dev/hda is my CDROM which I know I should exclude from the scan!)
(In retrospect I guess I should have stopped and started lvm, not just
started it)
Mount now fails so I start to check things out...

haze:~# cat /proc/mdstat
Personalities : [raid1] [raid5]
md1 : active raid5 sdb1[1] sda1[2]
      390716672 blocks level 5, 64k chunk, algorithm 2 [3/2] [_UU]
haze:~# lvdisplay /dev/video_vg
  --- Logical volume ---
  LV Name                /dev/video_vg/video_lv
  VG Name                video_vg
  LV UUID                cg2yc1-dO6V-3GlY-JnhY-4nXi-V85s-WTIFml
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                372.61 GB
  Current LE             95389
  Segments               1
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

haze:~# pvdisplay /dev/md1
  --- Physical volume ---
  PV Name               /dev/md1
  VG Name               video_vg
  PV Size               372.61 GB / not usable 0
  Allocatable           yes (but full)
  PE Size (KByte)       4096
  Total PE              95389
  Free PE               0
  Allocated PE          95389
  PV UUID               IUig5k-460l-sMZc-23Iz-MMFl-Cfh9-XuBMiq

haze:~# vgdisplay video_vg
  --- Volume group ---
  VG Name               video_vg
  System ID
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  19
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                1
  Open LV               0
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               372.61 GB
  PE Size               4.00 MB
  Total PE              95389
  Alloc PE / Size       95389 / 372.61 GB
  Free  PE / Size       0 / 0
  VG UUID               I2gW2x-aHcC-kqzs-Efpd-Q7TE-dkWf-KpHSO7

haze:~# vgck -v video_vg
    Using volume group(s) on command line
    Finding volume group "video_vg"

haze:~# lvscan
  ACTIVE            '/dev/video_vg/video_lv' [372.61 GB] inherit


haze:~# dd if=/dev/md1 of=/dev/null count=5
5+0 records in
5+0 records out
2560 bytes transferred in 0.000260 seconds (9849982 bytes/sec)

haze:~# dd if=/dev/video_vg/video_lv of=/dev/null count=500
0+0 records in
0+0 records out
0 bytes transferred in 0.000211 seconds (0 bytes/sec)

eek!
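
That zero-byte read, against a perfectly readable /dev/md1, is what
pins the blame on the kernel's device-mapper table rather than the LVM
metadata. A plain size comparison would presumably have shown the same
thing (untested sketch):

  blockdev --getsize /dev/md1                 # real size in 512-byte sectors
  blockdev --getsize /dev/video_vg/video_lv   # presumably 0 at this point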

haze:~# /etc/init.d/lvm stop
Shutting down LVM Volume Groups...
  0 logical volume(s) in volume group "video_vg" now active
haze:~# lvdisplay
  --- Logical volume ---
  LV Name                /dev/video_vg/video_lv
  VG Name                video_vg
  LV UUID                cg2yc1-dO6V-3GlY-JnhY-4nXi-V85s-WTIFml
  LV Write Access        read/write
  LV Status              NOT available
  LV Size                372.61 GB
  Current LE             95389
  Segments               1
  Allocation             inherit
  Read ahead sectors     0

haze:~# /etc/init.d/lvm start
Setting up LVM Volume Groups...
  Reading all physical volumes.  This may take a while...
  /dev/hda: open failed: No medium found
  Found volume group "video_vg" using metadata type lvm2
  /dev/video_vg: opendir failed: No such file or directory
  1 logical volume(s) in volume group "video_vg" now active

(note: this *successful* start contains more error messages than the
failed start at the beginning)

haze:~# dd if=/dev/video_vg/video_lv of=/dev/null count=500
500+0 records in
500+0 records out
256000 bytes transferred in 0.007378 seconds (34697467 bytes/sec)

haze:~# mount /scratch
haze:~#

Success.
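
For anyone hitting the same thing: the init script's stop/start
presumably just wraps vgchange, so deactivating and reactivating the
volume group by hand should do the same job (untested sketch):

  vgchange -a n video_vg   # tear down the stale device-mapper table
  vgchange -a y video_vg   # rebuild it against the running /dev/md1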

David

-- 


