linux-kernel.vger.kernel.org archive mirror
* 2.6.22-rc4 XFS fails after hibernate/resume
@ 2007-06-16 19:56 David Greaves
  2007-06-16 22:29 ` [linux-lvm] " David Robinson
  2007-06-16 22:47 ` Rafael J. Wysocki
  0 siblings, 2 replies; 34+ messages in thread
From: David Greaves @ 2007-06-16 19:56 UTC (permalink / raw)
  To: 'linux-kernel@vger.kernel.org',
	xfs, LVM general discussion and development
  Cc: linux-pm

This isn't a regression.

I was seeing these problems on 2.6.21 (but 22 was in -rc so I waited to try it).
I tried 2.6.22-rc4 (with Tejun's patches) to see if it had improved - no.

Note this is a different (desktop) machine to the one involved in my recent bugs.

The machine will work for days (continually powered up) without a problem and 
then exhibits a filesystem failure within minutes of a resume.

I know xfs/raid are OK with hibernate. Is lvm?

The root filesystem is xfs on raid1 and that doesn't seem to have any problems.

System info:

/dev/mapper/video_vg-video_lv on /scratch type xfs (rw)

haze:~# vgdisplay
   --- Volume group ---
   VG Name               video_vg
   System ID
   Format                lvm2
   Metadata Areas        1
   Metadata Sequence No  19
   VG Access             read/write
   VG Status             resizable
   MAX LV                0
   Cur LV                1
   Open LV               1
   Max PV                0
   Cur PV                1
   Act PV                1
   VG Size               372.61 GB
   PE Size               4.00 MB
   Total PE              95389
   Alloc PE / Size       95389 / 372.61 GB
   Free  PE / Size       0 / 0
   VG UUID               I2gW2x-aHcC-kqzs-Efpd-Q7TE-dkWf-KpHSO7

haze:~# pvdisplay
   --- Physical volume ---
   PV Name               /dev/md1
   VG Name               video_vg
   PV Size               372.62 GB / not usable 3.25 MB
   Allocatable           yes (but full)
   PE Size (KByte)       4096
   Total PE              95389
   Free PE               0
   Allocated PE          95389
   PV UUID               IUig5k-460l-sMZc-23Iz-MMFl-Cfh9-XuBMiq

md1 : active raid5 sdd1[0] sda1[2] sdc1[1]
       390716672 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]



00:00.0 Host bridge: VIA Technologies, Inc. VT8377 [KT400/KT600 AGP] Host Bridge 
(rev 80)
00:01.0 PCI bridge: VIA Technologies, Inc. VT8237 PCI Bridge
00:0a.0 Mass storage controller: Silicon Image, Inc. SiI 3112 
[SATALink/SATARaid] Serial ATA Controller (rev 02)
00:0b.0 Ethernet controller: Marvell Technology Group Ltd. 88E8001 Gigabit 
Ethernet Controller (rev 12)
00:0f.0 RAID bus controller: VIA Technologies, Inc. VIA VT6420 SATA RAID 
Controller (rev 80)
00:0f.1 IDE interface: VIA Technologies, Inc. 
VT82C586A/B/VT82C686/A/B/VT823x/A/C PIPC Bus Master IDE (rev 06)
00:10.0 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller 
(rev 81)
00:10.1 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller 
(rev 81)
00:10.2 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller 
(rev 81)
00:10.3 USB Controller: VIA Technologies, Inc. VT82xxxxx UHCI USB 1.1 Controller 
(rev 81)
00:10.4 USB Controller: VIA Technologies, Inc. USB 2.0 (rev 86)
00:11.0 ISA bridge: VIA Technologies, Inc. VT8237 ISA bridge 
[KT600/K8T800/K8T890 South]
00:11.5 Multimedia audio controller: VIA Technologies, Inc. VT8233/A/8235/8237 
AC97 Audio Controller (rev 60)
00:11.6 Communication controller: VIA Technologies, Inc. AC'97 Modem Controller 
(rev 80)
00:12.0 Ethernet controller: VIA Technologies, Inc. VT6102 [Rhine-II] (rev 78)
01:00.0 VGA compatible controller: ATI Technologies Inc RV280 [Radeon 9200 PRO] 
(rev 01)


tail end of info from dmesg:

k_prepare_write+0x272/0x490
  [<c01e7c81>] xfs_iomap+0x391/0x4b0
  [<c020e3c0>] xfs_bmap+0x0/0x10
  [<c0207067>] xfs_map_blocks+0x47/0x90
  [<c020847c>] xfs_page_state_convert+0x3dc/0x7b0
  [<c01dff61>] xfs_ilock+0x71/0xa0
  [<c01dfed5>] xfs_iunlock+0x85/0x90
  [<c0208980>] xfs_vm_writepage+0x60/0xf0
  [<c014cd78>] __writepage+0x8/0x30
  [<c014d1bf>] write_cache_pages+0x1ff/0x320
  [<c014cd70>] __writepage+0x0/0x30
  [<c014d300>] generic_writepages+0x20/0x30
  [<c014d33b>] do_writepages+0x2b/0x50
  [<c0148592>] __filemap_fdatawrite_range+0x72/0x90
  [<c020b260>] xfs_file_fsync+0x0/0x80
  [<c0148893>] filemap_fdatawrite+0x23/0x30
  [<c018691e>] do_fsync+0x4e/0xb0
  [<c01869a5>] __do_fsync+0x25/0x40
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b27be
  [<c01cb73b>] xfs_btree_check_sblock+0x5b/0xd0
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b0d19>] xfs_alloc_ag_vextent_near+0x59/0xa30
  [<c01b177d>] xfs_alloc_ag_vextent+0x8d/0x100
  [<c01b1f93>] xfs_alloc_vextent+0x223/0x450
  [<c01bf7d0>] xfs_bmap_btalloc+0x400/0x770
  [<c01e183d>] xfs_iext_bno_to_ext+0x9d/0x1d0
  [<c01c483d>] xfs_bmapi+0x10bd/0x1490
  [<c01edace>] xlog_grant_log_space+0x22e/0x2b0
  [<c01edf60>] xfs_log_reserve+0xc0/0xe0
  [<c01e918f>] xfs_iomap_write_allocate+0x27f/0x4f0
  [<c0188861>] __block_prepare_write+0x421/0x490
  [<c01886b2>] __block_prepare_write+0x272/0x490
  [<c01e7c81>] xfs_iomap+0x391/0x4b0
  [<c020e3c0>] xfs_bmap+0x0/0x10
  [<c0207067>] xfs_map_blocks+0x47/0x90
  [<c020847c>] xfs_page_state_convert+0x3dc/0x7b0
  [<c01dff61>] xfs_ilock+0x71/0xa0
  [<c01dfed5>] xfs_iunlock+0x85/0x90
  [<c0208980>] xfs_vm_writepage+0x60/0xf0
  [<c014cd78>] __writepage+0x8/0x30
  [<c014d1bf>] write_cache_pages+0x1ff/0x320
  [<c014cd70>] __writepage+0x0/0x30
  [<c014d300>] generic_writepages+0x20/0x30
  [<c014d33b>] do_writepages+0x2b/0x50
  [<c0148592>] __filemap_fdatawrite_range+0x72/0x90
  [<c020b260>] xfs_file_fsync+0x0/0x80
  [<c0148893>] filemap_fdatawrite+0x23/0x30
  [<c018691e>] do_fsync+0x4e/0xb0
  [<c01869a5>] __do_fsync+0x25/0x40
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b27be
  [<c01cb73b>] xfs_btree_check_sblock+0x5b/0xd0
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b0d19>] xfs_alloc_ag_vextent_near+0x59/0xa30
  [<c01b177d>] xfs_alloc_ag_vextent+0x8d/0x100
  [<c01b1f93>] xfs_alloc_vextent+0x223/0x450
  [<c01bf7d0>] xfs_bmap_btalloc+0x400/0x770
  [<c01e183d>] xfs_iext_bno_to_ext+0x9d/0x1d0
  [<c01c483d>] xfs_bmapi+0x10bd/0x1490
  [<c01edace>] xlog_grant_log_space+0x22e/0x2b0
  [<c01edf60>] xfs_log_reserve+0xc0/0xe0
  [<c01e918f>] xfs_iomap_write_allocate+0x27f/0x4f0
  [<c0188861>] __block_prepare_write+0x421/0x490
  [<c01886b2>] __block_prepare_write+0x272/0x490
  [<c01e7c81>] xfs_iomap+0x391/0x4b0
  [<c020e3c0>] xfs_bmap+0x0/0x10
  [<c0207067>] xfs_map_blocks+0x47/0x90
  [<c020847c>] xfs_page_state_convert+0x3dc/0x7b0
  [<c01dff61>] xfs_ilock+0x71/0xa0
  [<c01dfed5>] xfs_iunlock+0x85/0x90
  [<c0208980>] xfs_vm_writepage+0x60/0xf0
  [<c014cd78>] __writepage+0x8/0x30
  [<c014d1bf>] write_cache_pages+0x1ff/0x320
  [<c014cd70>] __writepage+0x0/0x30
  [<c014d300>] generic_writepages+0x20/0x30
  [<c014d33b>] do_writepages+0x2b/0x50
  [<c0148592>] __filemap_fdatawrite_range+0x72/0x90
  [<c020b260>] xfs_file_fsync+0x0/0x80
  [<c0148893>] filemap_fdatawrite+0x23/0x30
  [<c018691e>] do_fsync+0x4e/0xb0
  [<c01869a5>] __do_fsync+0x25/0x40
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b27be
  [<c01cb73b>] xfs_btree_check_sblock+0x5b/0xd0
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b0d19>] xfs_alloc_ag_vextent_near+0x59/0xa30
  [<c01b177d>] xfs_alloc_ag_vextent+0x8d/0x100
  [<c01b1f93>] xfs_alloc_vextent+0x223/0x450
  [<c01bf7d0>] xfs_bmap_btalloc+0x400/0x770
  [<c01e183d>] xfs_iext_bno_to_ext+0x9d/0x1d0
  [<c01c483d>] xfs_bmapi+0x10bd/0x1490
  [<c01edace>] xlog_grant_log_space+0x22e/0x2b0
  [<c01edf60>] xfs_log_reserve+0xc0/0xe0
  [<c01e918f>] xfs_iomap_write_allocate+0x27f/0x4f0
  [<c0188861>] __block_prepare_write+0x421/0x490
  [<c01886b2>] __block_prepare_write+0x272/0x490
  [<c01e7c81>] xfs_iomap+0x391/0x4b0
  [<c020e3c0>] xfs_bmap+0x0/0x10
  [<c0207067>] xfs_map_blocks+0x47/0x90
  [<c020847c>] xfs_page_state_convert+0x3dc/0x7b0
  [<c01dff61>] xfs_ilock+0x71/0xa0
  [<c01dfed5>] xfs_iunlock+0x85/0x90
  [<c0208980>] xfs_vm_writepage+0x60/0xf0
  [<c014cd78>] __writepage+0x8/0x30
  [<c014d1bf>] write_cache_pages+0x1ff/0x320
  [<c014cd70>] __writepage+0x0/0x30
  [<c014d300>] generic_writepages+0x20/0x30
  [<c014d33b>] do_writepages+0x2b/0x50
  [<c0148592>] __filemap_fdatawrite_range+0x72/0x90
  [<c020b260>] xfs_file_fsync+0x0/0x80
  [<c0148893>] filemap_fdatawrite+0x23/0x30
  [<c018691e>] do_fsync+0x4e/0xb0
  [<c01869a5>] __do_fsync+0x25/0x40
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b27be
  [<c01cb73b>] xfs_btree_check_sblock+0x5b/0xd0
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b0d19>] xfs_alloc_ag_vextent_near+0x59/0xa30
  [<c01b177d>] xfs_alloc_ag_vextent+0x8d/0x100
  [<c01b1f93>] xfs_alloc_vextent+0x223/0x450
  [<c01bf7d0>] xfs_bmap_btalloc+0x400/0x770
  [<c01e183d>] xfs_iext_bno_to_ext+0x9d/0x1d0
  [<c01c483d>] xfs_bmapi+0x10bd/0x1490
  [<c01edace>] xlog_grant_log_space+0x22e/0x2b0
  [<c01edf60>] xfs_log_reserve+0xc0/0xe0
  [<c01e918f>] xfs_iomap_write_allocate+0x27f/0x4f0
  [<c0188861>] __block_prepare_write+0x421/0x490
  [<c01886b2>] __block_prepare_write+0x272/0x490
  [<c01e7c81>] xfs_iomap+0x391/0x4b0
  [<c020e3c0>] xfs_bmap+0x0/0x10
  [<c0207067>] xfs_map_blocks+0x47/0x90
  [<c020847c>] xfs_page_state_convert+0x3dc/0x7b0
  [<c01dff61>] xfs_ilock+0x71/0xa0
  [<c01dfed5>] xfs_iunlock+0x85/0x90
  [<c0208980>] xfs_vm_writepage+0x60/0xf0
  [<c014cd78>] __writepage+0x8/0x30
  [<c014d1bf>] write_cache_pages+0x1ff/0x320
  [<c014cd70>] __writepage+0x0/0x30
  [<c014d300>] generic_writepages+0x20/0x30
  [<c014d33b>] do_writepages+0x2b/0x50
  [<c0148592>] __filemap_fdatawrite_range+0x72/0x90
  [<c020b260>] xfs_file_fsync+0x0/0x80
  [<c0148893>] filemap_fdatawrite+0x23/0x30
  [<c018691e>] do_fsync+0x4e/0xb0
  [<c01869a5>] __do_fsync+0x25/0x40
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b27be
  [<c01cb73b>] xfs_btree_check_sblock+0x5b/0xd0
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b0d19>] xfs_alloc_ag_vextent_near+0x59/0xa30
  [<c01b177d>] xfs_alloc_ag_vextent+0x8d/0x100
  [<c01b1f93>] xfs_alloc_vextent+0x223/0x450
  [<c01bf7d0>] xfs_bmap_btalloc+0x400/0x770
  [<c01e183d>] xfs_iext_bno_to_ext+0x9d/0x1d0
  [<c01c483d>] xfs_bmapi+0x10bd/0x1490
  [<c01edace>] xlog_grant_log_space+0x22e/0x2b0
  [<c01edf60>] xfs_log_reserve+0xc0/0xe0
  [<c01e918f>] xfs_iomap_write_allocate+0x27f/0x4f0
  [<c0188861>] __block_prepare_write+0x421/0x490
  [<c01886b2>] __block_prepare_write+0x272/0x490
  [<c01e7c81>] xfs_iomap+0x391/0x4b0
  [<c020e3c0>] xfs_bmap+0x0/0x10
  [<c0207067>] xfs_map_blocks+0x47/0x90
  [<c020847c>] xfs_page_state_convert+0x3dc/0x7b0
  [<c01dff61>] xfs_ilock+0x71/0xa0
  [<c01dfed5>] xfs_iunlock+0x85/0x90
  [<c0208980>] xfs_vm_writepage+0x60/0xf0
  [<c014cd78>] __writepage+0x8/0x30
  [<c014d1bf>] write_cache_pages+0x1ff/0x320
  [<c014cd70>] __writepage+0x0/0x30
  [<c014d300>] generic_writepages+0x20/0x30
  [<c014d33b>] do_writepages+0x2b/0x50
  [<c0148592>] __filemap_fdatawrite_range+0x72/0x90
  [<c020b260>] xfs_file_fsync+0x0/0x80
  [<c0148893>] filemap_fdatawrite+0x23/0x30
  [<c018691e>] do_fsync+0x4e/0xb0
  [<c01869a5>] __do_fsync+0x25/0x40
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b27be
  [<c01cb73b>] xfs_btree_check_sblock+0x5b/0xd0
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b0d19>] xfs_alloc_ag_vextent_near+0x59/0xa30
  [<c01b177d>] xfs_alloc_ag_vextent+0x8d/0x100
  [<c01b1f93>] xfs_alloc_vextent+0x223/0x450
  [<c01bf7d0>] xfs_bmap_btalloc+0x400/0x770
  [<c01e183d>] xfs_iext_bno_to_ext+0x9d/0x1d0
  [<c01c483d>] xfs_bmapi+0x10bd/0x1490
  [<c01edace>] xlog_grant_log_space+0x22e/0x2b0
  [<c01edf60>] xfs_log_reserve+0xc0/0xe0
  [<c01e918f>] xfs_iomap_write_allocate+0x27f/0x4f0
  [<c0188861>] __block_prepare_write+0x421/0x490
  [<c01886b2>] __block_prepare_write+0x272/0x490
  [<c01e7c81>] xfs_iomap+0x391/0x4b0
  [<c020e3c0>] xfs_bmap+0x0/0x10
  [<c0207067>] xfs_map_blocks+0x47/0x90
  [<c020847c>] xfs_page_state_convert+0x3dc/0x7b0
  [<c01dff61>] xfs_ilock+0x71/0xa0
  [<c01dfed5>] xfs_iunlock+0x85/0x90
  [<c0208980>] xfs_vm_writepage+0x60/0xf0
  [<c014cd78>] __writepage+0x8/0x30
  [<c014d1bf>] write_cache_pages+0x1ff/0x320
  [<c014cd70>] __writepage+0x0/0x30
  [<c014d300>] generic_writepages+0x20/0x30
  [<c014d33b>] do_writepages+0x2b/0x50
  [<c0148592>] __filemap_fdatawrite_range+0x72/0x90
  [<c020b260>] xfs_file_fsync+0x0/0x80
  [<c0148893>] filemap_fdatawrite+0x23/0x30
  [<c018691e>] do_fsync+0x4e/0xb0
  [<c01869a5>] __do_fsync+0x25/0x40
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b27be
  [<c01cb73b>] xfs_btree_check_sblock+0x5b/0xd0
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b0d19>] xfs_alloc_ag_vextent_near+0x59/0xa30
  [<c01b177d>] xfs_alloc_ag_vextent+0x8d/0x100
  [<c01b1f93>] xfs_alloc_vextent+0x223/0x450
  [<c01bf7d0>] xfs_bmap_btalloc+0x400/0x770
  [<c01e183d>] xfs_iext_bno_to_ext+0x9d/0x1d0
  [<c01c483d>] xfs_bmapi+0x10bd/0x1490
  [<c01edace>] xlog_grant_log_space+0x22e/0x2b0
  [<c01edf60>] xfs_log_reserve+0xc0/0xe0
  [<c01e918f>] xfs_iomap_write_allocate+0x27f/0x4f0
  [<c0188861>] __block_prepare_write+0x421/0x490
  [<c01886b2>] __block_prepare_write+0x272/0x490
  [<c01e7c81>] xfs_iomap+0x391/0x4b0
  [<c020e3c0>] xfs_bmap+0x0/0x10
  [<c0207067>] xfs_map_blocks+0x47/0x90
  [<c020847c>] xfs_page_state_convert+0x3dc/0x7b0
  [<c01dff61>] xfs_ilock+0x71/0xa0
  [<c01dfed5>] xfs_iunlock+0x85/0x90
  [<c0208980>] xfs_vm_writepage+0x60/0xf0
  [<c014cd78>] __writepage+0x8/0x30
  [<c014d1bf>] write_cache_pages+0x1ff/0x320
  [<c014cd70>] __writepage+0x0/0x30
  [<c014d300>] generic_writepages+0x20/0x30
  [<c014d33b>] do_writepages+0x2b/0x50
  [<c0148592>] __filemap_fdatawrite_range+0x72/0x90
  [<c020b260>] xfs_file_fsync+0x0/0x80
  [<c0148893>] filemap_fdatawrite+0x23/0x30
  [<c018691e>] do_fsync+0x4e/0xb0
  [<c01869a5>] __do_fsync+0x25/0x40
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b27be
  [<c01cb73b>] xfs_btree_check_sblock+0x5b/0xd0
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b0d19>] xfs_alloc_ag_vextent_near+0x59/0xa30
  [<c01b177d>] xfs_alloc_ag_vextent+0x8d/0x100
  [<c01b1f93>] xfs_alloc_vextent+0x223/0x450
  [<c01bf7d0>] xfs_bmap_btalloc+0x400/0x770
  [<c01e183d>] xfs_iext_bno_to_ext+0x9d/0x1d0
  [<c01c483d>] xfs_bmapi+0x10bd/0x1490
  [<c0122a42>] __do_softirq+0x42/0x90
  [<c01edace>] xlog_grant_log_space+0x22e/0x2b0
  [<c01edf60>] xfs_log_reserve+0xc0/0xe0
  [<c01e918f>] xfs_iomap_write_allocate+0x27f/0x4f0
  [<c0188861>] __block_prepare_write+0x421/0x490
  [<c01886b2>] __block_prepare_write+0x272/0x490
  [<c01e7c81>] xfs_iomap+0x391/0x4b0
  [<c020e3c0>] xfs_bmap+0x0/0x10
  [<c0207067>] xfs_map_blocks+0x47/0x90
  [<c020847c>] xfs_page_state_convert+0x3dc/0x7b0
  [<c01dff61>] xfs_ilock+0x71/0xa0
  [<c01dfed5>] xfs_iunlock+0x85/0x90
  [<c0208980>] xfs_vm_writepage+0x60/0xf0
  [<c014cd78>] __writepage+0x8/0x30
  [<c014d1bf>] write_cache_pages+0x1ff/0x320
  [<c014cd70>] __writepage+0x0/0x30
  [<c014d300>] generic_writepages+0x20/0x30
  [<c014d33b>] do_writepages+0x2b/0x50
  [<c0148592>] __filemap_fdatawrite_range+0x72/0x90
  [<c020b260>] xfs_file_fsync+0x0/0x80
  [<c0148893>] filemap_fdatawrite+0x23/0x30
  [<c018691e>] do_fsync+0x4e/0xb0
  [<c01869a5>] __do_fsync+0x25/0x40
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b27be
  [<c01cb73b>] xfs_btree_check_sblock+0x5b/0xd0
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b0d19>] xfs_alloc_ag_vextent_near+0x59/0xa30
  [<c01b177d>] xfs_alloc_ag_vextent+0x8d/0x100
  [<c01b1f93>] xfs_alloc_vextent+0x223/0x450
  [<c01bf7d0>] xfs_bmap_btalloc+0x400/0x770
  [<c01e183d>] xfs_iext_bno_to_ext+0x9d/0x1d0
  [<c01c483d>] xfs_bmapi+0x10bd/0x1490
  [<c01edace>] xlog_grant_log_space+0x22e/0x2b0
  [<c01edf60>] xfs_log_reserve+0xc0/0xe0
  [<c01e918f>] xfs_iomap_write_allocate+0x27f/0x4f0
  [<c0188861>] __block_prepare_write+0x421/0x490
  [<c01886b2>] __block_prepare_write+0x272/0x490
  [<c01e7c81>] xfs_iomap+0x391/0x4b0
  [<c020e3c0>] xfs_bmap+0x0/0x10
  [<c0207067>] xfs_map_blocks+0x47/0x90
  [<c020847c>] xfs_page_state_convert+0x3dc/0x7b0
  [<c01dff61>] xfs_ilock+0x71/0xa0
  [<c01dfed5>] xfs_iunlock+0x85/0x90
  [<c0208980>] xfs_vm_writepage+0x60/0xf0
  [<c014cd78>] __writepage+0x8/0x30
  [<c014d1bf>] write_cache_pages+0x1ff/0x320
  [<c014cd70>] __writepage+0x0/0x30
  [<c014d300>] generic_writepages+0x20/0x30
  [<c014d33b>] do_writepages+0x2b/0x50
  [<c0148592>] __filemap_fdatawrite_range+0x72/0x90
  [<c020b260>] xfs_file_fsync+0x0/0x80
  [<c0148893>] filemap_fdatawrite+0x23/0x30
  [<c018691e>] do_fsync+0x4e/0xb0
  [<c01869a5>] __do_fsync+0x25/0x40
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b27be
  [<c01cb73b>] xfs_btree_check_sblock+0x5b/0xd0
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b27be>] xfs_alloc_lookup+0x17e/0x390
  [<c01b000e>] xfs_free_ag_extent+0x2de/0x720
  [<c01b1d3b>] xfs_free_extent+0xbb/0xf0
  [<c01bc9b9>] xfs_bmap_finish+0x139/0x180
  [<c01c5540>] xfs_bunmapi+0x0/0xf80
  [<c01e320f>] xfs_itruncate_finish+0x26f/0x3f0
  [<c020447b>] xfs_inactive+0x48b/0x500
  [<c0210eb1>] xfs_fs_clear_inode+0x31/0x80
  [<c017a964>] clear_inode+0x54/0xf0
  [<c014f7b7>] truncate_inode_pages+0x17/0x20
  [<c017aad2>] generic_delete_inode+0xd2/0x100
  [<c017a11c>] iput+0x5c/0x70
  [<c01779e5>] d_kill+0x35/0x60
  [<c0177ab1>] dput+0xa1/0x150
  [<c0170ac8>] sys_renameat+0x1d8/0x200
  [<c0177a2c>] dput+0x1c/0x150
  [<c0167ab3>] __fput+0x113/0x180
  [<c017d053>] mntput_no_expire+0x13/0x90
  [<c0170b17>] sys_rename+0x27/0x30
  [<c0103fb4>] syscall_call+0x7/0xb
  =======================
xfs_force_shutdown(dm-0,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c. 
Return address = 0xc02113cc
Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down 
filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)


* Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-16 19:56 2.6.22-rc4 XFS fails after hibernate/resume David Greaves
@ 2007-06-16 22:29 ` David Robinson
  2007-06-17 11:38   ` David Greaves
  2007-06-16 22:47 ` Rafael J. Wysocki
  1 sibling, 1 reply; 34+ messages in thread
From: David Robinson @ 2007-06-16 22:29 UTC (permalink / raw)
  To: LVM general discussion and development
  Cc: 'linux-kernel@vger.kernel.org', xfs, linux-pm

David Greaves wrote:
> This isn't a regression.
> 
> I was seeing these problems on 2.6.21 (but 22 was in -rc so I waited to 
> try it).
> I tried 2.6.22-rc4 (with Tejun's patches) to see if it had improved - no.
> 
> Note this is a different (desktop) machine to the one involved in my recent bugs.
> 
> The machine will work for days (continually powered up) without a 
> problem and then exhibits a filesystem failure within minutes of a resume.
> 
> I know xfs/raid are OK with hibernate. Is lvm?

I have LVM working with hibernate w/o any problems (w/ ext3). If there 
were a problem it wouldn't be with LVM but with device-mapper, and I 
doubt there's a problem with either. The stack trace shows that you're 
within XFS code (but it's likely it's hibernate).

You can easily check whether it's LVM/device-mapper:

1) check "dmsetup table" - it should be the same before hibernating and 
after resuming.

2) read directly from the LV - ie, "dd if=/dev/mapper/video_vg-video_lv 
of=/dev/null bs=10M count=200".

If dmsetup shows the same info and you can read directly from the LV I 
doubt it would be an LVM/device-mapper problem.
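
Something like this, concretely (a sketch only - the LV path is the one from
your setup above; the temp file names are arbitrary):

dmsetup table > /tmp/dm-before
# ... hibernate and resume ...
dmsetup table > /tmp/dm-after
diff /tmp/dm-before /tmp/dm-after   # no output means the mappings survived
# read straight from the LV, bypassing the filesystem:
dd if=/dev/mapper/video_vg-video_lv of=/dev/null bs=10M count=200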

Cheers,
Dave


* Re: 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-16 19:56 2.6.22-rc4 XFS fails after hibernate/resume David Greaves
  2007-06-16 22:29 ` [linux-lvm] " David Robinson
@ 2007-06-16 22:47 ` Rafael J. Wysocki
  2007-06-17 11:37   ` David Greaves
  1 sibling, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2007-06-16 22:47 UTC (permalink / raw)
  To: David Greaves
  Cc: 'linux-kernel@vger.kernel.org',
	xfs, LVM general discussion and development, linux-pm

On Saturday, 16 June 2007 21:56, David Greaves wrote:
> This isn't a regression.
> 
> I was seeing these problems on 2.6.21 (but 22 was in -rc so I waited to try it).
> I tried 2.6.22-rc4 (with Tejun's patches) to see if it had improved - no.
> 
> Note this is a different (desktop) machine to the one involved in my recent bugs.
> 
> The machine will work for days (continually powered up) without a problem and 
> then exhibits a filesystem failure within minutes of a resume.
> 
> I know xfs/raid are OK with hibernate. Is lvm?
> 
> The root filesystem is xfs on raid1 and that doesn't seem to have any problems.

What is the partition that's showing problems?  How's it set up, on how many
drives etc.?

Also, is the dmesg output below from right after the resume?

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-16 22:47 ` Rafael J. Wysocki
@ 2007-06-17 11:37   ` David Greaves
  0 siblings, 0 replies; 34+ messages in thread
From: David Greaves @ 2007-06-17 11:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: 'linux-kernel@vger.kernel.org',
	xfs, LVM general discussion and development, linux-pm

Rafael J. Wysocki wrote:
> On Saturday, 16 June 2007 21:56, David Greaves wrote:
>> This isn't a regression.
>>
>> I was seeing these problems on 2.6.21 (but 22 was in -rc so I waited to try it).
>> I tried 2.6.22-rc4 (with Tejun's patches) to see if it had improved - no.
>>
>> Note this is a different (desktop) machine to the one involved in my recent bugs.
>>
>> The machine will work for days (continually powered up) without a problem and 
>> then exhibits a filesystem failure within minutes of a resume.
>>
>> I know xfs/raid are OK with hibernate. Is lvm?
>>
>> The root filesystem is xfs on raid1 and that doesn't seem to have any problems.
> 
> What is the partition that's showing problems?  How's it set up, on how many
> drives etc.?
I did put that in the OP :)
Here's a recap...
/dev/mapper/video_vg-video_lv on /scratch type xfs (rw)

md1 : active raid5 sdd1[0] sda1[2] sdc1[1]
       390716672 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

haze:~# pvdisplay
   --- Physical volume ---
   PV Name               /dev/md1
   VG Name               video_vg
   PV Size               372.62 GB / not usable 3.25 MB
   Allocatable           yes (but full)
   PE Size (KByte)       4096
   Total PE              95389
   Free PE               0
   Allocated PE          95389
   PV UUID               IUig5k-460l-sMZc-23Iz-MMFl-Cfh9-XuBMiq

> 
> Also, is the dmesg output below from right after the resume?

It runs OK for a few minutes - just enough to think "hey, maybe it'll work this 
time" - and never for more than an hour of normal use.
Then you notice some app failing because the filesystem went away.
The dmesg output comes from that point.

David


* Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-16 22:29 ` [linux-lvm] " David Robinson
@ 2007-06-17 11:38   ` David Greaves
  2007-06-18  7:49     ` David Greaves
  0 siblings, 1 reply; 34+ messages in thread
From: David Greaves @ 2007-06-17 11:38 UTC (permalink / raw)
  To: David Robinson
  Cc: LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

David Robinson wrote:
> David Greaves wrote:
>> This isn't a regression.
>>
>> I was seeing these problems on 2.6.21 (but 22 was in -rc so I waited 
>> to try it).
>> I tried 2.6.22-rc4 (with Tejun's patches) to see if it had improved - no.
>>
>> Note this is a different (desktop) machine to the one involved in my recent 
>> bugs.
>>
>> The machine will work for days (continually powered up) without a 
>> problem and then exhibits a filesystem failure within minutes of a 
>> resume.
>>
>> I know xfs/raid are OK with hibernate. Is lvm?
> 
> I have LVM working with hibernate w/o any problems (w/ ext3). If there 
> were a problem it wouldn't be with LVM but with device-mapper, and I 
> doubt there's a problem with either. The stack trace shows that you're 
> within XFS code (but it's likely it's hibernate).

Thanks - that's good to know.
The suspicion arises because I have xfs on raid1 as root and have *never* had a 
problem with that filesystem. It's *always* xfs on lvm on raid5. I also have 
another system (previously discussed) that reliably hibernated xfs on raid6.

(Clearly raid5 is in my suspect list)

> You can easily check whether it's LVM/device-mapper:
> 
> 1) check "dmsetup table" - it should be the same before hibernating and 
> after resuming.
> 
> 2) read directly from the LV - ie, "dd if=/dev/mapper/video_vg-video_lv 
> of=/dev/null bs=10M count=200".
> 
> If dmsetup shows the same info and you can read directly from the LV I 
> doubt it would be an LVM/device-mapper problem.

OK, that gave me an idea.

Freeze the filesystem
md5sum the lvm
hibernate
resume
md5sum the lvm

so:


haze:~# xfs_freeze -f /scratch/

Without this sync, the next two md5sums differed..
haze:~# sync
haze:~# dd if=/dev/video_vg/video_lv bs=10M count=200 | md5sum
200+0 records in
200+0 records out
2097152000 bytes (2.1 GB) copied, 41.2495 seconds, 50.8 MB/s
f42539366bb4269623fa4db14e8e8be2  -
haze:~# dd if=/dev/video_vg/video_lv bs=10M count=200 | md5sum
200+0 records in
200+0 records out
2097152000 bytes (2.1 GB) copied, 41.8111 seconds, 50.2 MB/s
f42539366bb4269623fa4db14e8e8be2  -


haze:~# echo platform > /sys/power/disk
haze:~# echo disk > /sys/power/state


haze:~# dd if=/dev/video_vg/video_lv bs=10M count=200 | md5sum
200+0 records in
200+0 records out
2097152000 bytes (2.1 GB) copied, 42.0478 seconds, 49.9 MB/s
f42539366bb4269623fa4db14e8e8be2  -
haze:~# xfs_freeze -u /scratch/

So the lvm and below looks OK...

I'll see how it behaves now the filesystem has been frozen/thawed over the 
hibernate...

David



* Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-17 11:38   ` David Greaves
@ 2007-06-18  7:49     ` David Greaves
  2007-06-18 14:50       ` David Chinner
  0 siblings, 1 reply; 34+ messages in thread
From: David Greaves @ 2007-06-18  7:49 UTC (permalink / raw)
  To: David Robinson
  Cc: LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

David Greaves wrote:
> David Robinson wrote:
>> David Greaves wrote:
>>> This isn't a regression.
>>>
>>> I was seeing these problems on 2.6.21 (but 22 was in -rc so I waited 
>>> to try it).
>>> I tried 2.6.22-rc4 (with Tejun's patches) to see if it had improved - 
>>> no.
>>>
>>> Note this is a different (desktop) machine to the one involved in my recent 
>>> bugs.
>>>
>>> The machine will work for days (continually powered up) without a 
>>> problem and then exhibits a filesystem failure within minutes of a 
>>> resume.

<snip>

> OK, that gave me an idea.
> 
> Freeze the filesystem
> md5sum the lvm
> hibernate
> resume
> md5sum the lvm
<snip>
> So the lvm and below looks OK...
> 
> I'll see how it behaves now the filesystem has been frozen/thawed over 
> the hibernate...


And it appears to behave well. (A few hours compile/clean cycling kernel builds 
on that filesystem were OK).


Historically I've done:
sync
echo platform > /sys/power/disk
echo disk > /sys/power/state
# resume

and had filesystem corruption (only on this machine, my other hibernating xfs 
machines don't have this problem)

So doing:
xfs_freeze -f /scratch
sync
echo platform > /sys/power/disk
echo disk > /sys/power/state
# resume
xfs_freeze -u /scratch

Works (for now - more usage testing tonight)
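
As a script, that sequence would look roughly like this (a sketch - mount point
as above; the point is that the thaw only runs after the write to
/sys/power/state returns, i.e. after resume):

#!/bin/sh
xfs_freeze -f /scratch || exit 1   # quiesce XFS before the image is written
sync
echo platform > /sys/power/disk
echo disk > /sys/power/state       # hibernates here; continues after wake-up
xfs_freeze -u /scratch             # always thaw, or the fs stays blocked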

David


* Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-18  7:49     ` David Greaves
@ 2007-06-18 14:50       ` David Chinner
  2007-06-18 19:14         ` David Greaves
  2007-06-27 20:49         ` [linux-lvm] 2.6.22-rc4 " Pavel Machek
  0 siblings, 2 replies; 34+ messages in thread
From: David Chinner @ 2007-06-18 14:50 UTC (permalink / raw)
  To: David Greaves
  Cc: David Robinson, LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
> David Greaves wrote:
> >OK, that gave me an idea.
> >
> >Freeze the filesystem
> >md5sum the lvm
> >hibernate
> >resume
> >md5sum the lvm
> <snip>
> >So the lvm and below looks OK...
> >
> >I'll see how it behaves now the filesystem has been frozen/thawed over 
> >the hibernate...
> 
> 
> And it appears to behave well. (A few hours compile/clean cycling kernel 
> builds on that filesystem were OK).
> 
> 
> Historically I've done:
> sync
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
> # resume
> 
> and had filesystem corruption (only on this machine, my other hibernating 
> xfs machines don't have this problem)
> 
> So doing:
> xfs_freeze -f /scratch
> sync
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
> # resume
> xfs_freeze -u /scratch
>
> Works (for now - more usage testing tonight)

Verrry interesting.

What you were seeing was an XFS shutdown occurring because the free space
btree was corrupted. IOWs, the process of suspend/resume has resulted
in either bad data being written to disk, the correct data not being
written to disk or the cached block being corrupted in memory.

If you run xfs_check on the filesystem after it has shut down after a resume,
can you tell us if it reports on-disk corruption? Note: do not run xfs_repair
to check this - it does not check the free space btrees; instead it simply
rebuilds them from scratch. If xfs_check reports an error, then run xfs_repair
to fix it up.
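
In other words, roughly (a sketch using the LV path from earlier in the thread;
the filesystem has to be unmounted first):

umount /scratch
xfs_check /dev/video_vg/video_lv     # read-only; checks the free space btrees
# only if xfs_check reports errors:
xfs_repair /dev/video_vg/video_lv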

FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS
filesystem for a suspend/resume to work safely and have argued that the only
safe thing to do is freeze the filesystem before suspend and thaw it after
resume. This is why I originally asked you to test that with the other problem
that you reported. Up until this point in time, there's been no evidence to
prove either side of the argument......

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-18 14:50       ` David Chinner
@ 2007-06-18 19:14         ` David Greaves
  2007-06-19  9:24           ` [linux-lvm] 2.6.22-rc5 " David Greaves
  2007-06-27 20:49         ` [linux-lvm] 2.6.22-rc4 " Pavel Machek
  1 sibling, 1 reply; 34+ messages in thread
From: David Greaves @ 2007-06-18 19:14 UTC (permalink / raw)
  To: David Chinner
  Cc: David Robinson, LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

OK, just a quick ack.

When I resumed tonight (having done a freeze/thaw over the suspend) some libata 
errors showed up during the resume and there was an eventual hard hang. Maybe I 
spoke too soon?

I'm going to have to do some more testing...

David Chinner wrote:
> On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
>> David Greaves wrote:
>> So doing:
>> xfs_freeze -f /scratch
>> sync
>> echo platform > /sys/power/disk
>> echo disk > /sys/power/state
>> # resume
>> xfs_freeze -u /scratch
>>
>> Works (for now - more usage testing tonight)
> 
> Verrry interesting.
Good :)


> What you were seeing was an XFS shutdown occurring because the free space
> btree was corrupted. IOWs, the process of suspend/resume has resulted
> in either bad data being written to disk, the correct data not being
> written to disk or the cached block being corrupted in memory.
That's the kind of thing I was suspecting, yes.

> If you run xfs_check on the filesystem after it has shut down after a resume,
> can you tell us if it reports on-disk corruption? Note: do not run xfs_repair
> to check this - it does not check the free space btrees; instead it simply
> rebuilds them from scratch. If xfs_check reports an error, then run xfs_repair
> to fix it up.
OK, I can try this tonight...

> FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS
> filesystem for a suspend/resume to work safely and have argued that the only
> safe thing to do is freeze the filesystem before suspend and thaw it after
> resume. This is why I originally asked you to test that with the other problem
> that you reported. Up until this point in time, there's been no evidence to
> prove either side of the argument......
> 
> Cheers,
> 
> Dave.



* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-06-18 19:14         ` David Greaves
@ 2007-06-19  9:24           ` David Greaves
  2007-06-19  9:44             ` Tejun Heo
                               ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: David Greaves @ 2007-06-19  9:24 UTC (permalink / raw)
  To: David Chinner, Tejun Heo
  Cc: David Robinson, LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid, Rafael J. Wysocki

David Greaves wrote:
> I'm going to have to do some more testing...
done


> David Chinner wrote:
>> On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
>>> David Greaves wrote:
>>> So doing:
>>> xfs_freeze -f /scratch
>>> sync
>>> echo platform > /sys/power/disk
>>> echo disk > /sys/power/state
>>> # resume
>>> xfs_freeze -u /scratch
>>>
>>> Works (for now - more usage testing tonight)
>>
>> Verrry interesting.
> Good :)
Now, not so good :)


>> What you were seeing was an XFS shutdown occurring because the free space
>> btree was corrupted. IOWs, the process of suspend/resume has resulted
>> in either bad data being written to disk, the correct data not being
>> written to disk or the cached block being corrupted in memory.
> That's the kind of thing I was suspecting, yes.
> 
>> If you run xfs_check on the filesystem after it has shut down after a 
>> resume,
>> can you tell us if it reports on-disk corruption? Note: do not run 
>> xfs_repair
>> to check this - it does not check the free space btrees; instead it 
>> simply
>> rebuilds them from scratch. If xfs_check reports an error, then run 
>> xfs_repair
>> to fix it up.
> OK, I can try this tonight...


This is on 2.6.22-rc5

So I hibernated last night and resumed this morning.
Before hibernating I froze and sync'ed. After resume I thawed it. (Sorry Dave)

Here are some photos of the screen during resume. This is not 100% reproducible 
- it seems to occur only if the system is shut down for 30 mins or so.

Tejun, I wonder if error handling during resume is problematic? I got the same 
errors in 2.6.21. I have never seen these (or any other libata) errors other 
than during resume.

http://www.dgreaves.com/pub/2.6.22-rc5-resume-failure.jpg
(hard to read; here's one from 2.6.21: http://www.dgreaves.com/pub/2.6.21-resume-failure.jpg)

I _think_ I've only seen the xfs problem when a resume shows these errors.


Ok, to try and cause a problem I ran a make and got this back at once:
make: stat: Makefile: Input/output error
make: stat: clean: Input/output error
make: *** No rule to make target `clean'.  Stop.
make: stat: GNUmakefile: Input/output error
make: stat: makefile: Input/output error


I caught the first dmesg this time:

Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file 
fs/xfs/xfs_btree.c.  Caller 0xc01b58e1
  [<c0104f6a>] show_trace_log_lvl+0x1a/0x30
  [<c0105c52>] show_trace+0x12/0x20
  [<c0105d15>] dump_stack+0x15/0x20
  [<c01daddf>] xfs_error_report+0x4f/0x60
  [<c01cd736>] xfs_btree_check_sblock+0x56/0xd0
  [<c01b58e1>] xfs_alloc_lookup+0x181/0x390
  [<c01b5b06>] xfs_alloc_lookup_le+0x16/0x20
  [<c01b30c1>] xfs_free_ag_extent+0x51/0x690
  [<c01b4ea4>] xfs_free_extent+0xa4/0xc0
  [<c01bf739>] xfs_bmap_finish+0x119/0x170
  [<c01e3f4a>] xfs_itruncate_finish+0x23a/0x3a0
  [<c02046a2>] xfs_inactive+0x482/0x500
  [<c0210ad4>] xfs_fs_clear_inode+0x34/0xa0
  [<c017d777>] clear_inode+0x57/0xe0
  [<c017d8e5>] generic_delete_inode+0xe5/0x110
  [<c017da77>] generic_drop_inode+0x167/0x1b0
  [<c017cedf>] iput+0x5f/0x70
  [<c01735cf>] do_unlinkat+0xdf/0x140
  [<c0173640>] sys_unlink+0x10/0x20
  [<c01040a4>] syscall_call+0x7/0xb
  =======================
xfs_force_shutdown(dm-0,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c. 
Return address = 0xc021101e
Filesystem "dm-0": Corruption of in-memory data detected.  Shutting down 
filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)

so I cd'ed out of /scratch and umounted.

I then tried the xfs_check.

haze:~# xfs_check /dev/video_vg/video_lv
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_check.  If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
haze:~# mount /scratch/
haze:~# umount /scratch/
haze:~# xfs_check /dev/video_vg/video_lv

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'xfs_db'

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767bc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'syslogd'

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767cc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:

ugh. Try again
haze:~# xfs_check /dev/video_vg/video_lv
haze:~#

whilst it was running, top reported this as roughly the peak memory usage:
  8759 root      18   0  479m 474m  876 R  2.0 46.9   0:02.49 xfs_db
so it looks like it didn't run out of memory (the machine has 1GB).

Dave, I ran xfs_check -v... but I got bored when it reached 122M of bz2 
compressed output with no sign of stopping... still got it if it's any use...

lots of:
setting block 0/0 to sb
setting block 0/1 to freelist
setting block 0/2 to freelist
setting block 0/3 to freelist
setting block 0/4 to freelist
setting block 0/75 to btbno
setting block 0/346901 to free1
setting block 0/346903 to free1
setting block 0/346904 to free1
setting block 0/346905 to free1
   and stuff like this
inode 128 mode 040777 fmt extents afmt extents nex 1 anex 0 nblk 1 sz 4096
inode 128 nlink 39 is dir
inode 128 extent [0,7,1,0]

I then rebooted and ran a repair which didn't show any damage.

David



* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-06-19  9:24           ` [linux-lvm] 2.6.22-rc5 " David Greaves
@ 2007-06-19  9:44             ` Tejun Heo
  2007-06-19 14:13               ` David Greaves
  2007-06-19 11:21             ` Rafael J. Wysocki
  2007-06-20  0:18             ` David Chinner
  2 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2007-06-19  9:44 UTC (permalink / raw)
  To: David Greaves
  Cc: David Chinner, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid, Rafael J. Wysocki

Hello,

David Greaves wrote:
>> Good :)
> Now, not so good :)

Oh, crap.  :-)

> So I hibernated last night and resumed this morning.
> Before hibernating I froze and sync'ed. After resume I thawed it. (Sorry
> Dave)
> 
> Here are some photos of the screen during resume. This is not 100%
> reproducible - it seems to occur only if the system is shut down for
> 30 mins or so.
> 
> Tejun, I wonder if error handling during resume is problematic? I got
> the same errors in 2.6.21. I have never seen these (or any other libata)
> errors other than during resume.
> 
> http://www.dgreaves.com/pub/2.6.22-rc5-resume-failure.jpg
> (hard to read; here's one from 2.6.21: http://www.dgreaves.com/pub/2.6.21-resume-failure.jpg)

Your controller is repeatedly reporting PHY readiness changed exceptions.
 Are you reading the system image from the device attached to the first
SATA port?

> I _think_ I've only seen the xfs problem when a resume shows these errors.

The error handling itself tries very hard to ensure that there is no
data corruption in case of errors.  All commands which experience
exceptions are retried but if the drive itself is doing something
stupid, there's only so much the driver can do.

How reproducible is the problem?  Does the problem go away or occur more
often if you change the drive you write the memory image to?
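
Moving the image to a partition on another controller would be roughly (a
sketch - /dev/sdd2 is only an example; resume= on the next boot's kernel
command line has to point at the same partition):

swapoff -a
mkswap /dev/sdd2     # example partition on a different controller
swapon /dev/sdd2
# then boot with e.g. resume=/dev/sdd2 set in the bootloader config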

-- 
tejun


* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-06-19  9:24           ` [linux-lvm] 2.6.22-rc5 " David Greaves
  2007-06-19  9:44             ` Tejun Heo
@ 2007-06-19 11:21             ` Rafael J. Wysocki
  2007-06-19 15:31               ` David Greaves
  2007-06-20  0:18             ` David Chinner
  2 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2007-06-19 11:21 UTC (permalink / raw)
  To: David Greaves
  Cc: David Chinner, Tejun Heo, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

On Tuesday, 19 June 2007 11:24, David Greaves wrote:
> David Greaves wrote:
> > I'm going to have to do some more testing...
> done
> 
> 
> > David Chinner wrote:
> >> On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
> >>> David Greaves wrote:
> >>> So doing:
> >>> xfs_freeze -f /scratch
> >>> sync
> >>> echo platform > /sys/power/disk
> >>> echo disk > /sys/power/state
> >>> # resume
> >>> xfs_freeze -u /scratch
> >>>
> >>> Works (for now - more usage testing tonight)
> >>
> >> Verrry interesting.
> > Good :)
> Now, not so good :)
> 
> 
> >> What you were seeing was an XFS shutdown occurring because the free space
> >> btree was corrupted. IOWs, the process of suspend/resume has resulted
> >> in either bad data being written to disk, the correct data not being
> >> written to disk or the cached block being corrupted in memory.
> > That's the kind of thing I was suspecting, yes.
> > 
> >> If you run xfs_check on the filesystem after it has shut down after a 
> >> resume,
> >> can you tell us if it reports on-disk corruption? Note: do not run 
> >> xfs_repair
> >> to check this - it does not check the free space btrees; instead it 
> >> simply
> >> rebuilds them from scratch. If xfs_check reports an error, then run 
> >> xfs_repair
> >> to fix it up.
> > OK, I can try this tonight...
> 
> 
> This is on 2.6.22-rc5

Is Tejun's patch

http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc5/patches/30-block-always-requeue-nonfs-requests-at-the-front.patch

applied on top of that?

Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth


* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-06-19  9:44             ` Tejun Heo
@ 2007-06-19 14:13               ` David Greaves
  2007-06-20  8:03                 ` Tejun Heo
  0 siblings, 1 reply; 34+ messages in thread
From: David Greaves @ 2007-06-19 14:13 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Chinner, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid, Rafael J. Wysocki

Tejun Heo wrote:
> Hello,
again...

> David Greaves wrote:
>>> Good :)
>> Now, not so good :)
> 
> Oh, crap.  :-)
<grin>

>> So I hibernated last night and resumed this morning.
>> Before hibernating I froze and sync'ed. After resume I thawed it. (Sorry
>> Dave)
>>
>> Here are some photos of the screen during resume. This is not 100%
>> reproducible - it seems to occur only if the system is shut down for
>> 30 mins or so.
>>
>> Tejun, I wonder if error handling during resume is problematic? I got
>> the same errors in 2.6.21. I have never seen these (or any other libata)
>> errors other than during resume.
>>
>> http://www.dgreaves.com/pub/2.6.22-rc5-resume-failure.jpg
>> (hard to read; here's one from 2.6.21: http://www.dgreaves.com/pub/2.6.21-resume-failure.jpg)
> 
> Your controller is repeatedly reporting PHY readiness changed exceptions.
>  Are you reading the system image from the device attached to the first
> SATA port?

Yes, if you mean 1st as in the one after the zero-th...

resume=/dev/sdb4
haze:~# swapon -s
Filename                                Type            Size    Used    Priority
/dev/sdb4                               partition       1004020 0       -1

dmesg snippet below...

sda is part of the /scratch xfs array though. SMART doesn't show any problems 
and of course all is well other than during a resume.

sda/b are on sata_sil (a cheap plugin pci card)

> 
>> I _think_ I've only seen the xfs problem when a resume shows these errors.
> 
> The error handling itself tries very hard to ensure that there is no
> data corruption in case of errors.  All commands which experience
> exceptions are retried but if the drive itself is doing something
> stupid, there's only so much the driver can do.
> 
> How reproducible is the problem?  Does the problem go away or occur more
> often if you change the drive you write the memory image to?

I don't think there should be activity on the sda drive during resume itself.

[I broke my / md mirror and am using some of that for swap/resume for now]

I did change the swap/resume device to sdd2 (different controller, onboard 
sata_via) and there was no EH during resume. The system seemed OK, wrote a few 
Gb of video and did a kernel compile.
I repeated this test, no EH during resume, no problems.
I even ran xfs_fsr, the defragment utility, to stress the fs.

I'll retain this configuration and try again tonight, but it looks like there _may_ 
be a link between EH during resume and my problems...

Of course, I don't understand why it *should* EH during resume - it doesn't 
during boot or normal operation...

Any more tests you'd like me to try?

David


dmesg snippet...

sata_sil 0000:00:0a.0: version 2.2
ACPI: PCI Interrupt 0000:00:0a.0[A] -> GSI 16 (level, low) -> IRQ 18
scsi0 : sata_sil
PM: Adding info for No Bus:host0
scsi1 : sata_sil
PM: Adding info for No Bus:host1
ata1: SATA max UDMA/100 cmd 0xf881e080 ctl 0xf881e08a bmdma 0xf881e000 irq 0
ata2: SATA max UDMA/100 cmd 0xf881e0c0 ctl 0xf881e0ca bmdma 0xf881e008 irq 0
ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata1.00: ATA-7: Maxtor 6B200M0, BANC1980, max UDMA/100
ata1.00: 390721968 sectors, multi 0: LBA48
ata1.00: configured for UDMA/100
ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ata2.00: ata_hpa_resize 1: sectors = 312581808, hpa_sectors = 312581808
ata2.00: ATA-6: ST3160023AS, 3.18, max UDMA/133
ata2.00: 312581808 sectors, multi 0: LBA48
ata2.00: ata_hpa_resize 1: sectors = 312581808, hpa_sectors = 312581808
ata2.00: configured for UDMA/100
PM: Adding info for No Bus:target0:0:0
scsi 0:0:0:0: Direct-Access     ATA      Maxtor 6B200M0   BANC PQ: 0 ANSI: 5
PM: Adding info for scsi:0:0:0:0
sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO 
or FUA
sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
sd 0:0:0:0: [sda] Write Protect is off
sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO 
or FUA
  sda: sda1
sd 0:0:0:0: [sda] Attached SCSI disk
sd 0:0:0:0: Attached scsi generic sg0 type 0
PM: Adding info for No Bus:target1:0:0
scsi 1:0:0:0: Direct-Access     ATA      ST3160023AS      3.18 PQ: 0 ANSI: 5
PM: Adding info for scsi:1:0:0:0
sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO 
or FUA
sd 1:0:0:0: [sdb] 312581808 512-byte hardware sectors (160042 MB)
sd 1:0:0:0: [sdb] Write Protect is off
sd 1:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO 
or FUA
  sdb: sdb1 sdb2 sdb3 sdb4
sd 1:0:0:0: [sdb] Attached SCSI disk
sd 1:0:0:0: Attached scsi generic sg1 type 0


* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-06-19 11:21             ` Rafael J. Wysocki
@ 2007-06-19 15:31               ` David Greaves
  0 siblings, 0 replies; 34+ messages in thread
From: David Greaves @ 2007-06-19 15:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Chinner, Tejun Heo, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

Rafael J. Wysocki wrote:
>> This is on 2.6.22-rc5
> 
> Is Tejun's patch
> 
> http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.22-rc5/patches/30-block-always-requeue-nonfs-requests-at-the-front.patch
> 
> applied on top of that?

2.6.22-rc5 includes it.

(but, when I was testing rc4, I did apply this patch)

David


* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-06-19  9:24           ` [linux-lvm] 2.6.22-rc5 " David Greaves
  2007-06-19  9:44             ` Tejun Heo
  2007-06-19 11:21             ` Rafael J. Wysocki
@ 2007-06-20  0:18             ` David Chinner
  2 siblings, 0 replies; 34+ messages in thread
From: David Chinner @ 2007-06-20  0:18 UTC (permalink / raw)
  To: David Greaves
  Cc: David Chinner, Tejun Heo, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid, Rafael J. Wysocki

On Tue, Jun 19, 2007 at 10:24:23AM +0100, David Greaves wrote:
> David Greaves wrote:
> so I cd'ed out of /scratch and umounted.
> 
> I then tried the xfs_check.
> 
> haze:~# xfs_check /dev/video_vg/video_lv
> ERROR: The filesystem has valuable metadata changes in a log which needs to
> be replayed.  Mount the filesystem to replay the log, and unmount it before
> re-running xfs_check.  If you are unable to mount the filesystem, then use
> the xfs_repair -L option to destroy the log and attempt a repair.
> Note that destroying the log may cause corruption -- please attempt a mount
> of the filesystem before doing this.
> haze:~# mount /scratch/
> haze:~# umount /scratch/
> haze:~# xfs_check /dev/video_vg/video_lv
> 
> Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
> haze kernel: Bad page state in process 'xfs_db'

I think we can safely say that your system is hosed at this point ;)

> ugh. Try again
> haze:~# xfs_check /dev/video_vg/video_lv
> haze:~#

zero output means no on-disk corruption was found. Everything is
consistent on disk, so that seems to indicate something in memory has been
crispy fried by the suspend/resume....

> Dave, I ran xfs_check -v... but I got bored when it reached 122M of bz2 
> compressed output with no sign of stopping... still got it if it's any 
> use...

No, not useful. It's a log of every operation it does and so is really
only useful for debugging xfs_check problems ;)

> I then rebooted and ran a repair which didn't show any damage.

Not surprising as your first check showed no damage.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-06-19 14:13               ` David Greaves
@ 2007-06-20  8:03                 ` Tejun Heo
  2007-06-21 18:06                   ` David Greaves
  0 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2007-06-20  8:03 UTC (permalink / raw)
  To: David Greaves
  Cc: David Chinner, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid, Rafael J. Wysocki

David Greaves wrote:
> Tejun Heo wrote:
>> Your controller is repeatedly reporting PHY readiness changed exceptions.
>>  Are you reading the system image from the device attached to the first
>> SATA port?
> 
> Yes if you mean 1st as in the one after the zero-th ...

I meant the first first (0th).

>> How reproducible is the problem?  Does the problem go away or occur more
>> often if you change the drive you write the memory image to?
> 
> I don't think there should be activity on the sda drive during resume
> itself.
> 
> [I broke my / md mirror and am using some of that for swap/resume for now]
> 
> I did change the swap/resume device to sdd2 (different controller,
> onboard sata_via) and there was no EH during resume. The system seemed
> OK, wrote a few Gb of video and did a kernel compile.
> I repeated this test, no EH during resume, no problems.
> I even ran xfs_fsr, the defragment utility, to stress the fs.
> 
> I'll retain this configuration and try again tonight, but it looks like
> there _may_ be a link between EH during resume and my problems...
> 
> Of course, I don't understand why it *should* EH during resume - it
> doesn't during boot or normal operation...

EH occurs during boot, suspend and resume all the time.  It just runs in
quiet mode to avoid disturbing the users too much.  In your case, EH is
kicking in due to actual exception conditions so it's being verbose to
give clue about what's going on.

It's really weird tho.  The PHY RDY status changed events are coming
from the device which is NOT used while resuming and it's before any
actual PM events are triggered.  Your kernel just boots, swsusp realizes
it's resuming and tries to read memory image from the swap device.
While reading, the disk controller raises consecutive PHY readiness
changed interrupts.  EH recovers them alright but the end result seems
to indicate that the loaded image is corrupt.

So, there's no device suspend/resume code involved at all.  The kernel
just booted and is trying to read data from the drive.  Please try with
only the first drive attached and see what happens.

Thanks.

-- 
tejun


* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-06-20  8:03                 ` Tejun Heo
@ 2007-06-21 18:06                   ` David Greaves
  2007-06-29  8:20                     ` David Greaves
  0 siblings, 1 reply; 34+ messages in thread
From: David Greaves @ 2007-06-21 18:06 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Chinner, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid, Rafael J. Wysocki

been away, back now...

Tejun Heo wrote:
> David Greaves wrote:
>> Tejun Heo wrote:
>>> How reproducible is the problem?  Does the problem go away or occur more
>>> often if you change the drive you write the memory image to?
>> I don't think there should be activity on the sda drive during resume
>> itself.
>>
>> [I broke my / md mirror and am using some of that for swap/resume for now]
>>
>> I did change the swap/resume device to sdd2 (different controller,
>> onboard sata_via) and there was no EH during resume. The system seemed
>> OK, wrote a few Gb of video and did a kernel compile.
>> I repeated this test, no EH during resume, no problems.
>> I even ran xfs_fsr, the defragment utility, to stress the fs.
>>
>> I'll retain this configuration and try again tonight, but it looks like
>> there _may_ be a link between EH during resume and my problems...
Having retained this new configuration for a couple of days now, I haven't had 
any problems.
This is good, but not really ideal since / isn't mirrored anymore :(

>> Of course, I don't understand why it *should* EH during resume, it
>> doesn't during boot or normal operation...
> 
> EH occurs during boot, suspend and resume all the time.  It just runs in
> quiet mode to avoid disturbing the users too much.  In your case, EH is
> kicking in due to actual exception conditions so it's being verbose to
> give clue about what's going on.
I was trying to say that I don't actually see any errors being handled in normal 
operation.
I'm not sure if you are saying that these PHY RDY events are normally handled 
quietly (which would explain it).


> It's really weird tho.  The PHY RDY status changed events are coming
> from the device which is NOT used while resuming
yes - but the erroring device (the one not being used) is on the same 
controller as the device holding the in-use resume partition.

> and it's before any
> actual PM events are triggered.  Your kernel just boots, swsusp realizes
> it's resuming and tries to read memory image from the swap device.
yes

> While reading, the disk controller raises consecutive PHY readiness
> changed interrupts.  EH recovers them alright but the end result seems
> to indicate that the loaded image is corrupt.
Yes, that's consistent with what I'm seeing.

When I move the swap/resume partition to a different controller (ie when I broke 
the / mirror and used the freed space) the problem seems to go away.

I am seeing messages in dmesg though:
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ata1.00: configured for UDMA/100
ata2.00: revalidation failed (errno=-2)
ata2: failed to recover some devices, retrying in 5 secs
sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)

sd 0:0:0:0: resuming
sd 0:0:0:0: [sda] Starting disk
ATA: abnormal status 0x7F on port 0x00019807
ATA: abnormal status 0x7F on port 0x00019007
ATA: abnormal status 0x7F on port 0x00019007
ATA: abnormal status 0x7F on port 0x00019807

ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ATA: abnormal status 0xD0 on port 0xf881e0c7
ata1.00: configured for UDMA/100
ata2.00: revalidation failed (errno=-2)
ata2: failed to recover some devices, retrying in 5 secs


> So, there's no device suspend/resume code involved at all.  The kernel
> just booted and is trying to read data from the drive.  Please try with
> only the first drive attached and see what happens.
That's kinda hard; swap and root are on different drives...

Does it help that although the errors above appear, the system seems OK when I 
just use the other controller?

I have to be cautious about what I do with this machine as it's the wife's 
active desktop box <grin>.

David

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-18 14:50       ` David Chinner
  2007-06-18 19:14         ` David Greaves
@ 2007-06-27 20:49         ` Pavel Machek
  2007-06-28 15:27           ` Rafael J. Wysocki
  2007-06-29  4:55           ` David Chinner
  1 sibling, 2 replies; 34+ messages in thread
From: Pavel Machek @ 2007-06-27 20:49 UTC (permalink / raw)
  To: David Chinner
  Cc: David Greaves, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

Hi!

> FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS
> filesystem for a suspend/resume to work safely and have argued that the only

Hmm, so XFS writes to disk even when its threads are frozen?

> safe thing to do is freeze the filesystem before suspend and thaw it after
> resume. This is why I originally asked you to test that with the other problem

Could you add that to the XFS threads if it is really required? They
do know that they are being frozen for suspend.

							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-27 20:49         ` [linux-lvm] 2.6.22-rc4 " Pavel Machek
@ 2007-06-28 15:27           ` Rafael J. Wysocki
  2007-06-28 22:00             ` [linux-pm] " Pavel Machek
  2007-06-29  4:55           ` David Chinner
  1 sibling, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2007-06-28 15:27 UTC (permalink / raw)
  To: Pavel Machek
  Cc: David Chinner, David Greaves, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

On Wednesday, 27 June 2007 22:49, Pavel Machek wrote:
> Hi!
> 
> > FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS
> > filesystem for a suspend/resume to work safely and have argued that the only
> 
> Hmm, so XFS writes to disk even when its threads are frozen?
> 
> > safe thing to do is freeze the filesystem before suspend and thaw it after
> > resume. This is why I originally asked you to test that with the other problem
> 
> Could you add that to the XFS threads if it is really required? They
> do know that they are being frozen for suspend.

Well, do you remember the workqueues?  They are still nonfreezable.
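
Concretely - a sketch assuming the workqueue API of this era, with the
queue name purely illustrative - whether a queue is freezeable is fixed
when it is created:

#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *wq;

static int __init wq_example_init(void)
{
	/* Nonfreezeable: work items keep running while user space is
	 * frozen for hibernation. */
	wq = create_workqueue("example_ioend");

	/* The freezeable variant would let the freezer stop it too:
	 *	wq = create_freezeable_workqueue("example_ioend");
	 */
	return wq ? 0 : -ENOMEM;
}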

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-28 15:27           ` Rafael J. Wysocki
@ 2007-06-28 22:00             ` Pavel Machek
  2007-06-28 22:16               ` Rafael J. Wysocki
  0 siblings, 1 reply; 34+ messages in thread
From: Pavel Machek @ 2007-06-28 22:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Chinner, linux-pm, 'linux-kernel@vger.kernel.org',
	xfs, LinuxRaid, LVM general discussion and development,
	David Robinson, David Greaves

On Thu 2007-06-28 17:27:34, Rafael J. Wysocki wrote:
> On Wednesday, 27 June 2007 22:49, Pavel Machek wrote:
> > Hi!
> > 
> > > FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS
> > > filesystem for a suspend/resume to work safely and have argued that the only
> > 
> > Hmm, so XFS writes to disk even when its threads are frozen?
> > 
> > > safe thing to do is freeze the filesystem before suspend and thaw it after
> > > resume. This is why I originally asked you to test that with the other problem
> > 
> > Could you add that to the XFS threads if it is really required? They
> > do know that they are being frozen for suspend.
> 
> Well, do you remember the workqueues?  They are still nonfreezable.

Oops, that would explain it :-(. Can we make XFS stop using them?
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-28 22:00             ` [linux-pm] " Pavel Machek
@ 2007-06-28 22:16               ` Rafael J. Wysocki
  2007-06-29  5:00                 ` David Chinner
  0 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2007-06-28 22:16 UTC (permalink / raw)
  To: Pavel Machek
  Cc: David Chinner, linux-pm, 'linux-kernel@vger.kernel.org',
	xfs, LinuxRaid, LVM general discussion and development,
	David Robinson, David Greaves, Oleg Nesterov

On Friday, 29 June 2007 00:00, Pavel Machek wrote:
> On Thu 2007-06-28 17:27:34, Rafael J. Wysocki wrote:
> > On Wednesday, 27 June 2007 22:49, Pavel Machek wrote:
> > > Hi!
> > > 
> > > > FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS
> > > > filesystem for a suspend/resume to work safely and have argued that the only
> > > 
> > > Hmm, so XFS writes to disk even when its threads are frozen?
> > > 
> > > > safe thing to do is freeze the filesystem before suspend and thaw it after
> > > > resume. This is why I originally asked you to test that with the other problem
> > > 
> > > Could you add that to the XFS threads if it is really required? They
> > > do know that they are being frozen for suspend.
> > 
> > Well, do you remember the workqueues?  They are still nonfreezable.
> 
> Oops, that would explain it :-(. Can we make XFS stop using them?

I'm afraid that we can't.

There are two solutions possible, IMO.  One would be to make these workqueues
freezable, which is possible, but hacky and Oleg didn't like that very much.
The second would be to freeze XFS from within the hibernation code path,
using freeze_bdev().
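
A minimal sketch of the second option - the hibernate-side helper names
here are hypothetical, but freeze_bdev()/thaw_bdev() are the existing
in-kernel API:

#include <linux/buffer_head.h>	/* freeze_bdev(), thaw_bdev() */
#include <linux/fs.h>

static struct super_block *frozen_sb;

/* Would be called before the hibernation image is created. */
static void hibernate_freeze_fs(struct block_device *bdev)
{
	/* Syncs the fs and blocks new modifications; returns the
	 * frozen superblock (NULL if nothing is mounted on bdev). */
	frozen_sb = freeze_bdev(bdev);
}

/* Must be called again on resume, or the fs stays frozen. */
static void hibernate_thaw_fs(struct block_device *bdev)
{
	if (frozen_sb)
		thaw_bdev(bdev, frozen_sb);
}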

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-27 20:49         ` [linux-lvm] 2.6.22-rc4 " Pavel Machek
  2007-06-28 15:27           ` Rafael J. Wysocki
@ 2007-06-29  4:55           ` David Chinner
  1 sibling, 0 replies; 34+ messages in thread
From: David Chinner @ 2007-06-29  4:55 UTC (permalink / raw)
  To: Pavel Machek
  Cc: David Chinner, David Greaves, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

On Wed, Jun 27, 2007 at 08:49:24PM +0000, Pavel Machek wrote:
> Hi!
> 
> > FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS
> > filesystem for a suspend/resume to work safely and have argued that the only
> 
> Hmm, so XFS writes to disk even when its threads are frozen?

They issue async I/O before they sleep and expect processing to be
done on I/O completion via workqueues.
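
Schematically - with illustrative names, not XFS's actual identifiers,
and the completion signature simplified for the sketch:

#include <linux/bio.h>
#include <linux/workqueue.h>

static struct workqueue_struct *ioend_wq;	/* ordinary, nonfreezeable */

/* Runs later in process context: finishes the I/O, updates in-core
 * metadata and may issue further writes. */
static void process_ioend(struct work_struct *work)
{
	/* ... */
}

/* I/O completion (interrupt context): just defer to the workqueue.
 * queue_work() keeps firing even while every user thread is frozen,
 * which is why a plain "sync + freeze the threads" is not enough. */
static void io_done(struct bio *bio)
{
	struct work_struct *w = bio->bi_private;

	INIT_WORK(w, process_ioend);
	queue_work(ioend_wq, w);
}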

> > safe thing to do is freeze the filesystem before suspend and thaw it after
> > resume. This is why I originally asked you to test that with the other problem
> 
> Could you add that to the XFS threads if it is really required? They
> do know that they are being frozen for suspend.

We don't suspend the threads on a filesystem freeze - they continue to
run. A filesystem freeze guarantees that the filesystem is clean and
that the in-memory state matches what is on disk. It is not possible for
the filesystem to issue I/O or have outstanding I/O when it is in the
frozen state, so the state of the threads and/or workqueues does not
matter because they will be idle.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-28 22:16               ` Rafael J. Wysocki
@ 2007-06-29  5:00                 ` David Chinner
  2007-06-29  7:40                   ` David Greaves
  0 siblings, 1 reply; 34+ messages in thread
From: David Chinner @ 2007-06-29  5:00 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Pavel Machek, David Chinner, linux-pm,
	'linux-kernel@vger.kernel.org',
	xfs, LinuxRaid, LVM general discussion and development,
	David Robinson, David Greaves, Oleg Nesterov

On Fri, Jun 29, 2007 at 12:16:44AM +0200, Rafael J. Wysocki wrote:
> There are two solutions possible, IMO.  One would be to make these workqueues
> freezable, which is possible, but hacky and Oleg didn't like that very much.
> The second would be to freeze XFS from within the hibernation code path,
> using freeze_bdev().

The second is much more likely to work reliably. If freezing the
filesystem leaves something in an inconsistent state, then it's
something I can reproduce and debug without needing to
suspend/resume.

FWIW, don't forget you need to thaw the filesystem on resume.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-29  5:00                 ` David Chinner
@ 2007-06-29  7:40                   ` David Greaves
  2007-06-29  7:43                     ` David Chinner
  0 siblings, 1 reply; 34+ messages in thread
From: David Greaves @ 2007-06-29  7:40 UTC (permalink / raw)
  To: David Chinner
  Cc: Rafael J. Wysocki, Pavel Machek, linux-pm,
	'linux-kernel@vger.kernel.org',
	xfs, LinuxRaid, LVM general discussion and development,
	David Robinson, Oleg Nesterov

David Chinner wrote:
> On Fri, Jun 29, 2007 at 12:16:44AM +0200, Rafael J. Wysocki wrote:
>> There are two solutions possible, IMO.  One would be to make these workqueues
>> freezable, which is possible, but hacky and Oleg didn't like that very much.
>> The second would be to freeze XFS from within the hibernation code path,
>> using freeze_bdev().
> 
> The second is much more likely to work reliably. If freezing the
> filesystem leaves something in an inconsistent state, then it's
> something I can reproduce and debug without needing to
> suspend/resume.
> 
> FWIW, don't forget you need to thaw the filesystem on resume.

I've been a little distracted recently - sorry. I'll re-read the thread and see 
if there are any test actions I need to complete.

I do know that the corruption problems I've been having:
a) only happen after hibernate/resume
b) only ever happen on one of 2 XFS filesystems
c) happen even when the script does xfs_freeze -f; sync; hibernate; xfs_freeze -u

What happens if a filesystem is frozen and I hibernate?
Will it be thawed when I resume?

David


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-29  7:40                   ` David Greaves
@ 2007-06-29  7:43                     ` David Chinner
  2007-06-29  7:54                       ` David Greaves
  0 siblings, 1 reply; 34+ messages in thread
From: David Chinner @ 2007-06-29  7:43 UTC (permalink / raw)
  To: David Greaves
  Cc: David Chinner, Rafael J. Wysocki, Pavel Machek, linux-pm,
	'linux-kernel@vger.kernel.org',
	xfs, LinuxRaid, LVM general discussion and development,
	David Robinson, Oleg Nesterov

On Fri, Jun 29, 2007 at 08:40:00AM +0100, David Greaves wrote:
> What happens if a filesystem is frozen and I hibernate?
> Will it be thawed when I resume?

If you froze it yourself, then you'll have to thaw it yourself.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-29  7:43                     ` David Chinner
@ 2007-06-29  7:54                       ` David Greaves
  2007-06-29 13:18                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 34+ messages in thread
From: David Greaves @ 2007-06-29  7:54 UTC (permalink / raw)
  To: David Chinner
  Cc: Rafael J. Wysocki, Pavel Machek, linux-pm,
	'linux-kernel@vger.kernel.org',
	xfs, LinuxRaid, LVM general discussion and development,
	David Robinson, Oleg Nesterov

David Chinner wrote:
> On Fri, Jun 29, 2007 at 08:40:00AM +0100, David Greaves wrote:
>> What happens if a filesystem is frozen and I hibernate?
>> Will it be thawed when I resume?
> 
> If you froze it yourself, then you'll have to thaw it yourself.

So hibernate will not attempt to re-freeze a frozen fs and, during resume, it 
will only thaw filesystems that were frozen by the suspend?

David


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-06-21 18:06                   ` David Greaves
@ 2007-06-29  8:20                     ` David Greaves
  2007-07-02 10:56                       ` Tejun Heo
  0 siblings, 1 reply; 34+ messages in thread
From: David Greaves @ 2007-06-29  8:20 UTC (permalink / raw)
  To: Tejun Heo, David Chinner
  Cc: David Robinson, LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid, Rafael J. Wysocki

David Greaves wrote:
> been away, back now...
again...

David Greaves wrote:
> When I move the swap/resume partition to a different controller (ie when 
> I broke the / mirror and used the freed space) the problem seems to go 
> away.
No, it's not gone away - but it's taking longer to show up.
I can try to put together a test loop that does some work, hibernates, resumes 
and repeats, but since I know it crashes at some point there doesn't seem much 
point unless I'm looking for something specific.
There's not much in the logs - is there any other instrumentation that people 
could suggest?
DaveC, given this is happening without (obvious) libata errors, do you think it 
may be something in the XFS/md/hibernate area?

If there's anything to be tried then I'll also move to 2.6.22-rc6.


 > Tejun Heo wrote:
 >> It's really weird tho.  The PHY RDY status changed events are coming
 >> from the device which is NOT used while resuming

There is an obvious problem there though, Tejun (the errors even when sda isn't 
involved in the OS boot) - can I start another thread about that issue/bug 
later? I need to reshuffle partitions, so I'd rather get the hibernate working 
first and then go back to it, if that's OK?

David


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-29  7:54                       ` David Greaves
@ 2007-06-29 13:18                         ` Rafael J. Wysocki
  2007-06-29 13:30                           ` David Greaves
  0 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2007-06-29 13:18 UTC (permalink / raw)
  To: David Greaves
  Cc: David Chinner, Pavel Machek, linux-pm,
	'linux-kernel@vger.kernel.org',
	xfs, LinuxRaid, LVM general discussion and development,
	David Robinson, Oleg Nesterov

On Friday, 29 June 2007 09:54, David Greaves wrote:
> David Chinner wrote:
> > On Fri, Jun 29, 2007 at 08:40:00AM +0100, David Greaves wrote:
> >> What happens if a filesystem is frozen and I hibernate?
> >> Will it be thawed when I resume?
> > 
> > If you froze it yourself, then you'll have to thaw it yourself.
> 
> So hibernate will not attempt to re-freeze a frozen fs and, during resume, it 
> will only thaw filesystems that were frozen by the suspend?

Right now it doesn't freeze (or thaw) any filesystems.  It just sync()s them
before creating the hibernation image.

However, the fact that you've seen corruption with the XFS filesystems frozen
before the hibernation indicates that the problem occurs at a lower level.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-pm] Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
  2007-06-29 13:18                         ` Rafael J. Wysocki
@ 2007-06-29 13:30                           ` David Greaves
  0 siblings, 0 replies; 34+ messages in thread
From: David Greaves @ 2007-06-29 13:30 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: David Chinner, Pavel Machek, linux-pm,
	'linux-kernel@vger.kernel.org',
	xfs, LinuxRaid, LVM general discussion and development,
	David Robinson, Oleg Nesterov

Rafael J. Wysocki wrote:
> On Friday, 29 June 2007 09:54, David Greaves wrote:
>> David Chinner wrote:
>>> On Fri, Jun 29, 2007 at 08:40:00AM +0100, David Greaves wrote:
>>>> What happens if a filesystem is frozen and I hibernate?
>>>> Will it be thawed when I resume?
>>> If you froze it yourself, then you'll have to thaw it yourself.
>> So hibernate will not attempt to re-freeze a frozen fs and, during resume, it 
>> will only thaw filesystems that were frozen by the suspend?
> 
> Right now it doesn't freeze (or thaw) any filesystems.  It just sync()s them
> before creating the hibernation image.
Thanks. Yes, I realise that :)
I wasn't clear; I should have said:
So hibernate should not attempt to re-freeze a frozen fs and, during resume, it
should only thaw filesystems that were frozen by the suspend.


> However, the fact that you've seen corruption with the XFS filesystems frozen
> before the hibernation indicates that the problem occurs on a lower level.
And that was why I chimed in - I don't think freezing fixes the problem (though 
it may make sense for other reasons).

David



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-06-29  8:20                     ` David Greaves
@ 2007-07-02 10:56                       ` Tejun Heo
  2007-07-02 14:08                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 34+ messages in thread
From: Tejun Heo @ 2007-07-02 10:56 UTC (permalink / raw)
  To: David Greaves
  Cc: David Chinner, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid, Rafael J. Wysocki

David Greaves wrote:
>> Tejun Heo wrote:
>>> It's really weird tho.  The PHY RDY status changed events are coming
>>> from the device which is NOT used while resuming
> 
> There is an obvious problem there though Tejun (the errors even when sda
> isn't involved in the OS boot) - can I start another thread about that
> issue/bug later? I need to reshuffle partitions so I'd rather get the
> hibernate working first and then go back to it if that's OK?

Yeah, sure.  The problem is that we don't know whether or how those two
are related.  It would be great if there were a way to verify that the
memory image read back during resume is intact.  Rafael, any ideas?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-07-02 10:56                       ` Tejun Heo
@ 2007-07-02 14:08                         ` Rafael J. Wysocki
  2007-07-02 14:32                           ` David Greaves
  0 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2007-07-02 14:08 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Greaves, David Chinner, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

On Monday, 2 July 2007 12:56, Tejun Heo wrote:
> David Greaves wrote:
> >> Tejun Heo wrote:
> >>> It's really weird tho.  The PHY RDY status changed events are coming
> >>> from the device which is NOT used while resuming
> > 
> > There is an obvious problem there though Tejun (the errors even when sda
> > isn't involved in the OS boot) - can I start another thread about that
> > issue/bug later? I need to reshuffle partitions so I'd rather get the
> > hibernate working first and then go back to it if that's OK?
> 
> Yeah, sure.  The problem is that we don't know whether or how those two
> are related.  It would be great if there's a way to verify memory image
> read from hibernation is intact.  Rafael, any ideas?

Well, s2disk has an option to compute an MD5 checksum of the image during
hibernation and verify it while reading the image back.  Still, s2disk/resume
aren't very easy to install and configure ...

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-07-02 14:08                         ` Rafael J. Wysocki
@ 2007-07-02 14:32                           ` David Greaves
  2007-07-02 15:12                             ` Rafael J. Wysocki
  0 siblings, 1 reply; 34+ messages in thread
From: David Greaves @ 2007-07-02 14:32 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Tejun Heo, David Chinner, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

Rafael J. Wysocki wrote:
> On Monday, 2 July 2007 12:56, Tejun Heo wrote:
>> David Greaves wrote:
>>>> Tejun Heo wrote:
>>>>> It's really weird tho.  The PHY RDY status changed events are coming
>>>>> from the device which is NOT used while resuming
>>> There is an obvious problem there though Tejun (the errors even when sda
>>> isn't involved in the OS boot) - can I start another thread about that
>>> issue/bug later? I need to reshuffle partitions so I'd rather get the
>>> hibernate working first and then go back to it if that's OK?
>> Yeah, sure.  The problem is that we don't know whether or how those two
>> are related.  It would be great if there's a way to verify memory image
>> read from hibernation is intact.  Rafael, any ideas?
> 
> Well, s2disk has an option to compute an MD5 checksum of the image during
> the hibernation and verify it while reading the image.
(Assuming you mean the mainline version)

Sounds like a good thing to try next...
Couldn't see anything on this in ../Documentation/power/*
How do I enable it?


>  Still, s2disk/resume
> aren't very easy to install  and configure ...

I have it working fine on 2 other machines now so that doesn't appear to be a 
problem.

David

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-07-02 14:32                           ` David Greaves
@ 2007-07-02 15:12                             ` Rafael J. Wysocki
  2007-07-02 16:36                               ` David Greaves
  0 siblings, 1 reply; 34+ messages in thread
From: Rafael J. Wysocki @ 2007-07-02 15:12 UTC (permalink / raw)
  To: David Greaves
  Cc: Tejun Heo, David Chinner, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

On Monday, 2 July 2007 16:32, David Greaves wrote:
> Rafael J. Wysocki wrote:
> > On Monday, 2 July 2007 12:56, Tejun Heo wrote:
> >> David Greaves wrote:
> >>>> Tejun Heo wrote:
> >>>>> It's really weird tho.  The PHY RDY status changed events are coming
> >>>>> from the device which is NOT used while resuming
> >>> There is an obvious problem there though Tejun (the errors even when sda
> >>> isn't involved in the OS boot) - can I start another thread about that
> >>> issue/bug later? I need to reshuffle partitions so I'd rather get the
> >>> hibernate working first and then go back to it if that's OK?
> >> Yeah, sure.  The problem is that we don't know whether or how those two
> >> are related.  It would be great if there's a way to verify memory image
> >> read from hibernation is intact.  Rafael, any ideas?
> > 
> > Well, s2disk has an option to compute an MD5 checksum of the image during
> > the hibernation and verify it while reading the image.
> (Assuming you mean the mainline version)
> 
> Sounds like a good think to try next...
> Couldn't see anything on this in ../Documentation/power/*
> How do I enable it?

Add 'compute checksum = y' to the s2disk's configuration file.
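
For reference, a minimal example of that file (the default location is
/etc/suspend.conf in the uswsusp tools; the device path is just an
example and must match your resume partition):

# /etc/suspend.conf - minimal example
# swap partition that holds the image:
resume device = /dev/sdd2
# compute an MD5 of the image and verify it on resume:
compute checksum = y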

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-07-02 15:12                             ` Rafael J. Wysocki
@ 2007-07-02 16:36                               ` David Greaves
  2007-07-02 20:15                                 ` Rafael J. Wysocki
  0 siblings, 1 reply; 34+ messages in thread
From: David Greaves @ 2007-07-02 16:36 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Tejun Heo, David Chinner, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

Rafael J. Wysocki wrote:
> On Monday, 2 July 2007 16:32, David Greaves wrote:
>> Rafael J. Wysocki wrote:
>>> On Monday, 2 July 2007 12:56, Tejun Heo wrote:
>>>> David Greaves wrote:
>>>>>> Tejun Heo wrote:
>>>>>>> It's really weird tho.  The PHY RDY status changed events are coming
>>>>>>> from the device which is NOT used while resuming
>>>>> There is an obvious problem there though Tejun (the errors even when sda
>>>>> isn't involved in the OS boot) - can I start another thread about that
>>>>> issue/bug later? I need to reshuffle partitions so I'd rather get the
>>>>> hibernate working first and then go back to it if that's OK?
>>>> Yeah, sure.  The problem is that we don't know whether or how those two
>>>> are related.  It would be great if there's a way to verify memory image
>>>> read from hibernation is intact.  Rafael, any ideas?
>>> Well, s2disk has an option to compute an MD5 checksum of the image during
>>> the hibernation and verify it while reading the image.
>> (Assuming you mean the mainline version)
>>
>> Sounds like a good think to try next...
>> Couldn't see anything on this in ../Documentation/power/*
>> How do I enable it?
> 
> Add 'compute checksum = y' to the s2disk's configuration file.

Ah, right - that's uswsusp, isn't it? Which isn't what I'm having problems 
with, AFAIK.

My suspend procedure is:

xfs_freeze -f /scratch             # quiesce the XFS filesystem
sync                               # flush everything else to disk
echo platform > /sys/power/disk    # select the platform hibernation mode
echo disk > /sys/power/state       # write the image and power off;
                                   # execution continues here on resume
xfs_freeze -u /scratch             # thaw the filesystem after resume

Which should work (actually it should work without the sync/xfs_freeze too).
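
For reference, xfs_freeze -f/-u are thin wrappers around the XFS
freeze/thaw ioctls. A minimal userspace sketch of the same calls - the
mount point is just this machine's, and error handling is trimmed:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>

/* The XFS ioctls of this era (the generic FIFREEZE/FITHAW came later). */
#define XFS_IOC_FREEZE	_IOWR('X', 119, int)
#define XFS_IOC_THAW	_IOWR('X', 120, int)

int main(void)
{
	int level = 1;
	int fd = open("/scratch", O_RDONLY);		/* the mount point */

	if (fd < 0)
		return 1;
	if (ioctl(fd, XFS_IOC_FREEZE, &level) < 0)	/* xfs_freeze -f */
		perror("freeze");
	/* ... hibernate here ... */
	if (ioctl(fd, XFS_IOC_THAW, &level) < 0)	/* xfs_freeze -u */
		perror("thaw");
	return 0;
}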

So to debug the problem I'd like to minimally extend this process rather than 
replace it with another approach.

I take it there isn't an 'echo y > /sys/power/do_image_checksum'?

David



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume
  2007-07-02 16:36                               ` David Greaves
@ 2007-07-02 20:15                                 ` Rafael J. Wysocki
  0 siblings, 0 replies; 34+ messages in thread
From: Rafael J. Wysocki @ 2007-07-02 20:15 UTC (permalink / raw)
  To: David Greaves
  Cc: Tejun Heo, David Chinner, David Robinson,
	LVM general discussion and development,
	'linux-kernel@vger.kernel.org',
	xfs, linux-pm, LinuxRaid

On Monday, 2 July 2007 18:36, David Greaves wrote:
> Rafael J. Wysocki wrote:
> > On Monday, 2 July 2007 16:32, David Greaves wrote:
> >> Rafael J. Wysocki wrote:
> >>> On Monday, 2 July 2007 12:56, Tejun Heo wrote:
> >>>> David Greaves wrote:
> >>>>>> Tejun Heo wrote:
> >>>>>>> It's really weird tho.  The PHY RDY status changed events are coming
> >>>>>>> from the device which is NOT used while resuming
> >>>>> There is an obvious problem there though Tejun (the errors even when sda
> >>>>> isn't involved in the OS boot) - can I start another thread about that
> >>>>> issue/bug later? I need to reshuffle partitions so I'd rather get the
> >>>>> hibernate working first and then go back to it if that's OK?
> >>>> Yeah, sure.  The problem is that we don't know whether or how those two
> >>>> are related.  It would be great if there's a way to verify memory image
> >>>> read from hibernation is intact.  Rafael, any ideas?
> >>> Well, s2disk has an option to compute an MD5 checksum of the image during
> >>> the hibernation and verify it while reading the image.
> >> (Assuming you mean the mainline version)
> >>
> >> Sounds like a good think to try next...
> >> Couldn't see anything on this in ../Documentation/power/*
> >> How do I enable it?
> > 
> > Add 'compute checksum = y' to the s2disk's configuration file.
> 
> Ah, right - that's uswsusp isn't it? Which isn't what I'm having problems with 
> AFAIK?
> 
> My suspend procedure is:
> 
> xfs_freeze -f /scratch
> sync
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
> xfs_freeze -u /scratch
> 
> Which should work (actually it should work without the sync/xfs_freeze too).
> 
> So to debug the problem I'd like to minimally extend this process rather than 
> replace it with another approach.

Well, this is not entirely "another approach".  Only the saving of the image is
done differently; the rest is the same.

> I take it there isn't an 'echo y > /sys/power/do_image_checksum'?

No, there isn't anything like that.

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2007-07-02 20:08 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-06-16 19:56 2.6.22-rc4 XFS fails after hibernate/resume David Greaves
2007-06-16 22:29 ` [linux-lvm] " David Robinson
2007-06-17 11:38   ` David Greaves
2007-06-18  7:49     ` David Greaves
2007-06-18 14:50       ` David Chinner
2007-06-18 19:14         ` David Greaves
2007-06-19  9:24           ` [linux-lvm] 2.6.22-rc5 " David Greaves
2007-06-19  9:44             ` Tejun Heo
2007-06-19 14:13               ` David Greaves
2007-06-20  8:03                 ` Tejun Heo
2007-06-21 18:06                   ` David Greaves
2007-06-29  8:20                     ` David Greaves
2007-07-02 10:56                       ` Tejun Heo
2007-07-02 14:08                         ` Rafael J. Wysocki
2007-07-02 14:32                           ` David Greaves
2007-07-02 15:12                             ` Rafael J. Wysocki
2007-07-02 16:36                               ` David Greaves
2007-07-02 20:15                                 ` Rafael J. Wysocki
2007-06-19 11:21             ` Rafael J. Wysocki
2007-06-19 15:31               ` David Greaves
2007-06-20  0:18             ` David Chinner
2007-06-27 20:49         ` [linux-lvm] 2.6.22-rc4 " Pavel Machek
2007-06-28 15:27           ` Rafael J. Wysocki
2007-06-28 22:00             ` [linux-pm] " Pavel Machek
2007-06-28 22:16               ` Rafael J. Wysocki
2007-06-29  5:00                 ` David Chinner
2007-06-29  7:40                   ` David Greaves
2007-06-29  7:43                     ` David Chinner
2007-06-29  7:54                       ` David Greaves
2007-06-29 13:18                         ` Rafael J. Wysocki
2007-06-29 13:30                           ` David Greaves
2007-06-29  4:55           ` David Chinner
2007-06-16 22:47 ` Rafael J. Wysocki
2007-06-17 11:37   ` David Greaves

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).