* Re: filesystem shrinks after using xfs_repair
@ 2010-07-12  1:10 Eli Morris
  2010-07-12  2:24 ` Stan Hoeppner
  2010-07-12 11:47 ` Emmanuel Florac
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-12  1:10 UTC (permalink / raw)
  To: xfs

Hi guys,

Here are some of the log files from my XFS problem. Yes, I think this all started with a hardware failure of some sort. My storage is RAID 6, an Astra SecureStor ES.


[root@nimbus log]# more messages.1 | grep I/O
Jul  2 17:02:30 nimbus kernel: sd 6:0:0:0: rejecting I/O to offline device
Jul  2 17:02:30 nimbus kernel: sd 6:0:0:0: rejecting I/O to offline device
Jul  2 17:02:30 nimbus kernel: sr 5:0:0:0: rejecting I/O to offline device
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687805082
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687826610
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687827634
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687828658
Jul  3 12:41:41 nimbus kernel: Buffer I/O error on device dm-0, logical block 2835978787
Jul  3 12:41:41 nimbus kernel: lost page write due to I/O error on dm-0
Jul  3 12:41:41 nimbus kernel: Buffer I/O error on device dm-0, logical block 2835978788
Jul  3 12:41:41 nimbus kernel: lost page write due to I/O error on dm-0
Jul  3 12:41:41 nimbus kernel: Buffer I/O error on device dm-0, logical block 2835978789
Jul  3 12:41:41 nimbus kernel: lost page write due to I/O error on dm-0
Jul  3 12:41:41 nimbus kernel: Buffer I/O error on device dm-0, logical block 2835978790
Jul  3 12:41:41 nimbus kernel: lost page write due to I/O error on dm-0
Jul  3 12:41:41 nimbus kernel: Buffer I/O error on device dm-0, logical block 2835978791
Jul  3 12:41:41 nimbus kernel: lost page write due to I/O error on dm-0
Jul  3 12:41:41 nimbus kernel: Buffer I/O error on device dm-0, logical block 2835978792
Jul  3 12:41:41 nimbus kernel: lost page write due to I/O error on dm-0
Jul  3 12:41:41 nimbus kernel: Buffer I/O error on device dm-0, logical block 2835978793
Jul  3 12:41:41 nimbus kernel: lost page write due to I/O error on dm-0
Jul  3 12:41:41 nimbus kernel: Buffer I/O error on device dm-0, logical block 2835978794
Jul  3 12:41:41 nimbus kernel: lost page write due to I/O error on dm-0
Jul  3 12:41:41 nimbus kernel: Buffer I/O error on device dm-0, logical block 2835978795
Jul  3 12:41:41 nimbus kernel: lost page write due to I/O error on dm-0
Jul  3 12:41:41 nimbus kernel: Buffer I/O error on device dm-0, logical block 2835978796
Jul  3 12:41:41 nimbus kernel: lost page write due to I/O error on dm-0
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687814314
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687815338
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687816362
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687817386

Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372371106
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372372130
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372373154
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372374178
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471976114
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471977138
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471978162
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471979186

Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471987386
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471988410
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471989434
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471990458
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471991482
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980261922
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262050
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372375202
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262050
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372376226
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687847114
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687848138
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262114
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980261986
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687849162
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687850186
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687851210
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262434
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687852234
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471991490
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262434
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687853258
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262178
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687854282
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687855306
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980261922
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262498
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262514
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262498
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262242
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262562
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262306
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262594
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262658
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262370
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262722
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262786
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262850
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687798938
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262914
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262978
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687797914
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980263042
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687797906
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980263106
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980263170
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687796882
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980263234
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980263298
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687795858
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980263362
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980263426
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687794834
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980263490
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687793810
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687792786
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687791762
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471975082
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471974058
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687790738
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687789714
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471973034
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471992514
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471993538
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471994562
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471995586
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471996610
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471997634
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471998658
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471999682
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471972010
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471970986
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471969962
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471968938
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471967914
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471966890
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471966882
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372343426
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471999690
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471965858
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12472000714
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12472003786
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12472001738
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12472002762
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12472004810
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471964834
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12472005834
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12472006858
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12472007882
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471963810
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471962786
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372342402
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471961762
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471960738
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471959714
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdd, sector 12471958690
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372330106
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687855314
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687856338
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687857362
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687858386
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687859410
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687860434
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372277834
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687861458
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687862482
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 22687863506
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372327026
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372324978
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372328058
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372376234
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372377258
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372378282
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372379306
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372380330
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372381354
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372382378
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372383402
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372384426
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372320882
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372327034
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372326002
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372329082
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372323954
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372341378
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372340354
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372339330
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372338306
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372337282
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372336258
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372335234
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372335226
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372334202
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372333178
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372332154
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdc, sector 24372331130
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980261922
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262114
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262178
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262242
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262306
Jul  3 12:41:41 nimbus kernel: end_request: I/O error, dev sdg, sector 1980262370
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 22687804058
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 159634
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 159634
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 159634
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 22687803034
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 159634
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 159634
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 159634
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 159634
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 22687802010
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 22687800986
Jul  3 12:41:42 nimbus kernel: end_request: I/O error, dev sdc, sector 22687799962
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x26df0       ("xfs_trans_read_buf") error 5 buf count 8192
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 12:41:42 nimbus kernel: I/O error in filesystem ("dm-0") meta-data dev dm-0 block 0x32ee86931       ("xlog_iodone") error 5 buf count 24576
Jul  3 12:41:42 nimbus kernel: Filesystem "dm-0": Log I/O Error Detected.  Shutting down filesystem: dm-0
Jul  3 12:41:42 nimbus kernel: scsi 1:0:0:0: rejecting I/O to dead device
Jul  3 13:42:57 nimbus kernel: serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Jul  3 13:42:57 nimbus kernel: serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
Jul  3 13:42:57 nimbus kernel: 00:06: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
Jul  3 13:42:57 nimbus kernel: 00:07: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A
[root@nimbus log]# 


* Re: filesystem shrinks after using xfs_repair
  2010-07-12  1:10 filesystem shrinks after using xfs_repair Eli Morris
@ 2010-07-12  2:24 ` Stan Hoeppner
  2010-07-12 11:47 ` Emmanuel Florac
  1 sibling, 0 replies; 45+ messages in thread
From: Stan Hoeppner @ 2010-07-12  2:24 UTC (permalink / raw)
  To: xfs

Eli Morris put forth on 7/11/2010 8:10 PM:
> Hi guys,
> 
> Here are some of the log files from my XFS problem. Yes, I think this all started with a hardware failure of some sort. My storage is RAID 6, an Astra SecureStor ES.
> 
> 
> [root@nimbus log]# more messages.1 | grep I/O
> Jul  2 17:02:30 nimbus kernel: sd 6:0:0:0: rejecting I/O to offline device
> Jul  2 17:02:30 nimbus kernel: sd 6:0:0:0: rejecting I/O to offline device
> Jul  2 17:02:30 nimbus kernel: sr 5:0:0:0: rejecting I/O to offline device

<snip>

What does the web gui log on the Astra ES tell you?

If the Astra supports syslogging (I assume it does as it is billed as
"enterprise class") you should configure that to facilitate consistent error
information gathering--i.e. grep everything from one terminal session.
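
For what it's worth, if rsyslog is available on the Linux host, the receiving
side only needs a UDP listener plus a filter. A rough sketch (the controller's
IP address below is made up, and syslog-ng or plain sysklogd would need the
equivalent settings instead):

# /etc/rsyslog.conf on the receiving host -- accept remote syslog on UDP/514
$ModLoad imudp
$UDPServerRun 514
# keep the RAID controller's messages in their own file, then stop processing them
:fromhost-ip, isequal, "192.168.1.50"    /var/log/astra.log
& ~

Restart the syslog daemon afterwards and point the Astra's management
interface at this host.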

-- 
Stan


* Re: filesystem shrinks after using xfs_repair
  2010-07-12  1:10 filesystem shrinks after using xfs_repair Eli Morris
  2010-07-12  2:24 ` Stan Hoeppner
@ 2010-07-12 11:47 ` Emmanuel Florac
  2010-07-23  8:30   ` Eli Morris
  1 sibling, 1 reply; 45+ messages in thread
From: Emmanuel Florac @ 2010-07-12 11:47 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

Le Sun, 11 Jul 2010 18:10:41 -0700
Eli Morris <ermorris@ucsc.edu> écrivait:

> Here are some of the log files from my XFS problem. Yes, I think this
> all started with a hardware failure of some sort. My storage is RAID
> 6, an Astra SecureStor ES.
> 

There are I/O errors on sdc, sdd and sdg. Aren't these JBODs connected
through the same cable, for instance? You must correct the hardware
problems before attempting any repair, or it will do more harm than good.
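
One quick way to see which physical disks actually sit underneath the failing
volume, assuming dm-0 is a device-mapper/LVM volume (only a sketch):

# ls /sys/block/dm-0/slaves/    # block devices backing dm-0
# dmsetup table                 # each dm mapping, underlying devices shown as major:minor
# pvs                           # if it is LVM: which sdX devices belong to the volume group
# lvs -o +devices               # which physical volumes each logical volume actually uses

If sdc, sdd and sdg all hang off the same enclosure, controller port or cable,
that is the first place to look.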

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: filesystem shrinks after using xfs_repair
  2010-07-12 11:47 ` Emmanuel Florac
@ 2010-07-23  8:30   ` Eli Morris
  2010-07-23 10:23     ` Emmanuel Florac
  2010-07-24  0:54     ` Dave Chinner
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-23  8:30 UTC (permalink / raw)
  To: xfs

On Jul 12, 2010, at 4:47 AM, Emmanuel Florac wrote:

> Le Sun, 11 Jul 2010 18:10:41 -0700
> Eli Morris <ermorris@ucsc.edu> écrivait:
> 
>> Here are some of the log files from my XFS problem. Yes, I think this
>> all started with a hardware failure of some sort. My storage is RAID
>> 6, an Astra SecureStor ES.
>> 
> 
> There are IO errors on sdc, sdd and sdg. Aren't these jbods connected
> through the same cable, for instance? You must correct the hardware
> problems before attempting any repair or it will do more harm than good.
> 
> -- 
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                    |   Intellique
>                    |	<eflorac@intellique.com>
>                    |   +33 1 78 94 84 02
> ------------------------------------------------------------------------

Hi Emmanuel,

I think the RAID tech support and I found and corrected the hardware problems associated with the RAID. I'm still having the same problem, though. I expanded the filesystem to use the space on the now-corrected RAID, and that seems to work OK. I can write files to the new space OK. But then, if I run xfs_repair on the volume, the newly added space disappears and there are tons of error messages from xfs_repair (listed below).

thanks,

Eli

bad data fork in inode 2183564202
cleared inode 2183564202
inode 2183564204 - bad extent starting block number 18924700017, offset 0
.
.
.
entry "entries" in shortform directory 312244 references free inode 312298
junking entry "entries" in directory inode 312244
entry "README.txt" in shortform directory 312244 references free inode 312396
junking entry "README.txt" in directory inode 312244
entry "format" in shortform directory 312244 references free inode 312399
junking entry "format" in directory inode 312244
entry "config.h.in.svn-base" at block 0 offset 48 in directory inode 312245 references free inode 312246
	clearing inode number in entry at offset 48...
entry "configure.svn-base" at block 0 offset 80 in directory inode 312245 references free inode 312247
	clearing inode number in entry at offset 80...
entry "Makefile.in.svn-base" at block 0 offset 112 in directory inode 312245 references free inode 312251
	clearing inode number in entry at offset 112...
entry "configure.in.svn-base" at block 0 offset 144 in directory inode 312245 references free inode 312253
	clearing inode number in entry at offset 144...
entry "ChangeLog.svn-base" at block 0 offset 176 in directory inode 312245 references free inode 312254
	clearing inode number in entry at offset 176...
entry "0README.svn-base" at block 0 offset 208 in directory inode 312245 references free inode 312263
	clearing inode number in entry at offset 208...
entry "DOMAIN.INFO" in shortform directory 312400 references free inode 312574
junking entry "DOMAIN.INFO" in directory inode 312400
entry "DOMAIN.CTL" in shortform directory 312400 references free inode 312575
junking entry "DOMAIN.CTL" in directory inode 312400
entry "entries" in shortform directory 312562 references free inode 312570
junking entry "entries" in directory inode 312562
entry "README.txt" in shortform directory 312562 references free inode 312572
junking entry "README.txt" in directory inode 312562
entry "format" in shortform directory 312562 references free inode 312573
junking entry "format" in directory inode 312562
entry "IC_P1993010100.ctl.svn-base" in shortform directory 312563 references free inode 312564
junking entry "IC_P1993010100.ctl.svn-base" in directory inode 312563
entry "SIGMAtoP.f.svn-base" in shortform directory 312563 references free inode 312565
junking entry "SIGMAtoP.f.svn-base" in directory inode 312563
entry "IC_P1994070100.ctl.svn-base" in shortform directory 312563 references free inode 312566
junking entry "IC_P1994070100.ctl.svn-base" in directory inode 312563
entry "entries" in shortform directory 312643 references free inode 312739
junking entry "entries" in directory inode 312643
entry "README.txt" in shortform directory 312643 references free inode 312741
junking entry "README.txt" in directory inode 312643
entry "format" in shortform directory 312643 references free inode 312742
junking entry "format" in directory inode 312643
entry "cmap.param.svn-base" at block 0 offset 48 in directory inode 312644 references free inode 312645
	clearing inode number in entry at offset 48...
entry "cmap2000.param.svn-base" at block 0 offset 80 in directory inode 312644 references free inode 312646
	clearing inode number in entry at offset 80...
entry "cmap2001.param.svn-base" at block 0 offset 120 in directory inode 312644 references free inode 312647
	clearing inode number in entry at offset 120...
entry "cruPGI3.x.svn-base" at block 0 offset 160 in directory inode 312644 references free inode 312648
	clearing inode number in entry at offset 160...
entry "cmap2002.param.svn-base" at block 0 offset 192 in directory inode 312644 references free inode 312649
	clearing inode number in entry at offset 192...
entry "cruPGI5.x.svn-base" at block 0 offset 232 in directory inode 312644 references free inode 312650
	clearing inode number in entry at offset 232...
entry "CMAP2RCM.f.svn-base" at block 0 offset 264 in directory inode 312644 references free inode 312651
	clearing inode number in entry at offset 264...
entry "cmapIFC7.x.svn-base" at block 0 offset 296 in directory inode 312644 references free inode 312652
	clearing inode number in entry at offset 296...
 


* Re: filesystem shrinks after using xfs_repair
  2010-07-23  8:30   ` Eli Morris
@ 2010-07-23 10:23     ` Emmanuel Florac
  2010-07-23 16:36       ` Eli Morris
  2010-07-24  0:54     ` Dave Chinner
  1 sibling, 1 reply; 45+ messages in thread
From: Emmanuel Florac @ 2010-07-23 10:23 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

Le Fri, 23 Jul 2010 01:30:40 -0700
Eli Morris <ermorris@ucsc.edu> écrivait:

> I think the raid tech support and me found and corrected the hardware
> problems associated with the RAID. I'm still having the same problem
> though. I expanded the filesystem to use the space of the now
> corrected RAID and that seems to work OK. I can write files to the
> new space OK. But then, if I run xfs_repair on the volume, the newly
> added space disappears and there are tons of error messages from
> xfs_repair (listed below)

When you ran xfs_repair, did you check dmesg or /var/log/messages
immediately afterwards? Just to make sure there wasn't any intervening
hard error.
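
Something along these lines would catch it (just a sketch; <device> stands for
the volume being repaired):

# dmesg -c > /dev/null                         # clear the kernel ring buffer first
# xfs_repair <device> 2>&1 | tee repair.out    # keep a copy of the repair output
# dmesg                                        # anything printed here happened during the repair
# grep -iE 'I/O error|rejecting I/O' /var/log/messages | tail -50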

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------


* Re: filesystem shrinks after using xfs_repair
  2010-07-23 10:23     ` Emmanuel Florac
@ 2010-07-23 16:36       ` Eli Morris
  0 siblings, 0 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-23 16:36 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

On Jul 23, 2010, at 3:23 AM, Emmanuel Florac wrote:

> Le Fri, 23 Jul 2010 01:30:40 -0700
> Eli Morris <ermorris@ucsc.edu> écrivait:
> 
>> I think the raid tech support and me found and corrected the hardware
>> problems associated with the RAID. I'm still having the same problem
>> though. I expanded the filesystem to use the space of the now
>> corrected RAID and that seems to work OK. I can write files to the
>> new space OK. But then, if I run xfs_repair on the volume, the newly
>> added space disappears and there are tons of error messages from
>> xfs_repair (listed below)
> 
> When you do the xfs_repair, did you check dmesg or /var/log/messages
> immediately thereafter? Just to get sure there isn't any intervening
> hard error.
> 
> -- 
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                    |   Intellique
>                    |	<eflorac@intellique.com>
>                    |   +33 1 78 94 84 02
> ------------------------------------------------------------------------

Hi, 

No, I don't see any hardware errors after running xfs_repair.

thanks,

Eli


* Re: filesystem shrinks after using xfs_repair
  2010-07-23  8:30   ` Eli Morris
  2010-07-23 10:23     ` Emmanuel Florac
@ 2010-07-24  0:54     ` Dave Chinner
  2010-07-24  1:08       ` Eli Morris
  1 sibling, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2010-07-24  0:54 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Fri, Jul 23, 2010 at 01:30:40AM -0700, Eli Morris wrote:
> On Jul 12, 2010, at 4:47 AM, Emmanuel Florac wrote:
> 
> > Le Sun, 11 Jul 2010 18:10:41 -0700
> > Eli Morris <ermorris@ucsc.edu> écrivait:
> > 
> >> Here are some of the log files from my XFS problem. Yes, I think this
> >> all started with a hardware failure of some sort. My storage is RAID
>> 6, an Astra SecureStor ES.
> >> 
> > 
> > There are IO errors on sdc, sdd and sdg. Aren't these jbods connected
> > through the same cable, for instance? You must correct the hardware
> > problems before attempting any repair or it will do more harm than good.
> > 
> > -- 
> > ------------------------------------------------------------------------
> > Emmanuel Florac     |   Direction technique
> >                    |   Intellique
> >                    |	<eflorac@intellique.com>
> >                    |   +33 1 78 94 84 02
> > ------------------------------------------------------------------------
> 
> Hi Emmanuel,
> 
> I think the raid tech support and me found and corrected the
> hardware problems associated with the RAID. I'm still having the
> same problem though. I expanded the filesystem to use the space of
> the now corrected RAID and that seems to work OK. I can write
> files to the new space OK. But then, if I run xfs_repair on the
> volume, the newly added space disappears and there are tons of
> error messages from xfs_repair (listed below).

Can you post the full output of the xfs_repair? The superblock is
the first thing that is checked and repaired, so if it is being
"repaired" to reduce the size of the volume then all the other errors
are just a result of that. e.g. the grow could be leaving stale
secondary superblocks around and repair is seeing a primary/secondary
mismatch and restoring the secondary which has the size parameter
prior to the grow....
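
A quick read-only spot check of that theory is to compare the geometry in the
primary superblock with one of the secondaries (just a sketch; <device> is the
LVM volume):

# xfs_db -r -c "sb 0" -c "p dblocks" -c "p agcount" <device>    # primary superblock
# xfs_db -r -c "sb 1" -c "p dblocks" -c "p agcount" <device>    # first secondary superblock

If dblocks differs between the two, repair has something it will want to "fix".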

Also, the output of 'cat /proc/partitions' would be interesting
from before the grow, after the grow (when everything is working),
and again after the xfs_repair when everything goes bad....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: filesystem shrinks after using xfs_repair
  2010-07-24  0:54     ` Dave Chinner
@ 2010-07-24  1:08       ` Eli Morris
  2010-07-24  2:39         ` Dave Chinner
  0 siblings, 1 reply; 45+ messages in thread
From: Eli Morris @ 2010-07-24  1:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


On Jul 23, 2010, at 5:54 PM, Dave Chinner wrote:

> On Fri, Jul 23, 2010 at 01:30:40AM -0700, Eli Morris wrote:
>> On Jul 12, 2010, at 4:47 AM, Emmanuel Florac wrote:
>> 
>>> Le Sun, 11 Jul 2010 18:10:41 -0700
>>> Eli Morris <ermorris@ucsc.edu> écrivait:
>>> 
>>>> Here are some of the log files from my XFS problem. Yes, I think this
>>>> all started with a hardware failure of some sort. My storage is RAID
>>>> 6, an Astra SecureStor ES.
>>>> 
>>> 
>>> There are IO errors on sdc, sdd and sdg. Aren't these jbods connected
>>> through the same cable, for instance? You must correct the hardware
>>> problems before attempting any repair or it will do more harm than good.
>>> 
>>> -- 
>>> ------------------------------------------------------------------------
>>> Emmanuel Florac     |   Direction technique
>>>                   |   Intellique
>>>                   |	<eflorac@intellique.com>
>>>                   |   +33 1 78 94 84 02
>>> ------------------------------------------------------------------------
>> 
>> Hi Emmanuel,
>> 
>> I think the raid tech support and me found and corrected the
>> hardware problems associated with the RAID. I'm still having the
>> same problem though. I expanded the filesystem to use the space of
>> the now corrected RAID and that seems to work OK. I can write
>> files to the new space OK. But then, if I run xfs_repair on the
>> volume, the newly added space disappears and there are tons of
>> error messages from xfs_repair (listed below).
> 
> Can you post the full output of the xfs_repair? The superblock is
> the first thing that is checked and repaired, so if it is being
> "repaired" to reduce the size of the volume then all the other errors
> are just a result of that. e.g. the grow could be leaving stale
> secondary superblocks around and repair is seeing a primary/secondary
> mismatch and restoring the secondary which has the size parameter
> prior to the grow....
> 
> Also, the output of 'cat /proc/partitions' would be interesting
> from before the grow, after the grow (when everything is working),
> and again after the xfs_repair when everything goes bad....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Hi Dave,

Thanks for replying. Here is the output I think you're looking for....

thanks!

Eli

The problem partition is an LVM2 volume:

/dev/mapper/vg1-vol5



[root@nimbus /]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb2              24G  7.6G   15G  34% /
/dev/sda5             1.7T  1.3T  391G  77% /export
/dev/sda2             3.8G  1.5G  2.2G  40% /var
tmpfs                  16G     0   16G   0% /dev/shm
/dev/sdb1             995G  946G   19G  99% /storage
tmpfs                 7.7G  7.9M  7.7G   1% /var/lib/ganglia/rrds
/dev/mapper/vg1-vol5   51T   51T   90M 100% /export/vol5


[root@nimbus /]# cat /proc/partitions
major minor  #blocks  name

   8     0 1843200000 sda
   8     1    8193118 sda1
   8     2    4096575 sda2
   8     3    1020127 sda3
   8     4          1 sda4
   8     5 1829883793 sda5
   8    16 1084948480 sdb
   8    17 1059342133 sdb1
   8    18   25599577 sdb2
   8    32 13671872256 sdc
   8    33 13671872222 sdc1
   8    48 13668734464 sdd
   8    49 12695309918 sdd1
   8    64 13671872256 sde
   8    65 13671872222 sde1
   8    80 13671872256 sdf
   8    81 13671869225 sdf1
   8    96 12695309952 sdg
   8    97 12695309918 sdg1
 253     0 66406219776 dm-0
 
[root@nimbus /]# xfs_growfs /dev/mapper/vg1-vol5
meta-data=/dev/vg1/vol5          isize=256    agcount=126, agsize=106811488 blks
         =                       sectsz=512   attr=1
data     =                       bsize=4096   blocks=13427728384, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0
[root@nimbus /]# umount /export/vol5
[root@nimbus /]# mount /export/vol5
[root@nimbus /]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb2              24G  7.6G   15G  34% /
/dev/sda5             1.7T  1.3T  391G  77% /export
/dev/sda2             3.8G  1.5G  2.2G  40% /var
tmpfs                  16G     0   16G   0% /dev/shm
/dev/sdb1             995G  946G   19G  99% /storage
tmpfs                 7.7G  7.9M  7.7G   1% /var/lib/ganglia/rrds
/dev/mapper/vg1-vol5   62T   51T   12T  81% /export/vol5

[root@nimbus /]# cat /proc/partitions
major minor  #blocks  name

   8     0 1843200000 sda
   8     1    8193118 sda1
   8     2    4096575 sda2
   8     3    1020127 sda3
   8     4          1 sda4
   8     5 1829883793 sda5
   8    16 1084948480 sdb
   8    17 1059342133 sdb1
   8    18   25599577 sdb2
   8    32 13671872256 sdc
   8    33 13671872222 sdc1
   8    48 13668734464 sdd
   8    49 12695309918 sdd1
   8    64 13671872256 sde
   8    65 13671872222 sde1
   8    80 13671872256 sdf
   8    81 13671869225 sdf1
   8    96 12695309952 sdg
   8    97 12695309918 sdg1
 253     0 66406219776 dm-0

[root@nimbus /]# xfs_repair /dev/mapper/vg1-vol5
Phase 1 - find and verify superblock...
writing modified primary superblock
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - agno = 37
        - agno = 38
        - agno = 39
        - agno = 40
        - agno = 41
        - agno = 42
        - agno = 43
        - agno = 44
        - agno = 45
        - agno = 46
        - agno = 47
        - agno = 48
        - agno = 49
        - agno = 50
        - agno = 51
        - agno = 52
        - agno = 53
        - agno = 54
        - agno = 55
        - agno = 56
        - agno = 57
        - agno = 58
        - agno = 59
        - agno = 60
        - agno = 61
        - agno = 62
        - agno = 63
        - agno = 64
        - agno = 65
        - agno = 66
        - agno = 67
        - agno = 68
        - agno = 69
        - agno = 70
        - agno = 71
        - agno = 72
        - agno = 73
        - agno = 74
        - agno = 75
        - agno = 76
        - agno = 77
        - agno = 78
        - agno = 79
        - agno = 80
        - agno = 81
        - agno = 82
        - agno = 83
        - agno = 84
        - agno = 85
        - agno = 86
        - agno = 87
        - agno = 88
        - agno = 89
        - agno = 90
        - agno = 91
        - agno = 92
        - agno = 93
        - agno = 94
        - agno = 95
        - agno = 96
        - agno = 97
        - agno = 98
        - agno = 99
        - agno = 100
        - agno = 101
        - agno = 102
        - agno = 103
        - agno = 104
        - agno = 105
        - agno = 106
        - agno = 107
        - agno = 108
        - agno = 109
        - agno = 110
        - agno = 111
        - agno = 112
        - agno = 113
        - agno = 114
        - agno = 115
        - agno = 116
        - agno = 117
        - agno = 118
        - agno = 119
        - agno = 120
        - agno = 121
        - agno = 122
        - agno = 123
        - agno = 124
        - agno = 125
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - agno = 16
        - agno = 17
        - agno = 18
        - agno = 19
        - agno = 20
        - agno = 21
        - agno = 22
        - agno = 23
        - agno = 24
        - agno = 25
        - agno = 26
        - agno = 27
        - agno = 28
        - agno = 29
        - agno = 30
        - agno = 31
        - agno = 32
        - agno = 33
        - agno = 34
        - agno = 35
        - agno = 36
        - agno = 38
        - agno = 40
        - agno = 37
        - agno = 42
        - agno = 43
        - agno = 44
        - agno = 45
        - agno = 46
        - agno = 47
        - agno = 11
        - agno = 49
        - agno = 51
        - agno = 52
        - agno = 53
        - agno = 54
        - agno = 55
        - agno = 56
        - agno = 58
        - agno = 59
        - agno = 48
        - agno = 60
        - agno = 61
        - agno = 41
        - agno = 64
        - agno = 65
        - agno = 66
        - agno = 67
        - agno = 68
        - agno = 69
        - agno = 70
        - agno = 71
        - agno = 72
        - agno = 73
        - agno = 74
        - agno = 75
        - agno = 76
        - agno = 77
        - agno = 78
        - agno = 79
        - agno = 80
        - agno = 81
        - agno = 82
        - agno = 83
        - agno = 84
        - agno = 85
        - agno = 86
        - agno = 87
        - agno = 88
        - agno = 89
        - agno = 90
        - agno = 91
        - agno = 92
        - agno = 93
        - agno = 94
        - agno = 95
        - agno = 96
        - agno = 97
        - agno = 98
        - agno = 99
        - agno = 100
        - agno = 101
        - agno = 102
        - agno = 103
        - agno = 104
        - agno = 105
        - agno = 106
        - agno = 107
        - agno = 108
        - agno = 109
        - agno = 110
        - agno = 111
        - agno = 112
        - agno = 113
        - agno = 114
        - agno = 115
        - agno = 116
        - agno = 117
        - agno = 118
        - agno = 119
        - agno = 120
        - agno = 121
        - agno = 122
        - agno = 123
        - agno = 124
        - agno = 125
        - agno = 62
        - agno = 63
        - agno = 39
        - agno = 57
        - agno = 50
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

[root@nimbus /]# mount /export/vol5
[root@nimbus /]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb2              24G  7.6G   15G  34% /
/dev/sda5             1.7T  1.3T  391G  77% /export
/dev/sda2             3.8G  1.5G  2.2G  40% /var
tmpfs                  16G     0   16G   0% /dev/shm
/dev/sdb1             995G  946G   19G  99% /storage
tmpfs                 7.7G  7.9M  7.7G   1% /var/lib/ganglia/rrds
/dev/mapper/vg1-vol5   51T   51T   90M 100% /export/vol5
[root@nimbus /]# cat /proc/partitions
major minor  #blocks  name

   8     0 1843200000 sda
   8     1    8193118 sda1
   8     2    4096575 sda2
   8     3    1020127 sda3
   8     4          1 sda4
   8     5 1829883793 sda5
   8    16 1084948480 sdb
   8    17 1059342133 sdb1
   8    18   25599577 sdb2
   8    32 13671872256 sdc
   8    33 13671872222 sdc1
   8    48 13668734464 sdd
   8    49 12695309918 sdd1
   8    64 13671872256 sde
   8    65 13671872222 sde1
   8    80 13671872256 sdf
   8    81 13671869225 sdf1
   8    96 12695309952 sdg
   8    97 12695309918 sdg1
 253     0 66406219776 dm-0





* Re: filesystem shrinks after using xfs_repair
  2010-07-24  1:08       ` Eli Morris
@ 2010-07-24  2:39         ` Dave Chinner
  2010-07-26  3:20           ` Eli Morris
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2010-07-24  2:39 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Fri, Jul 23, 2010 at 06:08:08PM -0700, Eli Morris wrote:
> On Jul 23, 2010, at 5:54 PM, Dave Chinner wrote:
> > On Fri, Jul 23, 2010 at 01:30:40AM -0700, Eli Morris wrote:
> >> I think the raid tech support and me found and corrected the
> >> hardware problems associated with the RAID. I'm still having the
> >> same problem though. I expanded the filesystem to use the space of
> >> the now corrected RAID and that seems to work OK. I can write
> >> files to the new space OK. But then, if I run xfs_repair on the
> >> volume, the newly added space disappears and there are tons of
> >> error messages from xfs_repair (listed below).
> > 
> > Can you post the full output of the xfs_repair? The superblock is
> > the first thing that is checked and repaired, so if it is being
> > "repaired" to reduce the size of the volume then all the other errors
> > are just a result of that. e.g. the grow could be leaving stale
> > secondary superblocks around and repair is seeing a primary/secondary
> > mismatch and restoring the secondary which has the size parameter
> > prior to the grow....
> > 
> > Also, the output of 'cat /proc/partitions' would be interesting
> > from before the grow, after the grow (when everything is working),
> > and again after the xfs_repair when everything goes bad....
> 
> Thanks for replying. Here is the output I think you're looking for....

Sure is. The underlying device does not change configuration, and:

> [root@nimbus /]# xfs_repair /dev/mapper/vg1-vol5
> Phase 1 - find and verify superblock...
> writing modified primary superblock
> Phase 2 - using internal log

There's a smoking gun - the primary superblock was modified in some
way. Looks like the only way we can get this occurring without an
error or warning being emitted is if repair found more superblocks
with the old geometry in them than with the new geometry.

With a current kernel, growfs is supposed to update every single
secondary superblock, so I can't see how this could be occurring.
However, can you remind me what kernel you are running and gather
the following information?

Run this before the grow:

# echo 3 > /proc/sys/vm/drop_caches
# for ag in `seq 0 1 125`; do
> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" <device>
> done

Then run the grow, sync, and unmount the filesystem. After that,
re-run the above xfs_db command and post the output of both so I can
see what growfs is actually doing to the secondary superblocks?
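
In other words, roughly this sequence (a sketch using the device and mount
point from your earlier output; the sb-before/sb-after file names are just
placeholders):

# echo 3 > /proc/sys/vm/drop_caches
# for ag in `seq 0 1 125`; do xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5; done > sb-before.txt
# xfs_growfs /export/vol5
# sync
# umount /export/vol5
# for ag in `seq 0 1 125`; do xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5; done > sb-after.txt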

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: filesystem shrinks after using xfs_repair
  2010-07-24  2:39         ` Dave Chinner
@ 2010-07-26  3:20           ` Eli Morris
  2010-07-26  3:45             ` Dave Chinner
  0 siblings, 1 reply; 45+ messages in thread
From: Eli Morris @ 2010-07-26  3:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


On Jul 23, 2010, at 7:39 PM, Dave Chinner wrote:

> On Fri, Jul 23, 2010 at 06:08:08PM -0700, Eli Morris wrote:
>> On Jul 23, 2010, at 5:54 PM, Dave Chinner wrote:
>>> On Fri, Jul 23, 2010 at 01:30:40AM -0700, Eli Morris wrote:
>>>> I think the raid tech support and me found and corrected the
>>>> hardware problems associated with the RAID. I'm still having the
>>>> same problem though. I expanded the filesystem to use the space of
>>>> the now corrected RAID and that seems to work OK. I can write
>>>> files to the new space OK. But then, if I run xfs_repair on the
>>>> volume, the newly added space disappears and there are tons of
>>>> error messages from xfs_repair (listed below).
>>> 
>>> Can you post the full output of the xfs_repair? The superblock is
>>> the first thing that is checked and repaired, so if it is being
>>> "repaired" to reduce the size of the volume then all the other errors
>>> are just a result of that. e.g. the grow could be leaving stale
>>> secondary superblocks around and repair is seeing a primary/secondary
>>> mismatch and restoring the secondary which has the size parameter
>>> prior to the grow....
>>> 
>>> Also, the output of 'cat /proc/partitions' would be interesting
>>> from before the grow, after the grow (when everything is working),
>>> and again after the xfs_repair when everything goes bad....
>> 
>> Thanks for replying. Here is the output I think you're looking for....
> 
> Sure is. The underlying device does not change configuration, and:
> 
>> [root@nimbus /]# xfs_repair /dev/mapper/vg1-vol5
>> Phase 1 - find and verify superblock...
>> writing modified primary superblock
>> Phase 2 - using internal log
> 
> There's a smoking gun - the primary superblock was modified in some
> way. Looks like the only way we can get this occurring without an
> error or warning being emitted is if repair found more superblocks
> with the old geometry in them than with the new geometry.
> 
> With a current kernel, growfs is supposed to update every single
> secondary superblock, so I can't see how this could be occurring.
> However, can you remind me what kernel you are running and gather
> the following information?
> 
> Run this before the grow:
> 
> # echo 3 > /proc/sys/vm/drop_caches
> # for ag in `seq 0 1 125`; do
>> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" <device>
>> done
> 
> Then run the grow, sync, and unmount the filesystem. After that,
> re-run the above xfs_db command and post the output of both so I can
> see what growfs is actually doing to the secondary superblocks?
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Hi Dave and everyone,

Here is the output from the commands you asked that I run....

Thanks so much for the help. I'm definitely not an expert on filesystems, so I really appreciate the help from you and everyone.

Eli

[root@nimbus ~]# uname -a
Linux nimbus.pmc.ucsc.edu 2.6.18-128.1.14.el5 #1 SMP Wed Jun 17 06:38:05 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux


[root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
[root@nimbus vm]# for ag in `seq 0 1 125`; do
> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
> done
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
[root@nimbus vm]# 

[root@nimbus vm]# sync
[root@nimbus vm]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb2              24G  7.6G   15G  34% /
/dev/sda5             1.7T  1.3T  391G  77% /export
/dev/sda2             3.8G  1.4G  2.3G  37% /var
tmpfs                  16G     0   16G   0% /dev/shm
/dev/sdb1             995G  946G   19G  99% /storage
tmpfs                 7.7G  7.9M  7.7G   1% /var/lib/ganglia/rrds
/dev/mapper/vg1-vol5   51T   51T   90M 100% /export/vol5
[root@nimbus vm]# umount /export/vol5
[root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
[root@nimbus vm]# for ag in `seq 0 1 125`; do
> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
> done
agcount = 156
dblocks = 16601554944
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
[root@nimbus vm]# 


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26  3:20           ` Eli Morris
@ 2010-07-26  3:45             ` Dave Chinner
  2010-07-26  4:04               ` Eli Morris
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2010-07-26  3:45 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Sun, Jul 25, 2010 at 08:20:44PM -0700, Eli Morris wrote:
> On Jul 23, 2010, at 7:39 PM, Dave Chinner wrote:
> > On Fri, Jul 23, 2010 at 06:08:08PM -0700, Eli Morris wrote:
> >> On Jul 23, 2010, at 5:54 PM, Dave Chinner wrote:
> >>> On Fri, Jul 23, 2010 at 01:30:40AM -0700, Eli Morris wrote:
> >>>> I think the raid tech support and I found and corrected the
> >>>> hardware problems associated with the RAID. I'm still having the
> >>>> same problem though. I expanded the filesystem to use the space of
> >>>> the now corrected RAID and that seems to work OK. I can write
> >>>> files to the new space OK. But then, if I run xfs_repair on the
> >>>> volume, the newly added space disappears and there are tons of
> >>>> error messages from xfs_repair (listed below).
> >>> 
> >>> Can you post the full output of the xfs_repair? The superblock is
> >>> the first thing that is checked and repaired, so if it is being
> >>> "repaired" to reduce the size of the volume then all the other errors
> >>> are just a result of that. e.g. the grow could be leaving stale
> >>> secondary superblocks around and repair is seeing a primary/secondary
> >>> mismatch and restoring the secondary which has the size parameter
> >>> prior to the grow....
> >>> 
> >>> Also, the output of 'cat /proc/partitions' would be interesting
> >>> from before the grow, after the grow (when everything is working),
> >>> and again after the xfs_repair when everything goes bad....
> >> 
> >> Thanks for replying. Here is the output I think you're looking for....
> > 
> > Sure is. The underlying device does not change configuration, and:
> > 
> >> [root@nimbus /]# xfs_repair /dev/mapper/vg1-vol5
> >> Phase 1 - find and verify superblock...
> >> writing modified primary superblock
> >> Phase 2 - using internal log
> > 
> > There's a smoking gun - the primary superblock was modified in some
> > way. Looks like the only way we can get this occurring without an
> > error or warning being emitted is if repair found more superblocks
> > with the old geometry in them than with the new geometry.
> > 
> > With a current kernel, growfs is supposed to update every single
> > secondary superblock, so I can't see how this could be occurring.
> > However, can you remind me what kernel you are running and gather
> > the following information?
> > 
> > Run this before the grow:
> > 
> > # echo 3 > /proc/sys/vm/drop_caches
> > # for ag in `seq 0 1 125`; do
> >> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" <device>
> >> done
> > 
> > Then run the grow, sync, and unmount the filesystem. After that,
> > re-run the above xfs_db command and post the output of both so I can
> > see what growfs is actually doing to the secondary superblocks?
> 
> [root@nimbus ~]# uname -a
> Linux nimbus.pmc.ucsc.edu 2.6.18-128.1.14.el5 #1 SMP Wed Jun 17 06:38:05 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

Ok, so that's a relatively old RHEL or Centos version, right?

> [root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
> [root@nimbus vm]# for ag in `seq 0 1 125`; do
> > xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
> > done
> agcount = 126
> dblocks = 13427728384
> agcount = 126
> dblocks = 13427728384
....

All nice and consistent before.

> [root@nimbus vm]# umount /export/vol5
> [root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
> [root@nimbus vm]# for ag in `seq 0 1 125`; do
> > xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
> > done
> agcount = 156
> dblocks = 16601554944
> agcount = 126
> dblocks = 13427728384
> agcount = 126
> dblocks = 13427728384
.....

And after the grow only the primary superblock has the new size and
agcount, which is why repair is returning it back to the old size.
Can you dump the output after the grow for 155 AGs instead of 125
so we can see if the new secondary superblocks were written? (just
dumping `seq 125 1 155` will be fine.)

Also, the only way I can see this happening is that if there is an
IO error reading or writing the first secondary superblock. That
should leave a warning in dmesg - can you check to see if there's an
error of the form "error %d reading secondary superblock for ag %d"
or "write error %d updating secondary superblock for ag %d" in the
logs? I notice that if this happens, we log but don't return the
error, so the grow will look like it succeeded...
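
A quick way to look for those strings, assuming kernel messages land in /var/log/messages as they appear to on this system:

# dmesg | grep -i "secondary superblock"
# grep -i "secondary superblock" /var/log/messages*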

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26  3:45             ` Dave Chinner
@ 2010-07-26  4:04               ` Eli Morris
  2010-07-26  5:57                 ` Michael Monnerie
  2010-07-26  6:06                 ` Dave Chinner
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-26  4:04 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


On Jul 25, 2010, at 8:45 PM, Dave Chinner wrote:

> On Sun, Jul 25, 2010 at 08:20:44PM -0700, Eli Morris wrote:
>> On Jul 23, 2010, at 7:39 PM, Dave Chinner wrote:
>>> On Fri, Jul 23, 2010 at 06:08:08PM -0700, Eli Morris wrote:
>>>> On Jul 23, 2010, at 5:54 PM, Dave Chinner wrote:
>>>>> On Fri, Jul 23, 2010 at 01:30:40AM -0700, Eli Morris wrote:
>>>>>> I think the raid tech support and I found and corrected the
>>>>>> hardware problems associated with the RAID. I'm still having the
>>>>>> same problem though. I expanded the filesystem to use the space of
>>>>>> the now corrected RAID and that seems to work OK. I can write
>>>>>> files to the new space OK. But then, if I run xfs_repair on the
>>>>>> volume, the newly added space disappears and there are tons of
>>>>>> error messages from xfs_repair (listed below).
>>>>> 
>>>>> Can you post the full output of the xfs_repair? The superblock is
>>>>> the first thing that is checked and repaired, so if it is being
>>>>> "repaired" to reduce the size of the volume then all the other errors
>>>>> are just a result of that. e.g. the grow could be leaving stale
>>>>> secondary superblocks around and repair is seeing a primary/secondary
>>>>> mismatch and restoring the secondary which has the size parameter
>>>>> prior to the grow....
>>>>> 
>>>>> Also, the output of 'cat /proc/partitions' would be interesting
>>>>> from before the grow, after the grow (when everything is working),
>>>>> and again after the xfs_repair when everything goes bad....
>>>> 
>>>> Thanks for replying. Here is the output I think you're looking for....
>>> 
>>> Sure is. The underlying device does not change configuration, and:
>>> 
>>>> [root@nimbus /]# xfs_repair /dev/mapper/vg1-vol5
>>>> Phase 1 - find and verify superblock...
>>>> writing modified primary superblock
>>>> Phase 2 - using internal log
>>> 
>>> There's a smoking gun - the primary superblock was modified in some
>>> way. Looks like the only way we can get this occurring without an
>>> error or warning being emitted is if repair found more superblocks
>>> with the old geometry in them than with the new geometry.
>>> 
>>> With a current kernel, growfs is supposed to update every single
>>> secondary superblock, so I can't see how this could be occurring.
>>> However, can you remind me what kernel you are running and gather
>>> the following information?
>>> 
>>> Run this before the grow:
>>> 
>>> # echo 3 > /proc/sys/vm/drop_caches
>>> # for ag in `seq 0 1 125`; do
>>>> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" <device>
>>>> done
>>> 
>>> Then run the grow, sync, and unmount the filesystem. After that,
>>> re-run the above xfs_db command and post the output of both so I can
>>> see what growfs is actually doing to the secondary superblocks?
>> 
>> [root@nimbus ~]# uname -a
>> Linux nimbus.pmc.ucsc.edu 2.6.18-128.1.14.el5 #1 SMP Wed Jun 17 06:38:05 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
> 
> Ok, so that's a relatively old RHEL or Centos version, right?
> 
>> [root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
>> [root@nimbus vm]# for ag in `seq 0 1 125`; do
>>> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
>>> done
>> agcount = 126
>> dblocks = 13427728384
>> agcount = 126
>> dblocks = 13427728384
> ....
> 
> All nice and consistent before.
> 
>> [root@nimbus vm]# umount /export/vol5
>> [root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
>> [root@nimbus vm]# for ag in `seq 0 1 125`; do
>>> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
>>> done
>> agcount = 156
>> dblocks = 16601554944
>> agcount = 126
>> dblocks = 13427728384
>> agcount = 126
>> dblocks = 13427728384
> .....
> 
> And after the grow only the primary superblock has the new size and
> agcount, which is why repair is returning it back to the old size.
> Can you dump the output after the grow for 155 AGs instead of 125
> so we can see if the new secondary superblocks were written? (just
> dumping `seq 125 1 155` will be fine.)
> 
> Also, the only way I can see this happening is that if there is an
> IO error reading or writing the first secondary superblock. That
> should leave a warning in dmesg - can you check to see if there's an
> error of the form "error %d reading secondary superblock for ag %d"
> or "write error %d updating secondary superblock for ag %d" in the
> logs? I notice that if this happens, we log but don't return the
> error, so the grow will look like it succeeded...
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Hi Dave, 

Here is the output---

thanks,

Eli

[root@nimbus log]# cat /etc/redhat-release
CentOS release 5.3 (Final)

[root@nimbus log]# grep error dmesg

[root@nimbus log]# grep superblock *

so, don't see anything there.

[root@nimbus log]# echo 3 > /proc/sys/vm/drop_caches
[root@nimbus log]#  for ag in `seq 125 1 155`; do
> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
> done
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
agcount = 126
dblocks = 13427728384
[root@nimbus log]# 


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26  4:04               ` Eli Morris
@ 2010-07-26  5:57                 ` Michael Monnerie
  2010-07-26  6:06                 ` Dave Chinner
  1 sibling, 0 replies; 45+ messages in thread
From: Michael Monnerie @ 2010-07-26  5:57 UTC (permalink / raw)
  To: xfs


[-- Attachment #1.1: Type: Text/Plain, Size: 685 bytes --]

On Montag, 26. Juli 2010 Eli Morris wrote:
> [root@nimbus log]# grep error dmesg
> [root@nimbus log]# grep superblock *

Did you "dmesg >dmesg" before that? Just to be sure every dmesg output 
is really in that file. Maybe CentOS does this, I don't know, I just 
wanted to be sure you didn't make a mistake here.
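
For example, something along these lines (the file name here is just an example):

# dmesg > /tmp/dmesg.out
# grep -i superblock /tmp/dmesg.out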

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Current radio interview! ******
http://www.it-podcast.at/aktuelle-sendung.html

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26  4:04               ` Eli Morris
  2010-07-26  5:57                 ` Michael Monnerie
@ 2010-07-26  6:06                 ` Dave Chinner
  2010-07-26  6:46                   ` Eli Morris
  1 sibling, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2010-07-26  6:06 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Sun, Jul 25, 2010 at 09:04:03PM -0700, Eli Morris wrote:
> On Jul 25, 2010, at 8:45 PM, Dave Chinner wrote:
> > On Sun, Jul 25, 2010 at 08:20:44PM -0700, Eli Morris wrote:
> >> [root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
> >> [root@nimbus vm]# for ag in `seq 0 1 125`; do
> >>> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
> >>> done
> >> agcount = 126
> >> dblocks = 13427728384
> >> agcount = 126
> >> dblocks = 13427728384
> > ....
> > 
> > All nice and consistent before.
> > 
> >> [root@nimbus vm]# umount /export/vol5
> >> [root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
> >> [root@nimbus vm]# for ag in `seq 0 1 125`; do
> >>> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
> >>> done
> >> agcount = 156
> >> dblocks = 16601554944
> >> agcount = 126
> >> dblocks = 13427728384
> >> agcount = 126
> >> dblocks = 13427728384
> > .....
> > 
> > And after the grow only the primary superblock has the new size and
> > agcount, which is why repair is returning it back to the old size.
> > Can you dump the output after the grow for 155 AGs instead of 125
> > so we can see if the new secondary superblocks were written? (just
> > dumping `seq 125 1 155` will be fine.)

Which shows:

> agcount = 126
> dblocks = 13427728384
> agcount = 126
> dblocks = 13427728384
....

Well, that's puzzling. The in-memory superblock is written to each
of the secondary superblocks, and that _should_ match the primary
superblock. The in-memory superblock is what is modified
during the growfs transaction and it is then synchronously written
to each secondary superblock. Without any I/O errors, I'm not sure
what is happening here.
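
One way to check that directly is to compare every secondary superblock against the primary; a sketch, using the post-grow agcount of 156 reported by sb 0 on this filesystem:

# xfs_db -r -c "sb 0" -c "p agcount" -c "p dblocks" /dev/vg1/vol5 > /tmp/sb0
# for ag in `seq 1 1 155`; do
>     xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5 | \
>         diff -q /tmp/sb0 - > /dev/null || echo "ag $ag does not match the primary"
> done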

Oh, I just noticed this from your previous mail:

> [root@nimbus vm]# df -h
> Filesystem            Size  Used Avail Use% Mounted on
.....
> /dev/mapper/vg1-vol5   51T   51T   90M 100% /export/vol5
                         ^^^^^^^^^^^^^^^

> [root@nimbus vm]# umount /export/vol5
> [root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
> [root@nimbus vm]# for ag in `seq 0 1 125`; do
> > xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
> > done
> agcount = 156
> dblocks = 16601554944
  ^^^^^^^^^^^^^^^^^^^^^

These don't match up - we've got the situation where the on-disk
value for the primary superblock has changed, but the in-memory
value has not appeared to change.

And I note from the original email I asked for data from that the
filesystem did not show up as 62TB until you unmounted and mounted
it again, which would have read the 62TB size from the primary
superblock on disk during mount. You do not need to unmount and
remount to see the new size. This leads me to believe you are
hitting one (or more) of the growfs overflow bugs that was fixed a
while back.

I've just confirmed that the problem does not exist at top-of-tree.
The following commands give the right output, and the repair at the
end does not truncate the filesystem:

xfs_io -f -c "truncate $((13427728384 * 4096))" fsfile
mkfs.xfs -f -l size=128m,lazy-count=0 -d size=13427728384b,agcount=126,file,name=fsfile
xfs_io -f -c "truncate $((16601554944 * 4096))" fsfile
mount -o loop fsfile /mnt/scratch
xfs_growfs /mnt/scratch
xfs_info /mnt/scratch
umount /mnt/scratch
xfs_db -c "sb 0" -c "p agcount" -c "p dblocks" -f fsfile
xfs_db -c "sb 1" -c "p agcount" -c "p dblocks" -f fsfile
xfs_db -c "sb 127" -c "p agcount" -c "p dblocks" -f fsfile
xfs_repair -f fsfile

So rather than try to triage this any further, can you upgrade your
kernel/system to something more recent?

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26  6:06                 ` Dave Chinner
@ 2010-07-26  6:46                   ` Eli Morris
  2010-07-26  8:40                     ` Michael Monnerie
                                       ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-26  6:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


On Jul 25, 2010, at 11:06 PM, Dave Chinner wrote:

> On Sun, Jul 25, 2010 at 09:04:03PM -0700, Eli Morris wrote:
>> On Jul 25, 2010, at 8:45 PM, Dave Chinner wrote:
>>> On Sun, Jul 25, 2010 at 08:20:44PM -0700, Eli Morris wrote:
>>>> [root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
>>>> [root@nimbus vm]# for ag in `seq 0 1 125`; do
>>>>> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
>>>>> done
>>>> agcount = 126
>>>> dblocks = 13427728384
>>>> agcount = 126
>>>> dblocks = 13427728384
>>> ....
>>> 
>>> All nice and consistent before.
>>> 
>>>> [root@nimbus vm]# umount /export/vol5
>>>> [root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
>>>> [root@nimbus vm]# for ag in `seq 0 1 125`; do
>>>>> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
>>>>> done
>>>> agcount = 156
>>>> dblocks = 16601554944
>>>> agcount = 126
>>>> dblocks = 13427728384
>>>> agcount = 126
>>>> dblocks = 13427728384
>>> .....
>>> 
>>> And after the grow only the primary superblock has the new size and
>>> agcount, which is why repair is returning it back to the old size.
>>> Can you dump the output after the grow for 155 AGs instead of 125
>>> so we can see if the new secondary superblocks were written? (just
>>> dumping `seq 125 1 155` will be fine.)
> 
> Which shows:
> 
>> agcount = 126
>> dblocks = 13427728384
>> agcount = 126
>> dblocks = 13427728384
> ....
> 
> Well, that's puzzling. The in-memory superblock is written to each
> of the secondary superblocks, and that _should_ match the primary
> superblock. The in-memory superblock is what is modified
> during the growfs transaction and it is then synchronously written
> to each secondary superblock. Without any I/O errors, I'm not sure
> what is happening here.
> 
> Oh, I just noticed this from your previous mail:
> 
>> [root@nimbus vm]# df -h
>> Filesystem            Size  Used Avail Use% Mounted on
> .....
>> /dev/mapper/vg1-vol5   51T   51T   90M 100% /export/vol5
>                         ^^^^^^^^^^^^^^^
> 
>> [root@nimbus vm]# umount /export/vol5
>> [root@nimbus vm]#  echo 3 > /proc/sys/vm/drop_caches
>> [root@nimbus vm]# for ag in `seq 0 1 125`; do
>>> xfs_db -r -c "sb $ag" -c "p agcount" -c "p dblocks" /dev/vg1/vol5
>>> done
>> agcount = 156
>> dblocks = 16601554944
>  ^^^^^^^^^^^^^^^^^^^^^
> 
> These don't match up - we've got the situation where the on-disk
> value for the primary superblock has changed, but the in-memory
> value has not appeared to change.
> 
> And I note from the original email I asked for data from that the
> filesystem did not show up as 62TB until you unmounted and mounted
> it again, which would have read the 62TB size from the primary
> superblock on disk during mount. You do not need to unmount and
> remount to see the new size. This leads me to believe you are
> hitting one (or more) of the growfs overflow bugs that was fixed a
> while back.
> 
> I've just confirmed that the problem does not exist at top-of-tree.
> > The following commands give the right output, and the repair at the
> end does not truncate the filesystem:
> 
> xfs_io -f -c "truncate $((13427728384 * 4096))" fsfile
> mkfs.xfs -f -l size=128m,lazy-count=0 -d size=13427728384b,agcount=126,file,name=fsfile
> xfs_io -f -c "truncate $((16601554944 * 4096))" fsfile
> mount -o loop fsfile /mnt/scratch
> xfs_growfs /mnt/scratch
> xfs_info /mnt/scratch
> umount /mnt/scratch
> xfs_db -c "sb 0" -c "p agcount" -c "p dblocks" -f fsfile
> xfs_db -c "sb 1" -c "p agcount" -c "p dblocks" -f fsfile
> xfs_db -c "sb 127" -c "p agcount" -c "p dblocks" -f fsfile
> xfs_repair -f fsfile
> 
> So rather than try to triage this any further, can you upgrade your
> kernel/system to something more recent?
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

Hi Dave,

I can update this to Centos 5 Update 4, but I can't install updates forward of its release date of Dec 15, 2009. The reason is that this is the head node of a cluster and it uses the Rocks cluster distribution. The newest version of Rocks is based on Centos 5 Update 4, but Rocks systems do not support updates (via yum, for example).

Updating the OS takes me a day or two for the whole cluster and all the user programs. If you're pretty sure that will fix the problem, I'll go for it tomorrow. I'd appreciate it very much if you could let me know if Centos 5.4 is recent enough that it will fix the problem.

I will note that I've grown the filesystem several times, and while I recall having to unmount and remount the filesystem each time for it to report its new size, I've never seen it fall back to its old size when running xfs_repair. In fact, the original filesystem is about 12 TB, so xfs_repair only reverses the last grow and not the previous ones.

thanks again for your help,

Eli

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26  6:46                   ` Eli Morris
@ 2010-07-26  8:40                     ` Michael Monnerie
  2010-07-26  9:49                     ` Emmanuel Florac
  2010-07-26 10:20                     ` Dave Chinner
  2 siblings, 0 replies; 45+ messages in thread
From: Michael Monnerie @ 2010-07-26  8:40 UTC (permalink / raw)
  To: xfs


[-- Attachment #1.1: Type: Text/Plain, Size: 712 bytes --]

On Montag, 26. Juli 2010 Eli Morris wrote:
>  let me know if Centos 5.4 is recent enough that it will fix the
>  problem.
 
If not, maybe just install the latest vanilla kernel, do the growfs, and 
when it works revert back to your working environment. As the problem 
only exists for your growfs, that should be sufficient and less of a PITA.

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Current radio interview! ******
http://www.it-podcast.at/aktuelle-sendung.html

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26  6:46                   ` Eli Morris
  2010-07-26  8:40                     ` Michael Monnerie
@ 2010-07-26  9:49                     ` Emmanuel Florac
  2010-07-26 17:22                       ` Eli Morris
  2010-07-26 10:20                     ` Dave Chinner
  2 siblings, 1 reply; 45+ messages in thread
From: Emmanuel Florac @ 2010-07-26  9:49 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Sun, 25 Jul 2010 23:46:29 -0700
Eli Morris <ermorris@ucsc.edu> wrote:

> Updating the OS takes me a day or two for the whole cluster and all
> the user programs. If you're pretty sure that will fix the problem,
> I'll go for it tomorrow. I'd appreciate it very much if you could let
> me know if Centos 5.4 is recent enough that it will fix the problem.
> 

Alternatively simply boot the system from a live CD with a recent
kernel to do the resize.
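
A rough outline of that, assuming the live environment ships lvm2 and xfsprogs and using the volume names from this thread (a temporary mount point is needed because xfs_growfs operates on a mounted filesystem):

# vgscan
# vgchange -ay vg1
# mount /dev/vg1/vol5 /mnt
# xfs_growfs /mnt
# xfs_info /mnt
# umount /mnt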

> I will note that I've grown the filesystem several times, and while I
> recall having to unmount and remount the filesystem each time for it
> to report its new size [...]

This definitely should not happen. After using xfs_growfs, the
filesystem appears immediately bigger.

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26  6:46                   ` Eli Morris
  2010-07-26  8:40                     ` Michael Monnerie
  2010-07-26  9:49                     ` Emmanuel Florac
@ 2010-07-26 10:20                     ` Dave Chinner
  2010-07-28  5:12                       ` Eli Morris
  2 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2010-07-26 10:20 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Sun, Jul 25, 2010 at 11:46:29PM -0700, Eli Morris wrote:
> On Jul 25, 2010, at 11:06 PM, Dave Chinner wrote:
> > On Sun, Jul 25, 2010 at 09:04:03PM -0700, Eli Morris wrote:
> >> On Jul 25, 2010, at 8:45 PM, Dave Chinner wrote:
> > I've just confirmed that the problem does not exist at top-of-tree.
> > The following commands give the right output, and the repair at the
> > end does not truncate the filesystem:
> > 
> > xfs_io -f -c "truncate $((13427728384 * 4096))" fsfile
> > mkfs.xfs -f -l size=128m,lazy-count=0 -d size=13427728384b,agcount=126,file,name=fsfile
> > xfs_io -f -c "truncate $((16601554944 * 4096))" fsfile
> > mount -o loop fsfile /mnt/scratch
> > xfs_growfs /mnt/scratch
> > xfs_info /mnt/scratch
> > umount /mnt/scratch
> > xfs_db -c "sb 0" -c "p agcount" -c "p dblocks" -f fsfile
> > xfs_db -c "sb 1" -c "p agcount" -c "p dblocks" -f fsfile
> > xfs_db -c "sb 127" -c "p agcount" -c "p dblocks" -f fsfile
> > xfs_repair -f fsfile
> > 
> > So rather than try to triage this any further, can you upgrade your
> > kernel/system to something more recent?
> 
> I can update this to Centos 5 Update 4, but I can't install
> updates forward of its release date of Dec 15, 2009. The reason
> is that this is the head node of a cluster and it uses the Rocks
> cluster distribution. The newest version of Rocks is based on Centos 5
> Update 4, but Rocks systems do not support updates (via yum, for
> example). 
> 
> Updating the OS takes me a day or two for the whole cluster and
> all the user programs. If you're pretty sure that will fix the
> problem, I'll go for it tomorrow. I'd appreciate it very much if
> you could let me know if Centos 5.4 is recent enough that it will
> fix the problem..

The only way I can find out is to load CentOS 5.4 onto a
system and run the above test. You can probably do that just as
easily as I can...

> I will note that I've grown the filesystem several times, and
> while I recall having to unmount and remount the filesystem each
> time for it to report its new size, I've never seen it fall back
> to its old size when running xfs_repair. In fact, the original
> filesystem is about 12 TB, so xfs_repair only reverses the last
> grow and not the previous ones.

Hmmm - I can't recall any bug where unmount was required before
the new size would show up. I know we had problems with arithmetic
overflows in both the xfs_growfs binary and the kernel code, but
they did not manifest in this manner. Hence I can't really say why
you are seeing that behaviour or why this time it is different.

The suggestion of using a recent live CD to do the grow is a good
one - it might be your best option, rather than upgrading everything....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26  9:49                     ` Emmanuel Florac
@ 2010-07-26 17:22                       ` Eli Morris
  2010-07-26 18:33                         ` Stuart Rowan
  2010-07-26 21:06                         ` Emmanuel Florac
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-26 17:22 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs


On Jul 26, 2010, at 2:49 AM, Emmanuel Florac wrote:

> On Sun, 25 Jul 2010 23:46:29 -0700
> Eli Morris <ermorris@ucsc.edu> wrote:
> 
>> Updating the OS takes me a day or two for the whole cluster and all
>> the user programs. If you're pretty sure that will fix the problem,
>> I'll go for it tomorrow. I'd appreciate it very much if you could let
>> me know if Centos 5.4 is recent enough that it will fix the problem.
>> 
> 
> Alternatively simply boot the system from a live CD with a recent
> kernel to do the resize.
> 
>> I will note that I've grown the filesystem several times, and while I
>> recall having to unmount and remount the filesystem each time for it
>> to report its new size [...]
> 
> This definitely should not happen. After using xfs_growfs, the
> filesystem appears immediately bigger.
> 
> -- 
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                    |   Intellique
>                    |	<eflorac@intellique.com>
>                    |   +33 1 78 94 84 02
> ------------------------------------------------------------------------

Hi Emmanuel,

I think the Live CD is a great suggestion. I was just thinking, though: do you know of a Live CD that has XFS utilities on it? From what I've seen, I've had to install XFS utilities after installing the OS, which I don't think one can do with a live CD. I can look, but if you know of one, that would probably save some digging.

thanks,

Eli

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26 17:22                       ` Eli Morris
@ 2010-07-26 18:33                         ` Stuart Rowan
  2010-07-26 21:06                         ` Emmanuel Florac
  1 sibling, 0 replies; 45+ messages in thread
From: Stuart Rowan @ 2010-07-26 18:33 UTC (permalink / raw)
  To: xfs

How about:
http://www.sysresccd.org/

This has xfsprogs 3.1.2 and kernels 2.6.32.16 and 2.6.34.1 in version 
1.5.8 of the CD according to the changelog.

Cheers,
Stu.

On 26/07/2010 18:22, Eli Morris wrote:
>
> On Jul 26, 2010, at 2:49 AM, Emmanuel Florac wrote:
>
>> On Sun, 25 Jul 2010 23:46:29 -0700
>> Eli Morris <ermorris@ucsc.edu> wrote:
>>
>>> Updating the OS takes me a day or two for the whole cluster and all
>>> the user programs. If you're pretty sure that will fix the problem,
>>> I'll go for it tomorrow. I'd appreciate it very much if you could let
>>> me know if Centos 5.4 is recent enough that it will fix the problem.
>>>
>>
>> Alternatively simply boot the system from a live CD with a recent
>> kernel to do the resize.
>>
>>> I will note that I've grown the filesystem several times, and while I
>>> recall having to unmount and remount the filesystem each time for it
>>> to report its new size [...]
>>
>> This definitely should not happen. After using xfs_growfs, the
>> filesystem appears immediately bigger.
>>
>> --
>> ------------------------------------------------------------------------
>> Emmanuel Florac     |   Direction technique
>>                     |   Intellique
>>                     |	<eflorac@intellique.com>
>>                     |   +33 1 78 94 84 02
>> ------------------------------------------------------------------------
>
> Hi Emmanuel,
>
> I think the Live CD is a great suggestion. I was just thinking, though: do you know of a Live CD that has XFS utilities on it? From what I've seen, I've had to install XFS utilities after installing the OS, which I don't think one can do with a live CD. I can look, but if you know of one, that would probably save some digging.
>
> thanks,
>
> Eli
>
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26 17:22                       ` Eli Morris
  2010-07-26 18:33                         ` Stuart Rowan
@ 2010-07-26 21:06                         ` Emmanuel Florac
  2010-07-27  5:02                           ` Eli Morris
  1 sibling, 1 reply; 45+ messages in thread
From: Emmanuel Florac @ 2010-07-26 21:06 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Mon, 26 Jul 2010 10:22:11 -0700, you wrote:

> I think the Live CD is a great suggestion. I was just thinking,
> though: do you know of a Live CD that has XFS utilities on it? From
> what I've seen, I've had to install XFS utilities after installing
> the OS, which I don't think one can do with a live CD. I can look,
> but if you know of one, that would probably save some digging.

I'm pretty sure Ubuntu and Knoppix both come with xfs-progs. 

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26 21:06                         ` Emmanuel Florac
@ 2010-07-27  5:02                           ` Eli Morris
  2010-07-27  6:48                             ` Stan Hoeppner
  2010-07-27  8:21                             ` Michael Monnerie
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-27  5:02 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs

Hi,

Thanks. I tried a Knoppix live DVD (latest version) today and it does seem to have xfs-progs, but I can't get it to recognize the lvm volume that the filesystem is on. LVM2 programs are available, but 'vgscan  --mknodes' turns up nothing. I'll continue to look into it, but if anyone has any ideas, please let me know.
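
For reference, a generic sequence for making LVM volumes visible from a live environment looks roughly like this; some of these steps may be unnecessary or differ slightly depending on the distro:

# modprobe dm-mod
# pvscan
# vgscan
# vgchange -ay
# lvscan
# ls /dev/mapper/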

thanks,

Eli


On Jul 26, 2010, at 2:06 PM, Emmanuel Florac wrote:

> On Mon, 26 Jul 2010 10:22:11 -0700, you wrote:
> 
>> I think the Live CD is a great suggestion. I was just thinking
>> though. do you know of a Live CD that has XFS utilities on it? From
>> what I've seen, I've had to install XFS utilities after installing
>> the OS, which I don't think one can do with a live CD. I can look,
>> but if you know of one, that would probably save some digging.
> 
> I'm pretty sure Ubuntu and Knoppix both come with xfs-progs. 
> 
> -- 
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                    |   Intellique
>                    |	<eflorac@intellique.com>
>                    |   +33 1 78 94 84 02
> ------------------------------------------------------------------------

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-27  5:02                           ` Eli Morris
@ 2010-07-27  6:48                             ` Stan Hoeppner
  2010-07-27  8:21                             ` Michael Monnerie
  1 sibling, 0 replies; 45+ messages in thread
From: Stan Hoeppner @ 2010-07-27  6:48 UTC (permalink / raw)
  To: xfs

Eli Morris put forth on 7/27/2010 12:02 AM:
> Hi,
> 
> Thanks. I tried a Knoppix live DVD (latest version) today and it does seem to have xfs-progs, but I can't get it to recognize the lvm volume that the filesystem is on. LVM2 programs are available, but 'vgscan  --mknodes' turns up nothing. I'll continue to look into it, but if anyone has any ideas, please let me know.

www.sysresccd.org

-- 
Stan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-27  5:02                           ` Eli Morris
  2010-07-27  6:48                             ` Stan Hoeppner
@ 2010-07-27  8:21                             ` Michael Monnerie
  1 sibling, 0 replies; 45+ messages in thread
From: Michael Monnerie @ 2010-07-27  8:21 UTC (permalink / raw)
  To: xfs; +Cc: Eli Morris


[-- Attachment #1.1: Type: Text/Plain, Size: 891 bytes --]

On Dienstag, 27. Juli 2010 Eli Morris wrote:
> Thanks. I tried a Knoppix live DVD (latest version) today and it does
>  seem to have xfs-progs, but I can't get it to recognize the lvm
>  volume that the filesystem is on. LVM2 programs are available, but
>  'vgscan  --mknodes' turns up nothing. I'll continue to look into it,
>  but if anyone has any ideas, please let me know.
 
A plain openSUSE DVD http://opensuse.org/ has a "rescue system" on it, 
available in every language. It includes LVM and XFS support.

-- 
with kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [pronounced: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Current radio interview! ******
http://www.it-podcast.at/aktuelle-sendung.html

// We currently have two houses for sale:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-26 10:20                     ` Dave Chinner
@ 2010-07-28  5:12                       ` Eli Morris
  2010-07-29 19:22                         ` Eli Morris
  0 siblings, 1 reply; 45+ messages in thread
From: Eli Morris @ 2010-07-28  5:12 UTC (permalink / raw)
  To: xfs; +Cc: Michael Monnerie


On Jul 26, 2010, at 3:20 AM, Dave Chinner wrote:

> On Sun, Jul 25, 2010 at 11:46:29PM -0700, Eli Morris wrote:
>> On Jul 25, 2010, at 11:06 PM, Dave Chinner wrote:
>>> On Sun, Jul 25, 2010 at 09:04:03PM -0700, Eli Morris wrote:
>>>> On Jul 25, 2010, at 8:45 PM, Dave Chinner wrote:
>>> I've just confirmed that the problem does not exist at top-of-tree.
>>> The following commands give the right output, and the repair at the
>>> end does not truncate the filesystem:
>>> 
>>> xfs_io -f -c "truncate $((13427728384 * 4096))" fsfile
>>> mkfs.xfs -f -l size=128m,lazy-count=0 -d size=13427728384b,agcount=126,file,name=fsfile
>>> xfs_io -f -c "truncate $((16601554944 * 4096))" fsfile
>>> mount -o loop fsfile /mnt/scratch
>>> xfs_growfs /mnt/scratch
>>> xfs_info /mnt/scratch
>>> umount /mnt/scratch
>>> xfs_db -c "sb 0" -c "p agcount" -c "p dblocks" -f fsfile
>>> xfs_db -c "sb 1" -c "p agcount" -c "p dblocks" -f fsfile
>>> xfs_db -c "sb 127" -c "p agcount" -c "p dblocks" -f fsfile
>>> xfs_repair -f fsfile
>>> 
>>> So rather than try to triage this any further, can you upgrade your
>>> kernel/system to something more recent?
>> 
>> I can update this to Centos 5 Update 4, but I can't install
>> updates forward of its release date of Dec 15, 2009. The reason
>> is that this is the head node of a cluster and it uses the Rocks
>> cluster distribution. The newest version of Rocks is based on Centos 5
>> Update 4, but Rocks systems do not support updates (via yum, for
>> example). 
>> 
>> Updating the OS takes me a day or two for the whole cluster and
>> all the user programs. If you're pretty sure that will fix the
>> problem, I'll go for it tomorrow. I'd appreciate it very much if
>> you could let me know if Centos 5.4 is recent enough that it will
>> fix the problem..
> 
> The only way I can find out is to load CentOS 5.4 onto a
> system and run the above test. You can probably do that just as
> easily as I can...
> 
>> I will note that I've grown the filesystem several times, and
>> while I recall having to unmount and remount the filesystem each
>> time for it to report its new size, I've never seen it fall back
>> to its old size when running xfs_repair. In fact, the original
>> filesystem is about 12 TB, so xfs_repair only reverses the last
>> grow and not the previous ones.
> 
> Hmmm - I can't recall any bug where unmount was required before
> the new size would show up. I know we had problems with arithmetic
> overflows in both the xfs_growfs binary and the kernel code, but
> they did not manifest in this manner. Hence I can't really say why
> you are seeing that behaviour or why this time it is different.
> 
> The suggestion of using a recent live CD to do the grow is a good
> one - it might be your best option, rather than upgrading everything....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Hi All,

Thanks for all the help. I was finally able to get a USB thumb drive made up with Fedora 13 (the 64-bit version; that turned out to be important!). I did the xfs_growfs after booting off that, then rebooted back to my normal configuration, ran xfs_repair, and this time the filesystem stayed OK. I'm doing an overnight write test and will run xfs_repair again tomorrow morning, but I think that solved the problem. BTW, Fedora has a great tool for making USB thumb drives with the live distro on it. It does everything for you, including downloading the disc image. Nice. That's a pretty nasty bug.

thanks again!

Eli

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-28  5:12                       ` Eli Morris
@ 2010-07-29 19:22                         ` Eli Morris
  2010-07-29 22:09                           ` Emmanuel Florac
  2010-07-29 23:01                           ` Dave Chinner
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-29 19:22 UTC (permalink / raw)
  To: xfs


On Jul 27, 2010, at 10:12 PM, Eli Morris wrote:

> 
> On Jul 26, 2010, at 3:20 AM, Dave Chinner wrote:
> 
>> On Sun, Jul 25, 2010 at 11:46:29PM -0700, Eli Morris wrote:
>>> On Jul 25, 2010, at 11:06 PM, Dave Chinner wrote:
>>>> On Sun, Jul 25, 2010 at 09:04:03PM -0700, Eli Morris wrote:
>>>>> On Jul 25, 2010, at 8:45 PM, Dave Chinner wrote:
>>>> I've just confirmed that the problem does not exist at top-of-tree.
>>>> The following commands gives the right output, and the repair at the
>>>> end does not truncate the filesystem:
>>>> 
>>>> xfs_io -f -c "truncate $((13427728384 * 4096))" fsfile
>>>> mkfs.xfs -f -l size=128m,lazy-count=0 -d size=13427728384b,agcount=126,file,name=fsfile
>>>> xfs_io -f -c "truncate $((16601554944 * 4096))" fsfile
>>>> mount -o loop fsfile /mnt/scratch
>>>> xfs_growfs /mnt/scratch
>>>> xfs_info /mnt/scratch
>>>> umount /mnt/scratch
>>>> xfs_db -c "sb 0" -c "p agcount" -c "p dblocks" -f fsfile
>>>> xfs_db -c "sb 1" -c "p agcount" -c "p dblocks" -f fsfile
>>>> xfs_db -c "sb 127" -c "p agcount" -c "p dblocks" -f fsfile
>>>> xfs_repair -f fsfile
>>>> 
>>>> So rather than try to triage this any further, can you upgrade your
>>>> kernel/system to something more recent?
>>> 
>>> I can update this to Centos 5 Update 4, but I can't install
>>> updates forward of its release date of Dec 15, 2009. The reason
>>> is that this is the head node of a cluster and it uses the Rocks
>>> cluster distribution. The newest version of Rocks is based on Centos 5
>>> Update 4, but Rocks systems do not support updates (via yum, for
>>> example). 
>>> 
>>> Updating the OS takes me a day or two for the whole cluster and
>>> all the user programs. If you're pretty sure that will fix the
>>> problem, I'll go for it tomorrow. I'd appreciate it very much if
>>> you could let me know if Centos 5.4 is recent enough that it will
>>> fix the problem..
>> 
>> The only way I can find out is to load CentOS 5.4 onto a
>> system and run the above test. You can probably do that just as
>> easily as I can...
>> 
>>> I will note that I've grown the filesystem several times, and
>>> while I recall having to unmount and remount the filesystem each
>>> time for it to report its new size, I've never seen it fall back
>>> to its old size when running xfs_repair. In fact, the original
>>> filesystem is about 12 TB, so xfs_repair only reverses the last
>>> grow and not the previous ones.
>> 
>> Hmmm - I can't recall any bug where unmount was required before
>> the new size would show up. I know we had problems with arithmetic
>> overflows in both the xfs_growfs binary and the kernel code, but
>> they did not manifest in this manner. Hence I can't really say why
>> you are seeing that behaviour or why this time it is different.
>> 
>> The suggestion of using a recent live CD to do the grow is a good
>> one - it might be your best option, rather than upgrading everything....
>> 
>> Cheers,
>> 
>> Dave.
>> -- 
>> Dave Chinner
>> david@fromorbit.com
> 
> 
> Hi All,
> 
> Thanks for all the help. I was finally able to get a USB thumb drive made up with Fedora 13 (the 64-bit version; that turned out to be important!). I did the xfs_growfs after booting off that, then rebooted back to my normal configuration, ran xfs_repair, and this time the filesystem stayed OK. I'm doing an overnight write test and will run xfs_repair again tomorrow morning, but I think that solved the problem. BTW, Fedora has a great tool for making USB thumb drives with the live distro on it. It does everything for you, including downloading the disc image. Nice. That's a pretty nasty bug.
> 
> thanks again!
> 
> Eli
> 

Hi guys,

I tried filling up the disk with data to see if that worked OK, and it did, up until this point. There is something going on that I don't understand, though. 'df' says that I have 381 GB free on the disk, but I can't write to the disk anymore because it says there isn't any space left on it. Is this some insane round-off error, or is there something else going on here?

thanks,

Eli



[root@nimbus vol5]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb2              24G  7.6G   15G  34% /
/dev/sda5             1.7T  1.3T  391G  77% /export
/dev/sda2             3.8G  1.5G  2.2G  40% /var
tmpfs                  16G     0   16G   0% /dev/shm
/dev/sdb1             995G  946G   19G  99% /storage
tmpfs                 7.7G  7.9M  7.7G   1% /var/lib/ganglia/rrds
/dev/mapper/vg1-vol5   62T   62T  381G 100% /export/vol5
[root@nimbus vol5]# mkdir /export/vol5/testdir
mkdir: cannot create directory `/export/vol5/testdir': No space left on device
[root@nimbus vol5]# 






_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-29 19:22                         ` Eli Morris
@ 2010-07-29 22:09                           ` Emmanuel Florac
  2010-07-29 22:48                             ` Eli Morris
  2010-07-29 23:01                           ` Dave Chinner
  1 sibling, 1 reply; 45+ messages in thread
From: Emmanuel Florac @ 2010-07-29 22:09 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Thu, 29 Jul 2010 12:22:30 -0700, you wrote:

> I tried filling up the disk with data to see if that worked Ok and it
> did, up until this point. There is something I don't understand going
> on though. 'df' says that I have 381 GB free on the disk, but I can't
> write to the disk anymore because it says there isn't any space
> left on it. Is this some insane round off error or is there something
> going on here?

You probably ran out of inodes. Do you have many small files?

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-29 22:09                           ` Emmanuel Florac
@ 2010-07-29 22:48                             ` Eli Morris
  0 siblings, 0 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-29 22:48 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: xfs


On Jul 29, 2010, at 3:09 PM, Emmanuel Florac wrote:

> On Thu, 29 Jul 2010 12:22:30 -0700, you wrote:
> 
>> I tried filling up the disk with data to see if that worked Ok and it
>> did, up until this point. There is something I don't understand going
>> on though. 'df' says that I have 381 GB free on the disk, but I can't
>> write to the disk anymore because it says there isn't any space
>> left on it. Is this some insane round off error or is there something
>> going on here?
> 
> You probably ran out of inodes. Do you have many small files?
> 
> -- 
> ------------------------------------------------------------------------
> Emmanuel Florac     |   Direction technique
>                    |   Intellique
>                    |	<eflorac@intellique.com>
>                    |   +33 1 78 94 84 02
> ------------------------------------------------------------------------

Hi,

I checked the inodes with this command, and it looks like I have plenty left. It's this entry, '/dev/mapper/vg1-vol5'.  

[root@nimbus ~]# df -hi
Filesystem            Inodes   IUsed   IFree IUse% Mounted on
/dev/sdb2               6.2M    208K    6.0M    4% /
/dev/sda5               219M    452K    218M    1% /export
/dev/sda2              1001K    5.5K    996K    1% /var
tmpfs                   4.0M       1    4.0M    1% /dev/shm
/dev/mapper/vg1-vol5    1.5G   1020K    1.5G    1% /export/vol5
/dev/sdb1               127M     42K    127M    1% /storage
tmpfs                   4.0M     629    4.0M    1% /var/lib/ganglia/rrds


thanks,

Eli


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-29 19:22                         ` Eli Morris
  2010-07-29 22:09                           ` Emmanuel Florac
@ 2010-07-29 23:01                           ` Dave Chinner
  2010-07-29 23:15                             ` Eli Morris
  1 sibling, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2010-07-29 23:01 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Thu, Jul 29, 2010 at 12:22:30PM -0700, Eli Morris wrote:
> I tried filling up the disk with data to see if that worked Ok and
> it did, up until this point. There is something I don't understand
> going on though. 'df' says that I have 381 GB free on the disk,
> but I can't write to the disk anymore because it says there
> isn't any space left on it. Is this some insane round off error or
> is there something going on here?

If you haven't specified inode64, then all inodes are located below
the 1TB mark. You've probably run out of space (or contiguous 16k
chunks of free space) below 1TB. Using inode64 will avoid this - for
a filesystem of that size, inode64 is probably a good idea as it
will significantly improve performance.

Keep in mind that legacy 32bit applications may have problems with
64 bit inode numbers, so if you have such applications (or 32 bit
NFS clients) then you need to check carefully as to whether 64 bit
inodes will work correctly or not....
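
(A rough way to check and act on this, assuming the device and mount point from the df output earlier in the thread; which AG numbers sit below 1TB depends on the AG size reported by xfs_info:)

xfs_db -r -c "freesp -s -a 0" /dev/mapper/vg1-vol5    # free space summary in AG 0
umount /export/vol5
mount -o inode64 /dev/mapper/vg1-vol5 /export/vol5    # allow new inodes above 1TB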

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-29 23:01                           ` Dave Chinner
@ 2010-07-29 23:15                             ` Eli Morris
  2010-07-30  0:39                               ` Michael Monnerie
  2010-07-30  7:15                               ` Emmanuel Florac
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-29 23:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


On Jul 29, 2010, at 4:01 PM, Dave Chinner wrote:

> On Thu, Jul 29, 2010 at 12:22:30PM -0700, Eli Morris wrote:
>> I tried filling up the disk with data to see if that worked Ok and
>> it did, up until this point. There is something I don't understand
>> going on though. 'df' says that I have 381 GB free on the disk,
>> but I can't write to the disk anymore because it says there
>> isn't any space left on it. Is this some insane round off error or
>> is there something going on here?
> 
> If you haven't specified inode64, then all inodes are located below
> the 1TB mark. You've probably run out of space (or contiguous 16k
> chunks of free space) below 1TB. Using inode64 will avoid this - for
> a filesystem of that size, inode64 is probably a good idea as it
> will significantly improve performance.
> 
> Keep in mind that legacy 32bit applications may have problems with
> 64 bit inode numbers, so if you have such applications (or 32 bit
> NFS clients) then you need to check carefully as to whether 64 bit
> inodes will work correctly or not....
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Hi Dave,

Is that something I could do now, or only when the filesystem is created? If I mount it with '-o inode64', can I mount it without that later, or once I write to the filesystem like that, can I not go back to mounting it in 32 bit mode? 

Thanks again,

Eli



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-29 23:15                             ` Eli Morris
@ 2010-07-30  0:39                               ` Michael Monnerie
  2010-07-30  1:49                                 ` Eli Morris
  2010-07-30  7:15                               ` Emmanuel Florac
  1 sibling, 1 reply; 45+ messages in thread
From: Michael Monnerie @ 2010-07-30  0:39 UTC (permalink / raw)
  To: xfs; +Cc: Eli Morris


[-- Attachment #1.1: Type: Text/Plain, Size: 1008 bytes --]

On Friday, 30 July 2010, Eli Morris wrote:
> Is that something I could do now, or only when the filesystem is
>  created? If I mount it with '-o inode64', can I mount it without
>  that later, or once I write to the filesystem like that, can I not
>  go back to mounting it in 32 bit mode? 
 
Once you mount with inode64, you can't go back.

I'm not sure which applications are "broken", but I use inode64 on a 
backup server that lots of Linux boxes connect to over NFS, and on a Samba 
server for Windows PCs which also has inode64, and at least I haven't seen 
any problems over the years. That doesn't mean there isn't a problem, of course ;-)

-- 
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [gesprochen: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Aktuelles Radiointerview! ******
http://www.it-podcast.at/aktuelle-sendung.html

// Wir haben im Moment zwei Häuser zu verkaufen:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-30  0:39                               ` Michael Monnerie
@ 2010-07-30  1:49                                 ` Eli Morris
  0 siblings, 0 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-30  1:49 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: xfs


On Jul 29, 2010, at 5:39 PM, Michael Monnerie wrote:

> On Friday, 30 July 2010, Eli Morris wrote:
>> Is that something I could do now, or only when the filesystem is
>> created? If I mount it with '-o inode64', can I mount it without
>> that later, or once I write to the filesystem like that, can I not
>> go back to mounting it in 32 bit mode? 
> 
> Once you mount with inode64, you can't go back.
> 
> I'm not sure which applications are "broken", but I use inode64 for a 
> backup server where lots of Linux boxes connect to over NFS, and a Samba 
> for Windows PCs which has inode64, and at least didn't see any problem 
> over the years. That doesn't mean there isn't a problem, of course ;-)
> 
> -- 
> mit freundlichen Grüssen,
> Michael Monnerie, Ing. BSc
> 
> it-management Internet Services
> http://proteger.at [gesprochen: Prot-e-schee]
> Tel: 0660 / 415 65 31
> 
> ****** Aktuelles Radiointerview! ******
> http://www.it-podcast.at/aktuelle-sendung.html
> 
> // Wir haben im Moment zwei Häuser zu verkaufen:
> // http://zmi.at/langegg/
> // http://zmi.at/haus2009/

Thanks guys,

That makes sense.

cheers,
Eli

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-29 23:15                             ` Eli Morris
  2010-07-30  0:39                               ` Michael Monnerie
@ 2010-07-30  7:15                               ` Emmanuel Florac
  2010-07-30  7:57                                 ` Christoph Hellwig
  1 sibling, 1 reply; 45+ messages in thread
From: Emmanuel Florac @ 2010-07-30  7:15 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Thu, 29 Jul 2010 16:15:55 -0700, you wrote:

> Is that something I could do now, or only when the filesystem is
> created? If I mount it with '-o inode64', can I mount it without that
> later, or once I write to the filesystem like that, can I not go back
> to mounting it in 32 bit mode? 

You enable inode64 at mount time; however, there is no going back. Once you
have used a filesystem with "inode64", you must always mount it with that option.
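
(In practice that usually means putting the option in /etc/fstab so it can't be forgotten after a reboot; a sketch, with the device and mount point assumed from earlier in the thread:)

/dev/mapper/vg1-vol5  /export/vol5  xfs  defaults,inode64  0 0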

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-30  7:15                               ` Emmanuel Florac
@ 2010-07-30  7:57                                 ` Christoph Hellwig
  2010-07-30 10:23                                   ` Michael Monnerie
  0 siblings, 1 reply; 45+ messages in thread
From: Christoph Hellwig @ 2010-07-30  7:57 UTC (permalink / raw)
  To: Emmanuel Florac; +Cc: Eli Morris, xfs

On Fri, Jul 30, 2010 at 09:15:54AM +0200, Emmanuel Florac wrote:
> > Is that something I could do now, or only when the filesystem is
> > created? If I mount it with '-o inode64', can I mount it without that
> > later, or once I write to the filesystem like that, can I not go back
> > to mounting it in 32 bit mode? 
> 
> You use inode64 at mount, however there is no return. Once you have
> used a filesystem with "inode64", you must use it always.

Recent enough kernels work fine with filesystems that inode64 was used
on, even if it's not specified anymore.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-30  7:57                                 ` Christoph Hellwig
@ 2010-07-30 10:23                                   ` Michael Monnerie
  2010-07-30 10:29                                     ` Christoph Hellwig
  0 siblings, 1 reply; 45+ messages in thread
From: Michael Monnerie @ 2010-07-30 10:23 UTC (permalink / raw)
  To: xfs; +Cc: Christoph Hellwig


[-- Attachment #1.1: Type: Text/Plain, Size: 723 bytes --]

On Friday, 30 July 2010, Christoph Hellwig wrote:
> Recent enough kernels work fine with filesystems that inode64 was used
> on, even if it's not specified anymore.
 
Really? Since when exactly? That would be a nice feature. If we can 
define it clearly, I could put that on the FAQ.

But how does it truncate the numbers >int32 and avoid collisions?

-- 
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [gesprochen: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Aktuelles Radiointerview! ******
http://www.it-podcast.at/aktuelle-sendung.html

// Wir haben im Moment zwei Häuser zu verkaufen:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-30 10:23                                   ` Michael Monnerie
@ 2010-07-30 10:29                                     ` Christoph Hellwig
  2010-07-30 12:40                                       ` Michael Monnerie
  2010-07-30 13:17                                       ` Emmanuel Florac
  0 siblings, 2 replies; 45+ messages in thread
From: Christoph Hellwig @ 2010-07-30 10:29 UTC (permalink / raw)
  To: Michael Monnerie; +Cc: Christoph Hellwig, xfs

On Fri, Jul 30, 2010 at 12:23:08PM +0200, Michael Monnerie wrote:
> On Friday, 30 July 2010, Christoph Hellwig wrote:
> > Recent enough kernels work fine with filesystems that inode64 was used
> > on, even if it's not specified anymore.
>  
> Really? Since when exactly? That would be a nice feature. If we can 
> define it clearly, I could put that on the FAQ.

Linux 2.6.35 will be the first kernel with the bugfixes for this to
work.

> But how does it truncate the numbers >int32 and avoid collisions?

It doesn't.  Existing inodes won't necessarily fit into 32 bits, but
no new inodes above that limit will be allocated.
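
(For reference, a rough way to check whether a filesystem already has inodes above the 32-bit limit, using GNU find; the mount point is an assumption taken from earlier in the thread:)

find /export/vol5 -xdev -inum +4294967295 | head    # inode numbers that don't fit in 32 bits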

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-30 10:29                                     ` Christoph Hellwig
@ 2010-07-30 12:40                                       ` Michael Monnerie
  2010-07-30 13:17                                       ` Emmanuel Florac
  1 sibling, 0 replies; 45+ messages in thread
From: Michael Monnerie @ 2010-07-30 12:40 UTC (permalink / raw)
  To: xfs; +Cc: Christoph Hellwig


[-- Attachment #1.1: Type: Text/Plain, Size: 1418 bytes --]

On Friday, 30 July 2010, Christoph Hellwig wrote:
> On Fri, Jul 30, 2010 at 12:23:08PM +0200, Michael Monnerie wrote:
> > On Friday, 30 July 2010, Christoph Hellwig wrote:
> > > Recent enough kernels work fine with filesystems that inode64 was
> > > used on, even if it's not specified anymore.
> >
> >  
> > Really? Since when exactly? That would be a nice feature. If we
> > can  define it clearly, I could put that on the FAQ.
> 
> Linux 2.6.35 will be the first kernel with the bugfixes for this to
> work.

Hihi, *rofl*. That's what developers mean by "recent enough kernel": It 
will be "in the next release to come". :-)

> > But how does it truncate the numbers >int32 and avoid collisions?
> 
> It doesn't.  Existing inodes won't necessarily fit into 32 bits, but
> no new inodes above it will be allocated.

OK, sounds simple. I wrote two new FAQ entries:

http://xfs.org/index.php/XFS_FAQ#Q:_What_is_the_inode64_mount_option_for.3F

Could "all who know better than me" please verify if the information is 
correct?

-- 
mit freundlichen Grüssen,
Michael Monnerie, Ing. BSc

it-management Internet Services
http://proteger.at [gesprochen: Prot-e-schee]
Tel: 0660 / 415 65 31

****** Aktuelles Radiointerview! ******
http://www.it-podcast.at/aktuelle-sendung.html

// Wir haben im Moment zwei Häuser zu verkaufen:
// http://zmi.at/langegg/
// http://zmi.at/haus2009/

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

[-- Attachment #2: Type: text/plain, Size: 121 bytes --]

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-30 10:29                                     ` Christoph Hellwig
  2010-07-30 12:40                                       ` Michael Monnerie
@ 2010-07-30 13:17                                       ` Emmanuel Florac
  1 sibling, 0 replies; 45+ messages in thread
From: Emmanuel Florac @ 2010-07-30 13:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Michael Monnerie, xfs, Christoph

On Fri, 30 Jul 2010 06:29:43 -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> Linux 2.6.35 will be the first kernel with the bugfixes for this to
> work.

Then it's hardly a feature we can count on for a while :)

-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-09 23:07 Eli Morris
  2010-07-10  8:16 ` Stan Hoeppner
@ 2010-07-24 21:09 ` Eric Sandeen
  1 sibling, 0 replies; 45+ messages in thread
From: Eric Sandeen @ 2010-07-24 21:09 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

Eli Morris wrote:
> Hi All,
> 
> I've got this problem where if I run xfs_repair, my filesystem
> shrinks by 11 TB, from a volume size of 62 TB to 51 TB. I can grow
> the filesystem again with xfs_growfs, but then rerunning xfs_repair
> shrinks it back down again. The first time this happened was a few
> days ago and running xfs_repair took about 7 TB of data with it. That
> is, out of the 11 TB of disk space that vanished, 7 TB had data on
> it, and 4 TB was empty space. XFS is running on top of an LVM volume.
> It's on an Intel/Linux system running Centos 5 (2.6.18-128.1.14.el5).

Running 2.6.18-138.el5 and beyond will get you a much more up-to-date XFS,
just FWIW.

-Eric

> Does anyone have an idea on what would cause such a thing and what I
> might try to keep it from continuing to happen. I could just never
> run xfs_repair again, but that doesn't seem like a good thing to
> count on. Major bonus points if anyone has any ideas on how to get my
> 7 TB of data back also. It must be there somewhere and it would be
> very bad to lose.
> 
> thanks for any help and ideas. I'm just stumped right now.
> 
> Eli _______________________________________________ xfs mailing list 
> xfs@oss.sgi.com http://oss.sgi.com/mailman/listinfo/xfs
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
@ 2010-07-12  6:39 Eli Morris
  0 siblings, 0 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-12  6:39 UTC (permalink / raw)
  To: xfs

Eli Morris put forth on 7/11/2010 8:10 PM:
>Hi guys,
>
>Here are some of the log files from my XFS problem. Yes, I think this all 
>started with a hardware failure of some sort. My storage is RAID 6, a an 
>Astra SecureStor ES.
>
>
> [root@nimbus log]# more messages.1 | grep I/O
>Jul  2 17:02:30 nimbus kernel: sd 6:0:0:0: rejecting I/O to offline device
>Jul  2 17:02:30 nimbus kernel: sd 6:0:0:0: rejecting I/O to offline device
>Jul  2 17:02:30 nimbus kernel: sr 5:0:0:0: rejecting I/O to offline device

<snip>

What does the web GUI log on the Astra ES tell you?

If the Astra supports syslogging (I assume it does as it is billed as
"enterprise class") you should configure that to facilitate consistent error
information gathering--i.e. grep everything from one terminal session.
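
(On CentOS 5 the receiving side would be roughly the following, assuming the stock sysklogd; the Astra itself would then be pointed at nimbus from its web GUI:)

# /etc/sysconfig/syslog: add -r so syslogd accepts remote messages
SYSLOGD_OPTIONS="-m 0 -r"
service syslog restart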

-- 
Stan

Hi,
After I got into work on Tuesday, I looked at the log files from the web interface for the Astra RAID controller. I also contacted support and sent them a system report, which contains the logs as well as information about the system. Neither he nor I saw any problems in the log files. The support person said he could not find any problems with the units. The only thing he mentioned was that I should turn off the SMART daemon, because it does not work with RAID units. I'll note also that since the day the I/O errors occurred, I have not seen any additional errors from the units, although that may be because we are not reading or writing to them, for obvious reasons.
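
(Presumably that just means something like this on CentOS 5:)

service smartd stop
chkconfig smartd off
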
If it turns out we cannot recover the lost data, I still need to get the remaining filesystem stable, so we can restore what we can from backup and get going again. And I'm really wondering why, if I grow the filesystem back to the size of the LVM logical volume, it shrinks back down again when I run xfs_repair. For all I know, that always happens when one runs xfs_repair on a just-expanded filesystem. I'll check into the syslogging capability of the Astra, but as of now, I have to look at its separate log files from the web GUI.

thanks,
Eli




_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-11  6:32 Eli Morris
  2010-07-11 10:56 ` Stan Hoeppner
@ 2010-07-11 16:29 ` Emmanuel Florac
  1 sibling, 0 replies; 45+ messages in thread
From: Emmanuel Florac @ 2010-07-11 16:29 UTC (permalink / raw)
  To: Eli Morris; +Cc: xfs

On Sat, 10 Jul 2010 23:32:57 -0700, you wrote:

> I got some automated emails this Sunday about I/O errors coming from
> the computer 

That smells like a hardware problem. What type of RAID is this:
RAID-5, RAID-10, RAID-6? Are there any alarms from the RAID controller?
Can you test the SMART status of the drives? What are the JBODs, are
they Dell MD-1000s?

> On one of the
> physical volumes  (PVs) - on /dev/sdc1, I noticed when I ran
> pvdisplay that of the 12.75 TB comprising the volume, 12.00! TB was
> being shown as 'not usable'.

Smells more like a hardware problem. Check all your system logs for I/O
errors and errors coming from the SAS driver. Are you using the mptsas or
the megaraid driver? Grep the logs for the driver name to check for any
messages (timeouts, I/O errors, etc.).
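
(For example, something along these lines, with the driver names above as a starting point; adjust to whatever lsmod actually shows:)

lsmod | grep -iE 'mptsas|megaraid'
grep -iE 'mptsas|megaraid|i/o error|timeout' /var/log/messages*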


> thinking it might find the missing
> data. Instead the filesystem decreased back to 51 TB. I rebooted and
> tried again a couple of times and the same thing happened. I'd
> really, really like to get that data back somehow and also to get the
> filesystem to where we can start using it again.

Check the dmesg output right after the xfs_repair. My bet: there is an
I/O error (bad cable? hosed drive?) reported by the controller, the PV
fails (message from LVM), and then xfs_repair does what it must do: it
truncates the filesystem to the size of the underlying device.
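
(A sketch of how that evidence could be captured in one pass, using the LV path mentioned elsewhere in this thread; xfs_repair has to run on the unmounted volume:)

xfs_repair /dev/vg1/vol5 2>&1 | tee xfs_repair.out
dmesg | tail -n 100 > dmesg-after-repair.txt
lvdisplay /dev/vg1/vol5 | grep 'LV Size'
xfs_db -r -c 'sb 0' -c 'p dblocks' /dev/vg1/vol5    # filesystem size in blocks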

Unfortunately the data may still be on the drives, but a tool like
photorec is probably your only chance to get it back from the raw
drives. Metadata, filenames, and directory hierarchies are almost certainly
gone once and for all.
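
(photorec, from the testdisk package, is interactive; a minimal sketch, run from a directory on a different, healthy filesystem so the recovered recup_dir.* output doesn't land on the damaged volume:)

cd /some/other/volume    # placeholder: any scratch area with enough free space
photorec /dev/vg1/vol5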


-- 
------------------------------------------------------------------------
Emmanuel Florac     |   Direction technique
                    |   Intellique
                    |	<eflorac@intellique.com>
                    |   +33 1 78 94 84 02
------------------------------------------------------------------------

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-11  6:32 Eli Morris
@ 2010-07-11 10:56 ` Stan Hoeppner
  2010-07-11 16:29 ` Emmanuel Florac
  1 sibling, 0 replies; 45+ messages in thread
From: Stan Hoeppner @ 2010-07-11 10:56 UTC (permalink / raw)
  To: xfs

Eli Morris put forth on 7/11/2010 1:32 AM:
>> Eli Morris put forth on 7/9/2010 6:07 PM:
>>> Hi All,
>>>
>>> I've got this problem where if I run xfs_repair, my filesystem shrinks by 11 TB, from a volume size of 62 TB to 51 TB. I can grow the filesystem again with xfs_growfs, but then rerunning xfs_repair shrinks it back down again. The first time this happened was a few days ago and running xfs_repair took about 7 TB of data with it. That is, out of the 11 TB of disk space that vanished, 7 TB had data on it, and 4 TB was empty space. XFS is running on top of an LVM volume. It's on an Intel/Linux system running Centos 5 (2.6.18-128.1.14.el5). Does anyone have an idea on what would cause such a thing and what I might try to keep it from continuing to happen. I could just never run xfs_repair again, but that doesn't seem like a good thing to count on. Major bonus points if anyone has any ideas on how to get my 7 TB of data back also. It must be there somewhere and it would be very bad to lose.
>>>
>>> thanks for any help and ideas. I'm just stumped right now.
>>
>> It may be helpful if you can provide more history (how long has this been
>> happening, recent upgrade?), the exact xfs_repair command line used, why you
>> were running xfs_repair in the first place, hardware or software RAID, what
>> xfsprogs version, relevant log snippets, etc.
> 
> Hi Stan,
> 
> Thanks for responding. Sure, I'll try and give more information.
> 
> I got some automated emails this Sunday about I/O errors coming from the computer (which is a Dell Poweredge 2950 w/ a connected 16-bay hardware RAID, which is itself connected to 4 16-bay JBODs. The RAID controller is connected via a SAS / LSI Fusion card to the Poweredge - Nimbus). It was Sunday, so I just logged in, rebooted, ran xfs_repair, then mounted the filesystem back. I tried a quick little write test, just to make sure I could write a file to it and read it back, and called it a day until work the next day. When I came into work, I looked at the volume more closely and noticed that the filesystem shrank as I stated. Each of the RAID/JBODs is configured as a separate device and represents one physical volume in my LVM2 scheme, and those physical volumes are then combined into one logical volume. Then the filesystem sits on top of this. On one of the physical volumes (PVs), /dev/sdc1, I noticed when I ran pvdisplay that of the 12.75 TB comprising the volume, 12.00! TB was being shown as 'not usable'. Usually this number is a couple of megabytes. So, after staring at this a while, I ran pvresize on that PV. The volume then listed 12.75 TB as usable, with a couple of megabytes not usable as one would expect. I then gave the command xfs_growfs on my filesystem and once again the filesystem was back to 62 TB. But it was showing the increased space as free space, instead of only 4.x TB of it as free as before all this happened. I then ran xfs_repair on this again, thinking it might find the missing data. Instead the filesystem decreased back to 51 TB. I rebooted and tried again a couple of times and the same thing happened. I'd really, really like to get that data back somehow and also to get the filesystem to where we can start using it again.
> 
> Version 2.9.4 of xfsprogs. xfs_repair line used 'xfs_repair /dev/vg1/vol5', vol5 being the LVM2 logical volume.  I spoke with tech support from my RAID vendor and he said he did not see any sign of errors with the RAID itself for what that is worth.
> 
> Nimbus is the hostname of the computer that is connected to the RAID/JBODs unit. The other computers (compute-0-XX) are only connected via NFS to the RAID/JBODs.
> 
> I've tried to provide a lot here,  but if I can provide any more information, please let me know. Thanks very much,
> 
> Eli
> 
> I'm trying to post logs, but my emails keep getting bounced. I'll see if this one makes it.

We just need the snippets relating to the problem at hand, not an entire
syslog file.  I'm guessing you attempted to attach the entire log, which is
likely what caused the rejection by the list server.

You said you received an email alert when the first errors occurred.
Correlate the time stamp in that alert msg to lines in syslog relating to the
LSI controller, LVM, XFS, etc.  Also grab the log entries for each xfs_repair
run you performed, along with log entries for the xfs_growfs operation.

Log errors/information is always critical to solving problems such as this.

-- 
Stan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
@ 2010-07-11  6:32 Eli Morris
  2010-07-11 10:56 ` Stan Hoeppner
  2010-07-11 16:29 ` Emmanuel Florac
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-11  6:32 UTC (permalink / raw)
  To: xfs

> Eli Morris put forth on 7/9/2010 6:07 PM:
>> Hi All,
>> 
>> I've got this problem where if I run xfs_repair, my filesystem shrinks by 11 TB, from a volume size of 62 TB to 51 TB. I can grow the filesystem again with xfs_growfs, but then rerunning xfs_repair shrinks it back down again. The first time this happened was a few days ago and running xfs_repair took about 7 TB of data with it. That is, out of the 11 TB of disk space that vanished, 7 TB had data on it, and 4 TB was empty space. XFS is running on top of an LVM volume. It's on an Intel/Linux system running Centos 5 (2.6.18-128.1.14.el5). Does anyone have an idea on what would cause such a thing and what I might try to keep it from continuing to happen. I could just never run xfs_repair again, but that doesn't seem like a good thing to count on. Major bonus points if anyone has any ideas on how to get my 7 TB of data back also. It must be there somewhere and it would be very bad to lose.
>> 
>> thanks for any help and ideas. I'm just stumped right now.
> 
> It may be helpful if you can provide more history (how long has this been
> happening, recent upgrade?), the exact xfs_repair command line used, why you
> were running xfs_repair in the first place, hardware or software RAID, what
> xfsprogs version, relevant log snippets, etc.

Hi Stan,

Thanks for responding. Sure, I'll try and give more information.

I got some automated emails this Sunday about I/O errors coming from the computer (which is a Dell Poweredge 2950 w/ a connected 16-bay hardware RAID, which is itself connected to 4 16-bay JBODs. The RAID controller is connected via a SAS / LSI Fusion card to the Poweredge - Nimbus). It was Sunday, so I just logged in, rebooted, ran xfs_repair, then mounted the filesystem back. I tried a quick little write test, just to make sure I could write a file to it and read it back, and called it a day until work the next day. When I came into work, I looked at the volume more closely and noticed that the filesystem shrank as I stated. Each of the RAID/JBODs is configured as a separate device and represents one physical volume in my LVM2 scheme, and those physical volumes are then combined into one logical volume. Then the filesystem sits on top of this. On one of the physical volumes (PVs), /dev/sdc1, I noticed when I ran pvdisplay that of the 12.75 TB comprising the volume, 12.00! TB was being shown as 'not usable'. Usually this number is a couple of megabytes. So, after staring at this a while, I ran pvresize on that PV. The volume then listed 12.75 TB as usable, with a couple of megabytes not usable as one would expect. I then gave the command xfs_growfs on my filesystem and once again the filesystem was back to 62 TB. But it was showing the increased space as free space, instead of only 4.x TB of it as free as before all this happened. I then ran xfs_repair on this again, thinking it might find the missing data. Instead the filesystem decreased back to 51 TB. I rebooted and tried again a couple of times and the same thing happened. I'd really, really like to get that data back somehow and also to get the filesystem to where we can start using it again.
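
(For reference, roughly the sequence described above plus a superblock check, using the device and LV names from this thread; this is a sketch of what was done, not a recommendation:)

pvresize /dev/sdc1
pvdisplay /dev/sdc1    # usable vs. not-usable size
xfs_growfs /export/vol5
xfs_db -r -c 'sb 0' -c 'p dblocks' -c 'p agcount' /dev/vg1/vol5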

Version 2.9.4 of xfsprogs. xfs_repair line used 'xfs_repair /dev/vg1/vol5', vol5 being the LVM2 logical volume.  I spoke with tech support from my RAID vendor and he said he did not see any sign of errors with the RAID itself for what that is worth.

Nimbus is the hostname of the computer that is connected to the RAID/JBODs unit. The other computers (compute-0-XX) are only connected via NFS to the RAID/JBODs.

I've tried to provide a lot here,  but if I can provide any more information, please let me know. Thanks very much,

Eli

I'm trying to post logs, but my emails keep getting bounced. I'll see if this one makes it.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: filesystem shrinks after using xfs_repair
  2010-07-09 23:07 Eli Morris
@ 2010-07-10  8:16 ` Stan Hoeppner
  2010-07-24 21:09 ` Eric Sandeen
  1 sibling, 0 replies; 45+ messages in thread
From: Stan Hoeppner @ 2010-07-10  8:16 UTC (permalink / raw)
  To: xfs

Eli Morris put forth on 7/9/2010 6:07 PM:
> Hi All,
> 
> I've got this problem where if I run xfs_repair, my filesystem shrinks by 11 TB, from a volume size of 62 TB to 51 TB. I can grow the filesystem again with xfs_growfs, but then rerunning xfs_repair shrinks it back down again. The first time this happened was a few days ago and running xfs_repair took about 7 TB of data with it. That is, out of the 11 TB of disk space that vanished, 7 TB had data on it, and 4 TB was empty space. XFS is running on top of an LVM volume. It's on an Intel/Linux system running Centos 5 (2.6.18-128.1.14.el5). Does anyone have an idea on what would cause such a thing and what I might try to keep it from continuing to happen. I could just never run xfs_repair again, but that doesn't seem like a good thing to count on. Major bonus points if anyone has any ideas on how to get my 7 TB of data back also. It must be there somewhere and it would be very bad to lose.
> 
> thanks for any help and ideas. I'm just stumped right now.

It may be helpful if you can provide more history (how long has this been
happening, recent upgrade?), the exact xfs_repair command line used, why you
were running xfs_repair in the first place, hardware or software RAID, what
xfsprogs version, relevant log snippets, etc.

-- 
Stan

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* filesystem shrinks after using xfs_repair
@ 2010-07-09 23:07 Eli Morris
  2010-07-10  8:16 ` Stan Hoeppner
  2010-07-24 21:09 ` Eric Sandeen
  0 siblings, 2 replies; 45+ messages in thread
From: Eli Morris @ 2010-07-09 23:07 UTC (permalink / raw)
  To: xfs

Hi All,

I've got this problem where if I run xfs_repair, my filesystem shrinks by 11 TB, from a volume size of 62 TB to 51 TB. I can grow the filesystem again with xfs_growfs, but then rerunning xfs_repair shrinks it back down again. The first time this happened was a few days ago and running xfs_repair took about 7 TB of data with it. That is, out of the 11 TB of disk space that vanished, 7 TB had data on it, and 4 TB was empty space. XFS is running on top of an LVM volume. It's on an Intel/Linux system running Centos 5 (2.6.18-128.1.14.el5). Does anyone have an idea on what would cause such a thing and what I might try to keep it from continuing to happen. I could just never run xfs_repair again, but that doesn't seem like a good thing to count on. Major bonus points if anyone has any ideas on how to get my 7 TB of data back also. It must be there somewhere and it would be very bad to lose.

thanks for any help and ideas. I'm just stumped right now.

Eli
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2010-07-29 19:19 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-07-12  1:10 filesystem shrinks after using xfs_repair Eli Morris
2010-07-12  2:24 ` Stan Hoeppner
2010-07-12 11:47 ` Emmanuel Florac
2010-07-23  8:30   ` Eli Morris
2010-07-23 10:23     ` Emmanuel Florac
2010-07-23 16:36       ` Eli Morris
2010-07-24  0:54     ` Dave Chinner
2010-07-24  1:08       ` Eli Morris
2010-07-24  2:39         ` Dave Chinner
2010-07-26  3:20           ` Eli Morris
2010-07-26  3:45             ` Dave Chinner
2010-07-26  4:04               ` Eli Morris
2010-07-26  5:57                 ` Michael Monnerie
2010-07-26  6:06                 ` Dave Chinner
2010-07-26  6:46                   ` Eli Morris
2010-07-26  8:40                     ` Michael Monnerie
2010-07-26  9:49                     ` Emmanuel Florac
2010-07-26 17:22                       ` Eli Morris
2010-07-26 18:33                         ` Stuart Rowan
2010-07-26 21:06                         ` Emmanuel Florac
2010-07-27  5:02                           ` Eli Morris
2010-07-27  6:48                             ` Stan Hoeppner
2010-07-27  8:21                             ` Michael Monnerie
2010-07-26 10:20                     ` Dave Chinner
2010-07-28  5:12                       ` Eli Morris
2010-07-29 19:22                         ` Eli Morris
2010-07-29 22:09                           ` Emmanuel Florac
2010-07-29 22:48                             ` Eli Morris
2010-07-29 23:01                           ` Dave Chinner
2010-07-29 23:15                             ` Eli Morris
2010-07-30  0:39                               ` Michael Monnerie
2010-07-30  1:49                                 ` Eli Morris
2010-07-30  7:15                               ` Emmanuel Florac
2010-07-30  7:57                                 ` Christoph Hellwig
2010-07-30 10:23                                   ` Michael Monnerie
2010-07-30 10:29                                     ` Christoph Hellwig
2010-07-30 12:40                                       ` Michael Monnerie
2010-07-30 13:17                                       ` Emmanuel Florac
  -- strict thread matches above, loose matches on Subject: below --
2010-07-12  6:39 Eli Morris
2010-07-11  6:32 Eli Morris
2010-07-11 10:56 ` Stan Hoeppner
2010-07-11 16:29 ` Emmanuel Florac
2010-07-09 23:07 Eli Morris
2010-07-10  8:16 ` Stan Hoeppner
2010-07-24 21:09 ` Eric Sandeen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.