From: Michael Monnerie
Subject: Re: xfs_repair of critical volume
Date: Sat, 13 Nov 2010 16:25:42 +0100
Message-Id: <201011131625.43576@zmi.at>
To: xfs@oss.sgi.com
Cc: Eli Morris

On Saturday, 13 November 2010 Eli Morris wrote:
> This is a small University lab setup. We do not have access to a lot
> of resources. We do have a partial tape backup of this data, but...

Yes, Eli, I understand you. We also have universities as customers, and
I know there's no money. But you're definitely deep in shit now. Isn't
there another department with tape backup that you could "borrow" in
this state of crisis?

> a) tape backup

So, if you can't do that, we forget it.

> b) I don't have another system to copy the files to.
> (disk backup)

So, you can't even copy the rest of the still-existing data away.

The way you describe it, you will have to mess around with the existing
data. So first, did you run xfs_repair without "-n", so that it actually
repairs whatever it can? Maybe run it several times, until no more errors
show up. You need to ensure you are in a clean state.

Then, try to access the files that are still there. A simple script like

    find /mydestroyedfs -type f -exec dd if={} of=/dev/null bs=1024k \;

would read all files once. If this causes errors, either remove the
problematic files, or maybe xfs_repair will clean those out then.

Now try to access the data with your application, and see which contents
are still valid. I guess there will be files that are truncated, or
partly overwritten, or otherwise badly damaged. Delete all those files.

Maybe, if you're lucky, you can still use some of that data. I once
had a filesystem where the first third of the disks had been zeroed, and
still most files could be recovered. But then again, another customer had
only about 5-10% overwritten, and had to drop all the data because an index
was destroyed, so the data was worthless.

It definitely depends on your app. Hopefully that app uses checksums;
that would make your life easier now.

> c) We are working on making sure everything is working OK. I think
> the power output from our UPS might be problematic. We are
> definitely investigating that, because it could be behind all these
> crazy problems.

I generally do the following if only one UPS is available: put one
power supply on the UPS, and the other on the normal line. I hope you
have redundant power supplies, do you? That way, whenever the UPS goes
crazy, at least the normal power is still available. Two different
UPSes would be better, but budget is very often scarce.

> d) Checking every file's content manually is not something that is
> going to work. It would, literally, take years.

OK, so what do you want to do?
Just use it and hope the data is valid? If
you don't check the files, every calculation you do with that broken
data is *bogus*, so you'd better delete it than have wrong data, or no?

> Would defragging the filesystem remove those zeroed files from the
> filesystem? Does anyone make an XFS utility program that might help?
> Maybe an XFS utility that can be used to remove zeroed files from
> the filesystem? Or remove files that are stored in that one bad LVM
> volume?

Maybe xfs_db can help you find and identify files that had parts or all
of their data in that area, and remove them.

-- 
Kind regards,
Michael Monnerie, Ing. BSc

it-management Internet Services: Protéger http://proteger.at
[pronounced: Prot-e-schee]
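
The "read all files once" check and the question about finding zeroed files could be combined roughly as sketched here. This is only a sketch, not something from the thread: it assumes bash and a find that supports -print0, and the function name scan_tree is made up for illustration. It deletes nothing and only prints candidates. A file that consists entirely of zero bytes on purpose (or a fully sparse file) would be flagged too, and files that are only partly zeroed will not be found this way.

```shell
#!/usr/bin/env bash
# Sketch: read every regular file once and report files that cannot be
# read or that contain nothing but zero bytes.
# Assumptions (not from the original mail): bash, find with -print0,
# and the hypothetical helper name scan_tree. Nothing is deleted.

scan_tree() {
    local root=$1 f nonzero
    find "$root" -type f -print0 |
    while IFS= read -r -d '' f; do
        # A full read surfaces I/O errors, much like the dd one-liner.
        if ! cat -- "$f" > /dev/null 2>&1; then
            printf 'UNREADABLE\t%s\n' "$f"
            continue
        fi
        # Empty files are not counted as zeroed; skip them.
        [ -s "$f" ] || continue
        # Strip NUL bytes; if no byte is left, the file is all zeros.
        nonzero=$(tr -d '\000' < "$f" | head -c 1 | wc -c)
        if [ "$nonzero" -eq 0 ]; then
            printf 'ZEROED\t%s\n' "$f"
        fi
    done
}

# Hypothetical usage, after sourcing this file:
#   scan_tree /mydestroyedfs > suspects.txt
```

Each file is read twice here (once for the error check, once for the zero check), which keeps the logic simple at the cost of extra I/O. The resulting list is something to review before removing anything.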