From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mout.gmx.net ([212.227.17.22]:59207 "EHLO mout.gmx.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753418AbcLZRlo (ORCPT ); Mon, 26 Dec 2016 12:41:44 -0500
MIME-Version: 1.0
Message-ID:
From: "Xin Zhou"
To: "Giuseppe Della Bianca"
Cc: "Btrfs BTRFS"
Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive
Content-Type: text/plain; charset=UTF-8
Date: Mon, 26 Dec 2016 18:41:38 +0100
In-Reply-To: <3040771.5eLdPxENH3@exnet.gdb.it>
References: <1479730155.5832e3eb3fde8@webmail.adria.it> <26479704.a8Su2NvQ2R@exnet.gdb.it> , <3040771.5eLdPxENH3@exnet.gdb.it>
Sender: linux-btrfs-owner@vger.kernel.org
List-ID:

That is one way to diagnose an issue in the data path.
If ssh could guarantee data transfer and retry on its own, data protection companies would not need a whole team to handle send / receive for remote data backup.
In your case, if the connection is very lightly loaded, the issue could be somewhere else.

Xin

Sent: Monday, December 26, 2016 at 3:04 AM
From: "Giuseppe Della Bianca"
To: "Xin Zhou" , "Btrfs BTRFS"
Subject: Re: [CORRUPTION FILESYSTEM] Corrupted and unrecoverable file system during the snapshot receive

Hi.

I agree with Duncan, and I add:

- Remote transfers use ssh, which is designed to ensure data integrity.
- The remote transfer runs over Gigabit Ethernet, which is never congested.
- I had the same problems with a local btrfs receive.
- The script currently has 907 lines of code, many of which exist to detect and display errors from the btrfs tools.
- The script stops executing when the btrfs tools return an error code.
- It is not possible for the script to fail to display error messages, or to ignore an error code from the btrfs tools.

An example from today:

(2016-12-26 10:53:51) Start btrfsManage
. . .
Start managing SEND ' / ' filesystem ' root ' snapshot in ' /dev/sda2 '

Sending ' root-2016-12-04_18:13:57.35 ' source snapshot to ' btrfsreceive ' subvolume
. . .
btrfs send -p /tmp/tmp.xJWkEN1U23/btrfssnapshot/root/root-2016-12-03_18:07:09.34 /tmp/tmp.xJWkEN1U23/btrfssnapshot/root/root-2016-12-04_18:13:57.35 | btrfs receive /tmp/tmp.pWWKP4vfAy/btrfsreceive/root/.part/
. . .
At subvol /tmp/tmp.xJWkEN1U23/btrfssnapshot/root/root-2016-12-04_18:13:57.35
. . .
ERROR: truncate usr/share/locale/it/LC_MESSAGES/kio4.mo failed: Read-only file system
. . .
At snapshot root-2016-12-04_18:13:57.35
. . .
_EC_ERR_ 1
. . .
_EC_ERR_ 141

(2016-12-26 10:54:28) End btrfsManage
. . .
End managing SEND ' / ' filesystem ' root ' snapshot in ' /dev/sda2 ' WITH ERRORS

Checking filesystem on /dev/sda2
UUID: 44f1de7e-a65b-41ce-8ff4-20f7ed83e106
checking extents
ref mismatch on [62408097792 16384] extent item 0, found 1
Backref 62408097792 parent 1060 root 1060 not found in extent tree
backpointer mismatch on [62408097792 16384]
owner ref check failed [62408097792 16384]
ref mismatch on [77565509632 16384] extent item 0, found 1
Backref 77565509632 parent 1060 root 1060 not found in extent tree
backpointer mismatch on [77565509632 16384]
]zac[
Backref 77826916352 parent 1060 root 1060 not found in extent tree
backpointer mismatch on [77826916352 16384]
owner ref check failed [77826916352 16384]
ref mismatch on [77853933568 16384] extent item 0, found 1
Backref 77853933568 parent 1060 root 1060 not found in extent tree
backpointer mismatch on [77853933568 16384]
owner ref check failed [77853933568 16384]
checking free space cache
checking fs roots
warning line 3822
checking csums
checking root refs
found 135128678400 bytes used err is 0
total csum bytes: 126946572
total tree bytes: 5132206080
total fs tree bytes: 4744757248
total extent tree bytes: 240795648
btree space waste bytes: 914832832
file data blocks allocated: 3311786532864
 referenced 703616266240

It is likely that mine is a special
case. But a special case can, after a code change elsewhere, become a problem for many.

It's not nice to say, but it seems I have to hope that my problem becomes a problem for many.

Meanwhile, I'll find my own workaround for what is probably a serious btrfs bug.

Thank you.

Gdb

> Hi,
>
> You can probably try "-v" to get more output.
> From a quick look at the send / receive code, it seems a little bit risky.
> It seems to lack specific error handling and, in most cases, returns the
> same error code. I think it might be helpful if, when a transfer succeeds,
> the command printed the transfer id, source / dest, and a specific
> "success" string.
> Such output could help the script figure out whether a transfer really
> succeeded.
>
> The code is relatively new to me. I did not see retry logic in the stream
> handling; please correct me if I am wrong about this. So I am not quite
> sure about the transfer behavior if the system is subject to network
> issues under heavy workload, where missing packets or connection issues
> are not rare.
>
> Since the test mentioned at the beginning deletes the snapshots after a
> transfer, while most users keep the middle snapshot even in cascading
> transfers, the current btrfs and its commands probably still work for
> regular users.
>
> Thanks,
> Xin
>
>

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
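
A note on the two `_EC_ERR_` values in the log above. They look like the exit statuses of the two sides of the `btrfs send | btrfs receive` pipeline, but that is an assumption, since the script itself is not shown. By shell convention, a status above 128 means the process was terminated by signal number (status - 128), so 141 would correspond to signal 13, which is SIGPIPE on Linux. That is what a sender typically gets when the receiver exits first. The following is a minimal Python sketch of collecting and decoding both statuses. The helper names `describe_status` and `run_pipeline` are hypothetical and are not taken from the script or from the btrfs tools.

```python
import signal
import subprocess


def describe_status(status):
    """Interpret a shell-style exit status (0-255), as seen in $? or PIPESTATUS."""
    if status == 0:
        return "success"
    if status > 128:
        # Shell convention: 128 + N means "terminated by signal N".
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "signal %d" % signum
        return "terminated by %s" % name
    return "exited with error code %d" % status


def run_pipeline(producer, consumer):
    """Run `producer | consumer` and return both exit statuses, shell style."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout)
    # Close our copy of the pipe so the producer sees SIGPIPE
    # if the consumer exits before reading everything.
    p1.stdout.close()
    rc2 = p2.wait()
    rc1 = p1.wait()

    def to_shell(rc):
        # Popen reports death by signal N as -N; a shell would report 128 + N.
        return 128 - rc if rc < 0 else rc

    return to_shell(rc1), to_shell(rc2)
```

With the real commands this would be called as `run_pipeline(["btrfs", "send", ...], ["btrfs", "receive", ...])`, with the arguments filled in from the log. Checking both statuses separately matters because a plain shell pipeline reports only the status of its last command unless the script asks for more.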