* btrfs check inconsistency with raid1, part 1
@ 2015-12-14  4:16 Chris Murphy
  2015-12-14  5:48 ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Murphy @ 2015-12-14  4:16 UTC (permalink / raw)
To: Btrfs BTRFS

Part 1 = What to do about it? This post.
Part 2 = How I got here? I'm still working on the write-up, so it's
not yet posted.

Summary:

2 dev (spinning rust) raid1 for data and metadata.
kernel 4.2.6, btrfs-progs 4.2.2

btrfs check with devid 1 and 2 present produces thousands of scary
messages, e.g.
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000

btrfs check with devid 1 or devid 2 separate (the other missing)
produces no such scary messages at all, but instead messages e.g.
failed to load free space cache for block group 357585387520

a. This inconsistency is unexpected.
b. The 'btrfs check' with combined devices gives no insight into the
seriousness of the "checksum verify failed" messages, or what the
solution is.
c. Combined or separate+degraded, read-only mounts succeed with no
errors in user space or dmesg; only normal mount messages appear.
With both devs ro mounted, I was able to completely btrfs
send/receive the most recent two ro snapshots comprising 100% (minus
stale historical) of the data on the drive, with zero errors reported.
d. No read-write mount attempt has happened since "the incident",
which will be detailed in part 2.

Details:

The full devid1&2 btrfs check output is long and not very interesting,
so I've put it here:
https://drive.google.com/open?id=0B_2Asp8DGjJ9Vjd0VlNYb09LVFU

btrfs-show-super shows some differences, values denoted as
devid1/devid2. If there's no split, the value is the same for both
devids.
generation              4924/4923
root                    714189258752/714188554240
sys_array_size          129
chunk_root_generation   4918
root_level              1
chunk_root              715141414912
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             1500312748032
bytes_used              537228206080
sectorsize              4096
nodesize                16384
[snip]
cache_generation        4924/4923
uuid_tree_generation    4924/4923
[snip]
dev_item.total_bytes    750156374016
dev_item.bytes_used     541199433728

Perhaps useful: at the time of "the incident" this volume was rw
mounted, but was being used by a single process only: btrfs send. So
it was used as a source. No writes, other than btrfs's own generation
increment, were happening.

So in theory this should perhaps be the simplest case of "what do I do
now?", and it even makes me wonder if a normal rw mount should just
fix this up: either btrfs uses generation 4924 and replays the changes
between 4923 and 4924 automatically onto devid2 so they are back in
sync, or it automatically discards generation 4924 from devid1, so
both devices are in sync.

The workload, the circumstances of "the incident", the general purpose
of btrfs, and the likelihood that a typical user would never have
become aware of "the incident" until much later than I did, make me
strongly feel that Btrfs should be able to completely recover from
this with just a rw mount, and eventually the mis-synced generations
will autocorrect. But I don't know that. And I get essentially no
advice from the btrfs check results.

So. What's the theory in this case? And does it differ from reality?

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread
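The devid1/devid2 split described above can be spotted mechanically by diffing btrfs-show-super output captured from each device. The following is a hypothetical sketch, not a tool from this thread; the parsing assumes the tool's simple "field value" line format.

```python
def parse_show_super(text):
    """Parse 'field   value' lines of btrfs-show-super output into a dict."""
    fields = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split field name from the rest
        if len(parts) == 2:
            fields[parts[0]] = parts[1].strip()
    return fields

def super_diff(a, b):
    """Return {field: (value_a, value_b)} for fields present in both
    superblocks whose values differ (the devid1/devid2 'split')."""
    common = set(a) & set(b)
    return {k: (a[k], b[k]) for k in sorted(common) if a[k] != b[k]}

# With the values from this thread, only generation-coupled fields split:
d1 = parse_show_super("generation 4924\nroot 714189258752\nchunk_root_generation 4918")
d2 = parse_show_super("generation 4923\nroot 714188554240\nchunk_root_generation 4918")
print(super_diff(d1, d2))
```

Running this over the two full dumps would surface exactly the generation/root/cache_generation/uuid_tree_generation splits listed above.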
* Re: btrfs check inconsistency with raid1, part 1
  2015-12-14  4:16 btrfs check inconsistency with raid1, part 1 Chris Murphy
@ 2015-12-14  5:48 ` Qu Wenruo
  2015-12-14  7:24   ` Chris Murphy
  0 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2015-12-14  5:48 UTC (permalink / raw)
To: Chris Murphy, Btrfs BTRFS

Chris Murphy wrote on 2015/12/13 21:16 -0700:
> Part 1 = What to do about it? This post.
> Part 2 = How I got here? I'm still working on the write-up, so it's
> not yet posted.
>
> Summary:
>
> 2 dev (spinning rust) raid1 for data and metadata.
> kernel 4.2.6, btrfs-progs 4.2.2
>
> btrfs check with devid 1 and 2 present produces thousands of scary
> messages, e.g.
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000

Checked the full output.
The interesting part is, the calculated result is always E4E3BDB6, and
the wanted value is always all 0.

I assume E4E3BDB6 is the crc32 of all-zero data.

If there were a full disk dump, it would be much easier to find where
the problem is, but I'm afraid that won't be possible.

At least 'btrfs-debug-tree -t 2' should help to locate what's wrong
with the bytenr in the warning.

The good news is, the fs seems to be OK, without major problems:
except for the csum error, btrfsck doesn't give any other
error/warning.

> btrfs check with devid 1 or devid 2 separate (the other missing)
> produces no such scary messages at all, but instead messages e.g.
> failed to load free space cache for block group 357585387520
>
> a. This inconsistency is unexpected.
> b. The 'btrfs check' with combined devices gives no insight into the
> seriousness of the "checksum verify failed" messages, or what the
> solution is.

I guess btrfsck did the wrong device assembly, but that's just my
personal guess. And since I can't reproduce it in my test environment,
it won't be easy to find the root cause.

> c. Combined or separate+degraded, read-only mounts succeed with no
> errors in user space or dmesg; only normal mount messages appear.
> With both devs ro mounted, I was able to completely btrfs
> send/receive the most recent two ro snapshots comprising 100% (minus
> stale historical) of the data on the drive, with zero errors reported.
> d. No read-write mount attempt has happened since "the incident",
> which will be detailed in part 2.
>
>
> Details:
>
> The full devid1&2 btrfs check output is long and not very interesting,
> so I've put it here:
> https://drive.google.com/open?id=0B_2Asp8DGjJ9Vjd0VlNYb09LVFU
>
> btrfs-show-super shows some differences, values denoted as
> devid1/devid2. If there's no split, the value is the same for both
> devids.
>
> generation              4924/4923
> root                    714189258752/714188554240
> sys_array_size          129
> chunk_root_generation   4918
> root_level              1
> chunk_root              715141414912
> chunk_root_level        1
> log_root                0
> log_root_transid        0
> log_root_level          0
> total_bytes             1500312748032
> bytes_used              537228206080
> sectorsize              4096
> nodesize                16384
> [snip]
> cache_generation        4924/4923
> uuid_tree_generation    4924/4923
> [snip]
> dev_item.total_bytes    750156374016
> dev_item.bytes_used     541199433728
>
> Perhaps useful: at the time of "the incident" this volume was rw
> mounted, but was being used by a single process only: btrfs send. So
> it was used as a source. No writes, other than btrfs's own generation
> increment, were happening.
>
> So in theory this should perhaps be the simplest case of "what do I do
> now?", and it even makes me wonder if a normal rw mount should just
> fix this up: either btrfs uses generation 4924 and replays the changes
> between 4923 and 4924 automatically onto devid2 so they are back in
> sync, or it automatically discards generation 4924 from devid1, so
> both devices are in sync.
>
> The workload, the circumstances of "the incident", the general purpose
> of btrfs, and the likelihood that a typical user would never have
> become aware of "the incident" until much later than I did, make me
> strongly feel that Btrfs should be able to completely recover from
> this with just a rw mount, and eventually the mis-synced generations
> will autocorrect. But I don't know that. And I get essentially no
> advice from the btrfs check results.
>
> So. What's the theory in this case? And does it differ from reality?

Personally speaking, it may be a false alert from btrfsck.
So in this case, I can't provide much help.

If you're brave enough, mount it rw to see what will happen (although
it may mount just OK).

Thanks,
Qu

^ permalink raw reply	[flat|nested] 18+ messages in thread
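Qu's guess that E4E3BDB6 is the checksum of all-zero data can be tested offline. Btrfs metadata checksums in this era are CRC-32C (the Castagnoli polynomial), which a few lines of Python can compute. This is an illustrative sketch of the checksum itself, not a claim about what value a real zeroed node yields: the on-disk metadata csum actually skips the leading 32 csum bytes of the block, which this sketch ignores.

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78,
    init and final XOR 0xFFFFFFFF -- the variant btrfs uses for csums."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Sanity check: the standard CRC-32C check value.
assert crc32c(b"123456789") == 0xE3069283

# What a fully zeroed 16384-byte block (the nodesize in this thread)
# would checksum to -- an approximation of Qu's hypothesis:
print(f"{crc32c(bytes(16384)):08X}")
```

A table-driven implementation would be far faster, but for comparing one 16 KiB block against a warning message, the bitwise version is enough.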
* Re: btrfs check inconsistency with raid1, part 1
  2015-12-14  5:48 ` Qu Wenruo
@ 2015-12-14  7:24   ` Chris Murphy
  2015-12-14  8:04     ` Qu Wenruo
  2015-12-14 11:51     ` Duncan
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Murphy @ 2015-12-14  7:24 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Btrfs BTRFS

Thanks for the reply.

On Sun, Dec 13, 2015 at 10:48 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
> Chris Murphy wrote on 2015/12/13 21:16 -0700:
>> btrfs check with devid 1 and 2 present produces thousands of scary
>> messages, e.g.
>> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
>
> Checked the full output.
> The interesting part is, the calculated result is always E4E3BDB6, and
> the wanted value is always all 0.
>
> I assume E4E3BDB6 is the crc32 of all-zero data.
>
> If there were a full disk dump, it would be much easier to find where
> the problem is, but I'm afraid that won't be possible.

What is a full disk dump? I can try to see if it's possible. The main
thing, though, is whether it can make Btrfs overall better, because I
don't need this volume repaired: there's no data loss (backups!), so
this volume's purpose now is study.

> At least 'btrfs-debug-tree -t 2' should help to locate what's wrong
> with the bytenr in the warning.

Both devs attached (not mounted).
[root@f23a ~]# btrfs-debug-tree -t 2 /dev/sdb > btrfsdebugtreet2_verb.txt
checksum verify failed on 714189570048 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189570048 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189471744 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189471744 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189750272 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189750272 found E4E3BDB6 wanted 00000000

https://drive.google.com/open?id=0B_2Asp8DGjJ9NUdmdXZFQ1Myek0

> The good news is, the fs seems to be OK, without major problems:
> except for the csum error, btrfsck doesn't give any other
> error/warning.

Yes, I think so. The main issue here seems to be the scary warnings
and the uncertainty about what the user should do next, if anything at
all.

> I guess btrfsck did the wrong device assembly, but that's just my
> personal guess. And since I can't reproduce it in my test environment,
> it won't be easy to find the root cause.

It might be reproducible. More on that in the next email. It's easy to
get you remote access if useful.

>> So. What's the theory in this case? And does it differ from reality?
>
> Personally speaking, it may be a false alert from btrfsck.
> So in this case, I can't provide much help.
>
> If you're brave enough, mount it rw to see what will happen (although
> it may mount just OK).

I'm brave enough. I'll give it a try tomorrow unless there's another
request for more info before then.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread
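Repetitive "checksum verify failed" output like the above is easier to reason about once collapsed to unique byte numbers. A small hypothetical helper, not part of the thread's tooling, that counts repeats per (bytenr, found, wanted) triple:

```python
import re
from collections import Counter

# Matches the warning format emitted by btrfs-progs of this era.
CSUM_RE = re.compile(
    r"checksum verify failed on (\d+) found ([0-9A-F]+) wanted ([0-9A-F]+)")

def summarize_csum_warnings(lines):
    """Collapse repeated csum warnings to counts per unique triple."""
    counts = Counter()
    for line in lines:
        m = CSUM_RE.search(line)
        if m:
            counts[m.groups()] += 1
    return counts
```

Applied to the eight lines above, this yields four unique bytenrs, each warned about twice, all with found=E4E3BDB6 and wanted=00000000 -- which is exactly the pattern Qu noticed in the full check output.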
* Re: btrfs check inconsistency with raid1, part 1
  2015-12-14  7:24 ` Chris Murphy
@ 2015-12-14  8:04   ` Qu Wenruo
  2015-12-14 17:59     ` Chris Murphy
  1 sibling, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2015-12-14  8:04 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS

Chris Murphy wrote on 2015/12/14 00:24 -0700:
> Thanks for the reply.
>
> On Sun, Dec 13, 2015 at 10:48 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>> Chris Murphy wrote on 2015/12/13 21:16 -0700:
>>> btrfs check with devid 1 and 2 present produces thousands of scary
>>> messages, e.g.
>>> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
>>
>> Checked the full output.
>> The interesting part is, the calculated result is always E4E3BDB6, and
>> the wanted value is always all 0.
>>
>> I assume E4E3BDB6 is the crc32 of all-zero data.
>>
>> If there were a full disk dump, it would be much easier to find where
>> the problem is, but I'm afraid that won't be possible.
>
> What is a full disk dump? I can try to see if it's possible.

Just a dd dump:

dd if=<disk1> of=disk1.img bs=1M

> The main thing, though, is whether it can make Btrfs overall better,
> because I don't need this volume repaired: there's no data loss
> (backups!), so this volume's purpose now is study.

But please also consider your privacy before doing this.

And the more important thing is the size... Considering how large your
-t 2 dump is, I won't ever try the full dump: even if I had enough
spare space to contain the image, it won't be an easy thing to find a
place to upload it.

>> At least 'btrfs-debug-tree -t 2' should help to locate what's wrong
>> with the bytenr in the warning.
>
> Both devs attached (not mounted).
>
> [root@f23a ~]# btrfs-debug-tree -t 2 /dev/sdb > btrfsdebugtreet2_verb.txt
> checksum verify failed on 714189570048 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189570048 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189471744 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189471744 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189750272 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189750272 found E4E3BDB6 wanted 00000000
>
> https://drive.google.com/open?id=0B_2Asp8DGjJ9NUdmdXZFQ1Myek0

Got the result, and things are very interesting.

It seems all these tree blocks (searched by bytenr) share the same
crc32 by coincidence. Otherwise we wouldn't be able to read them all
(and their contents all seem valid).

I hope I can have some raw block dumps of those bytenrs.
Here is the procedure:

$ btrfs-map-logical -l <LOGICAL> -n 16384 -c 2 <DEVICE1or2>
mirror 1 logical <LOGICAL> physical XXXXXXXX device <DEVICE1>
mirror 2 logical <LOGICAL> physical YYYYYYYY device <DEVICE2>

$ dd if=<DEVICE1> of=dev1_<LOGICAL>.img bs=1 count=16384 skip=XXXXXXX
$ dd if=<DEVICE2> of=dev2_<LOGICAL>.img bs=1 count=16384 skip=YYYYYYY

In your output there are 12 different bytenrs, but the most
interesting ones are *714189357056* and *714189471744*. They are
extent tree blocks; if they were really broken, btrfsck should
complain about them. The others are mostly csum tree blocks, less
interesting.

And unlike the super-large disk dump, these are very small: exactly
16K each, 64K in total.

>> The good news is, the fs seems to be OK, without major problems:
>> except for the csum error, btrfsck doesn't give any other
>> error/warning.
>
> Yes, I think so. The main issue here seems to be the scary warnings
> and the uncertainty about what the user should do next, if anything
> at all.
>
>> I guess btrfsck did the wrong device assembly, but that's just my
>> personal guess. And since I can't reproduce it in my test environment,
>> it won't be easy to find the root cause.
>
> It might be reproducible. More on that in the next email. It's easy to
> get you remote access if useful.
>
>>> So. What's the theory in this case? And does it differ from reality?
>>
>> Personally speaking, it may be a false alert from btrfsck.
>> So in this case, I can't provide much help.
>>
>> If you're brave enough, mount it rw to see what will happen (although
>> it may mount just OK).
>
> I'm brave enough. I'll give it a try tomorrow unless there's another
> request for more info before then.

Great!

Thanks,
Qu

^ permalink raw reply	[flat|nested] 18+ messages in thread
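Qu's dd recipe copies one byte at a time (bs=1), which is correct but slow; the same 16 KiB extraction is a single seek and read. A sketch equivalent to the dd commands above, under the assumption that the device node is readable by the caller (the paths and offsets are placeholders, as in Qu's recipe):

```python
def dump_block(device: str, physical_offset: int, out_path: str,
               length: int = 16384) -> bytes:
    """Copy `length` bytes at `physical_offset` from a block device (or
    any file) to out_path. Equivalent to:
    dd if=<device> of=<out_path> bs=1 count=<length> skip=<physical_offset>
    """
    with open(device, "rb") as src:
        src.seek(physical_offset)   # jump straight to the tree block
        block = src.read(length)
    with open(out_path, "wb") as dst:
        dst.write(block)
    return block

# e.g. dump_block("/dev/sdb", 3380658176, "dev1_714189357056.img")
```

The physical offsets fed in come from btrfs-map-logical, exactly as in the procedure above; dd with a larger bs and iflag=skip_bytes would also work on coreutils of this era.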
* Re: btrfs check inconsistency with raid1, part 1
  2015-12-14  8:04 ` Qu Wenruo
@ 2015-12-14 17:59   ` Chris Murphy
  2015-12-20 22:32     ` Chris Murphy
       [not found]     ` <CAJCQCtSEx_wYPkfazik0bcpQwXxJCA=O5f0o6RbxON4jjB4q7A@mail.gmail.com>
  0 siblings, 2 replies; 18+ messages in thread
From: Chris Murphy @ 2015-12-14 17:59 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Chris Murphy, Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 2766 bytes --]

On Mon, Dec 14, 2015 at 1:04 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
> Chris Murphy wrote on 2015/12/14 00:24 -0700:
>> What is a full disk dump? I can try to see if it's possible.
>
> Just a dd dump.

OK, yeah. That's 750GB per drive.

> it won't be an easy
> thing to find a place to upload it.

Right. I have no ideas. I'll give you the rest of what you asked for,
and won't do the rw mount yet in case you need more.

> Got the result, and things are very interesting.
>
> It seems all these tree blocks (searched by bytenr) share the same
> crc32 by coincidence. Otherwise we wouldn't be able to read them all
> (and their contents all seem valid).
>
> I hope I can have some raw block dumps of those bytenrs.
> Here is the procedure:
> $ btrfs-map-logical -l <LOGICAL> -n 16384 -c 2 <DEVICE1or2>
> mirror 1 logical <LOGICAL> physical XXXXXXXX device <DEVICE1>
> mirror 2 logical <LOGICAL> physical YYYYYYYY device <DEVICE2>

Option -n is invalid; I'll use option -b.

btrfs fi show has this mapping, which seems opposite to
btrfs-map-logical's (although the latter uses the term mirror rather
than devid). So I will use devid and ignore the mirror number.
/dev/sdb = devid1
/dev/sdc = devid2

# btrfs-map-logical -l 714189357056 -b 16384 -c 2 /dev/sdb
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
mirror 1 logical 714189357056 physical 356605018112 device /dev/sdc
mirror 2 logical 714189357056 physical 3380658176 device /dev/sdb

# btrfs-map-logical -l 714189471744 -b 16384 -c 2 /dev/sdb
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
mirror 1 logical 714189471744 physical 356605132800 device /dev/sdc
mirror 2 logical 714189471744 physical 3380772864 device /dev/sdb

> $ dd if=<DEVICE1> of=dev1_<LOGICAL>.img bs=1 count=16384 skip=XXXXXXX
> $ dd if=<DEVICE2> of=dev2_<LOGICAL>.img bs=1 count=16384 skip=YYYYYYY
>
> In your output there are 12 different bytenrs, but the most
> interesting ones are *714189357056* and *714189471744*.

dd if=/dev/sdb of=dev1_714189357056.img bs=1 count=16384 skip=3380658176
dd if=/dev/sdc of=dev2_714189357056.img bs=1 count=16384 skip=356605018112

dd if=/dev/sdb of=dev1_714189471744.img bs=1 count=16384 skip=3380772864
dd if=/dev/sdc of=dev2_714189471744.img bs=1 count=16384 skip=356605132800

Files are attached to this email.
-- 
Chris Murphy

[-- Attachment #2: dev2_714189471744.img --]
[-- Type: application/x-raw-disk-image, Size: 16384 bytes --]

[-- Attachment #3: dev2_714189357056.img --]
[-- Type: application/x-raw-disk-image, Size: 16384 bytes --]

[-- Attachment #4: dev1_714189471744.img --]
[-- Type: application/x-raw-disk-image, Size: 16384 bytes --]

[-- Attachment #5: dev1_714189357056.img --]
[-- Type: application/x-raw-disk-image, Size: 16384 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread
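A quick sanity check on the numbers reported above: the two logical addresses sit 114688 bytes apart, and their physical offsets on each device differ by exactly the same amount, which is consistent with both tree blocks living in the same raid1 chunk (the same-chunk interpretation is an inference from the arithmetic, not something stated in the thread). Worked out in Python:

```python
# Values copied verbatim from the btrfs-map-logical output above.
logical  = [714189357056, 714189471744]
phys_sdb = [3380658176, 3380772864]        # mirror 2, /dev/sdb (devid 1)
phys_sdc = [356605018112, 356605132800]    # mirror 1, /dev/sdc (devid 2)

delta = logical[1] - logical[0]
assert delta == 114688                      # 7 * 16 KiB tree blocks apart
assert phys_sdb[1] - phys_sdb[0] == delta   # same spacing on devid 1
assert phys_sdc[1] - phys_sdc[0] == delta   # same spacing on devid 2

# Per-device (logical - physical) offset is constant across both blocks,
# as expected for two blocks mapped through one chunk:
assert logical[0] - phys_sdb[0] == logical[1] - phys_sdb[1]
assert logical[0] - phys_sdc[0] == logical[1] - phys_sdc[1]
```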
* Re: btrfs check inconsistency with raid1, part 1
  2015-12-14 17:59 ` Chris Murphy
@ 2015-12-20 22:32   ` Chris Murphy
       [not found]   ` <CAJCQCtSEx_wYPkfazik0bcpQwXxJCA=O5f0o6RbxON4jjB4q7A@mail.gmail.com>
  1 sibling, 0 replies; 18+ messages in thread
From: Chris Murphy @ 2015-12-20 22:32 UTC (permalink / raw)
To: Btrfs BTRFS

On Mon, Dec 14, 2015 at 10:59 AM, Chris Murphy <lists@colorremedies.com> wrote:
> On Mon, Dec 14, 2015 at 1:04 AM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>>
>> Chris Murphy wrote on 2015/12/14 00:24 -0700:
>>> What is a full disk dump? I can try to see if it's possible.
>>
>> Just a dd dump.
>
> OK, yeah. That's 750GB per drive.
>
>> it won't be an easy
>> thing to find a place to upload it.
>
> Right. I have no ideas. I'll give you the rest of what you asked for,
> and won't do the rw mount yet in case you need more.
>
>> Got the result, and things are very interesting.
>>
>> It seems all these tree blocks (searched by bytenr) share the same
>> crc32 by coincidence. Otherwise we wouldn't be able to read them all
>> (and their contents all seem valid).
>>
>> I hope I can have some raw block dumps of those bytenrs.
>> Here is the procedure:
>> $ btrfs-map-logical -l <LOGICAL> -n 16384 -c 2 <DEVICE1or2>
>> mirror 1 logical <LOGICAL> physical XXXXXXXX device <DEVICE1>
>> mirror 2 logical <LOGICAL> physical YYYYYYYY device <DEVICE2>
>
> Option -n is invalid; I'll use option -b.
>
> btrfs fi show has this mapping, which seems opposite to
> btrfs-map-logical's (although the latter uses the term mirror rather
> than devid). So I will use devid and ignore the mirror number.
> /dev/sdb = devid1
> /dev/sdc = devid2
>
> # btrfs-map-logical -l 714189357056 -b 16384 -c 2 /dev/sdb
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
> mirror 1 logical 714189357056 physical 356605018112 device /dev/sdc
> mirror 2 logical 714189357056 physical 3380658176 device /dev/sdb
>
> # btrfs-map-logical -l 714189471744 -b 16384 -c 2 /dev/sdb
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
> checksum verify failed on 714189357056 found E4E3BDB6 wanted 00000000
> mirror 1 logical 714189471744 physical 356605132800 device /dev/sdc
> mirror 2 logical 714189471744 physical 3380772864 device /dev/sdb
>
>> $ dd if=<DEVICE1> of=dev1_<LOGICAL>.img bs=1 count=16384 skip=XXXXXXX
>> $ dd if=<DEVICE2> of=dev2_<LOGICAL>.img bs=1 count=16384 skip=YYYYYYY
>>
>> In your output there are 12 different bytenrs, but the most
>> interesting ones are *714189357056* and *714189471744*.
>
> dd if=/dev/sdb of=dev1_714189357056.img bs=1 count=16384 skip=3380658176
> dd if=/dev/sdc of=dev2_714189357056.img bs=1 count=16384 skip=356605018112
>
> dd if=/dev/sdb of=dev1_714189471744.img bs=1 count=16384 skip=3380772864
> dd if=/dev/sdc of=dev2_714189471744.img bs=1 count=16384 skip=356605132800
>
> Files are attached to this email.

Hi Qu, any insight into these attachments?

I will likely try a normal rw mount once 4.4.0rc6 is done and built in
Fedora's koji (24-48 hours). If that goes OK, I'll try some reads and
see if that triggers any problems; and if there are no problems, then
I'll do some writes and see if the two device generations end up back
in sync.
If there continue to be no complaints, I'll do a scrub and we'll see
whether it notices anything, fixes things, or what.

I think the cause is related to bus power combined with buggy USB 3
LPM firmware (these enclosures are cheap, maybe $6). I've found some
threads about this being a problem, but it's not expected to cause any
corruption. So the fact that Btrfs picks up on some problems might
prove that (somewhat) incorrect.

http://permalink.gmane.org/gmane.linux.usb.general/105502
http://www.spinics.net/lists/linux-usb/msg108949.html

I have the exact same enclosure mentioned in the 2nd link (which is
the last email in the thread, with no real resolution). The usb reset
messages never happen when the same enclosure+drive is attached to a
1.5A USB connector on the NUC. They only happen (with two of the same
model enclosure, containing different drive makes/models) on the
standard USB connectors on the Intel NUC. But I have a hard time
believing a laptop drive needs more than 900mA continuously, rather
than just at spin-up time.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 18+ messages in thread
[parent not found: <CAJCQCtSEx_wYPkfazik0bcpQwXxJCA=O5f0o6RbxON4jjB4q7A@mail.gmail.com>]
[parent not found: <5677592F.5000202@cn.fujitsu.com>]
* Re: btrfs check inconsistency with raid1, part 1
       [not found] ` <5677592F.5000202@cn.fujitsu.com>
@ 2015-12-21  2:12   ` Chris Murphy
  2015-12-21  2:23     ` Qu Wenruo
  0 siblings, 1 reply; 18+ messages in thread
From: Chris Murphy @ 2015-12-21  2:12 UTC (permalink / raw)
To: Qu Wenruo; +Cc: Btrfs BTRFS

[-- Attachment #1: Type: text/plain, Size: 911 bytes --]

On Sun, Dec 20, 2015 at 6:43 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote:
>
> Chris Murphy wrote on 2015/12/20 15:31 -0700:
>> I think the cause is related to bus power combined with buggy USB 3
>> LPM firmware (these enclosures are cheap, maybe $6). I've found some
>> threads about this being a problem, but it's not expected to cause
>> any corruption. So the fact that Btrfs picks up on some problems
>> might prove that (somewhat) incorrect.
>
> Seems possible. Maybe some metadata just failed to reach disk.
> BTW, did I ask for btrfs-show-super output?

Nope. I will attach it to this email below, for both devices.

> If that's the case, the superblock on device 2 may be older than the
> superblock on device 1.

Yes: it looks like devid 1 is at transid 4924, and devid 2 at transid
4923. And it's devid 2 that had the device reset and write errors when
it vanished and reappeared as a different block device.
-- 
Chris Murphy

[-- Attachment #2: btrfsshowsuper_devid1.txt --]
[-- Type: text/plain, Size: 9711 bytes --]

[liveuser@localhost ~]$ sudo btrfs-show-super -af /dev/sdc
superblock: bytenr=65536, device=/dev/sdc
---------------------------------------------------------
csum                    0x93333bd8 [match]
bytenr                  65536
flags                   0x1 ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    197606b2-9f4a-4742-8824-7fc93285c29c
label                   verb
generation              4924
root                    714189258752
sys_array_size          129
chunk_root_generation   4918
root_level              1
chunk_root              715141414912
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             1500312748032
bytes_used              537228206080
sectorsize              4096
nodesize                16384
leafsize                16384
stripesize              4096
root_dir                6
num_devices             2
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
csum_type               0
csum_size               4
cache_generation        4924
uuid_tree_generation    4924
dev_item.uuid           94c62352-2568-4abe-8a58-828d1766719c
dev_item.fsid           197606b2-9f4a-4742-8824-7fc93285c29c [match]
dev_item.type           0
dev_item.total_bytes    750156374016
dev_item.bytes_used     541199433728
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 715141414912)
                chunk length 33554432 owner 2 stripe_len 65536
                type SYSTEM|RAID1 num_stripes 2
                        stripe 0 devid 2 offset 357557075968
                        dev uuid: f98143e4-24a2-4a2a-8dbf-2871c75f7b78
                        stripe 1 devid 1 offset 2185232384
                        dev uuid: 94c62352-2568-4abe-8a58-828d1766719c
backup_roots[4]:
        backup 0:
                backup_tree_root:       714616012800    gen: 4921       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714190635008    gen: 4921       level: 2
                backup_fs_root:         714186096640    gen: 4921       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714186326016    gen: 4921       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 1:
                backup_tree_root:       714186997760    gen: 4922       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714187014144    gen: 4922       level: 2
                backup_fs_root:         714186096640    gen: 4921       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714187505664    gen: 4922       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 2:
                backup_tree_root:       714188554240    gen: 4923       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714188505088    gen: 4923       level: 2
                backup_fs_root:         714188488704    gen: 4923       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714188668928    gen: 4923       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 3:
                backup_tree_root:       714189258752    gen: 4924       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714189324288    gen: 4924       level: 2
                backup_fs_root:         714188488704    gen: 4923       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714189422592    gen: 4924       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2

superblock: bytenr=67108864, device=/dev/sdc
---------------------------------------------------------
csum                    0x33521316 [match]
bytenr                  67108864
flags                   0x1 ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    197606b2-9f4a-4742-8824-7fc93285c29c
label                   verb
generation              4924
root                    714189258752
sys_array_size          129
chunk_root_generation   4918
root_level              1
chunk_root              715141414912
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             1500312748032
bytes_used              537228206080
sectorsize              4096
nodesize                16384
leafsize                16384
stripesize              4096
root_dir                6
num_devices             2
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
csum_type               0
csum_size               4
cache_generation        4924
uuid_tree_generation    4924
dev_item.uuid           94c62352-2568-4abe-8a58-828d1766719c
dev_item.fsid           197606b2-9f4a-4742-8824-7fc93285c29c [match]
dev_item.type           0
dev_item.total_bytes    750156374016
dev_item.bytes_used     541199433728
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 715141414912)
                chunk length 33554432 owner 2 stripe_len 65536
                type SYSTEM|RAID1 num_stripes 2
                        stripe 0 devid 2 offset 357557075968
                        dev uuid: f98143e4-24a2-4a2a-8dbf-2871c75f7b78
                        stripe 1 devid 1 offset 2185232384
                        dev uuid: 94c62352-2568-4abe-8a58-828d1766719c
backup_roots[4]:
        backup 0:
                backup_tree_root:       714616012800    gen: 4921       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714190635008    gen: 4921       level: 2
                backup_fs_root:         714186096640    gen: 4921       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714186326016    gen: 4921       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 1:
                backup_tree_root:       714186997760    gen: 4922       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714187014144    gen: 4922       level: 2
                backup_fs_root:         714186096640    gen: 4921       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714187505664    gen: 4922       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 2:
                backup_tree_root:       714188554240    gen: 4923       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714188505088    gen: 4923       level: 2
                backup_fs_root:         714188488704    gen: 4923       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714188668928    gen: 4923       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 3:
                backup_tree_root:       714189258752    gen: 4924       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714189324288    gen: 4924       level: 2
                backup_fs_root:         714188488704    gen: 4923       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714189422592    gen: 4924       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2

superblock: bytenr=274877906944, device=/dev/sdc
---------------------------------------------------------
csum                    0xced54527 [match]
bytenr                  274877906944
flags                   0x1 ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    197606b2-9f4a-4742-8824-7fc93285c29c
label                   verb
generation              4924
root                    714189258752
sys_array_size          129
chunk_root_generation   4918
root_level              1
chunk_root              715141414912
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             1500312748032
bytes_used              537228206080
sectorsize              4096
nodesize                16384
leafsize                16384
stripesize              4096
root_dir                6
num_devices             2
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
csum_type               0
csum_size               4
cache_generation        4924
uuid_tree_generation    4924
dev_item.uuid           94c62352-2568-4abe-8a58-828d1766719c
dev_item.fsid           197606b2-9f4a-4742-8824-7fc93285c29c [match]
dev_item.type           0
dev_item.total_bytes    750156374016
dev_item.bytes_used     541199433728
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          1
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 715141414912)
                chunk length 33554432 owner 2 stripe_len 65536
                type SYSTEM|RAID1 num_stripes 2
                        stripe 0 devid 2 offset 357557075968
                        dev uuid: f98143e4-24a2-4a2a-8dbf-2871c75f7b78
                        stripe 1 devid 1 offset 2185232384
                        dev uuid: 94c62352-2568-4abe-8a58-828d1766719c
backup_roots[4]:
        backup 0:
                backup_tree_root:       714616012800    gen: 4921       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714190635008    gen: 4921       level: 2
                backup_fs_root:         714186096640    gen: 4921       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714186326016    gen: 4921       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 1:
                backup_tree_root:       714186997760    gen: 4922       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714187014144    gen: 4922       level: 2
                backup_fs_root:         714186096640    gen: 4921       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714187505664    gen: 4922       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 2:
                backup_tree_root:       714188554240    gen: 4923       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714188505088    gen: 4923       level: 2
                backup_fs_root:         714188488704    gen: 4923       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714188668928    gen: 4923       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 3:
                backup_tree_root:       714189258752    gen: 4924       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714189324288    gen: 4924       level: 2
                backup_fs_root:         714188488704    gen: 4923       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714189422592    gen: 4924       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2

[-- Attachment #3: btrfsshowsuper_devid2.txt --]
[-- Type: text/plain, Size: 9703 bytes --]

[chris@f23m ~]$ sudo btrfs-show-super -af /dev/sdb
superblock: bytenr=65536, device=/dev/sdb
---------------------------------------------------------
csum                    0x3364e6b8 [match]
bytenr                  65536
flags                   0x1 ( WRITTEN )
magic                   _BHRfS_M [match]
fsid                    197606b2-9f4a-4742-8824-7fc93285c29c
label                   verb
generation              4923
root                    714188554240
sys_array_size          129
chunk_root_generation   4918
root_level              1
chunk_root              715141414912
chunk_root_level        1
log_root                0
log_root_transid        0
log_root_level          0
total_bytes             1500312748032
bytes_used              537228206080
sectorsize              4096
nodesize                16384
leafsize                16384
stripesize              4096
root_dir                6
num_devices             2
compat_flags            0x0
compat_ro_flags         0x0
incompat_flags          0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA )
csum_type               0
csum_size               4
cache_generation        4923
uuid_tree_generation    4923
dev_item.uuid           f98143e4-24a2-4a2a-8dbf-2871c75f7b78
dev_item.fsid           197606b2-9f4a-4742-8824-7fc93285c29c [match]
dev_item.type           0
dev_item.total_bytes    750156374016
dev_item.bytes_used     541199433728
dev_item.io_align       4096
dev_item.io_width       4096
dev_item.sector_size    4096
dev_item.devid          2
dev_item.dev_group      0
dev_item.seek_speed     0
dev_item.bandwidth      0
dev_item.generation     0
sys_chunk_array[2048]:
        item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 715141414912)
                chunk length 33554432 owner 2 stripe_len 65536
                type SYSTEM|RAID1 num_stripes 2
                        stripe 0 devid 2 offset 357557075968
                        dev uuid: f98143e4-24a2-4a2a-8dbf-2871c75f7b78
                        stripe 1 devid 1 offset 2185232384
                        dev uuid: 94c62352-2568-4abe-8a58-828d1766719c
backup_roots[4]:
        backup 0:
                backup_tree_root:       714616012800    gen: 4921       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714190635008    gen: 4921       level: 2
                backup_fs_root:         714186096640    gen: 4921       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714186326016    gen: 4921       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 1:
                backup_tree_root:       714186997760    gen: 4922       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714187014144    gen: 4922       level: 2
                backup_fs_root:         714186096640    gen: 4921       level: 0
                backup_dev_root:        715082776576    gen: 4921       level: 1
                backup_csum_root:       714187505664    gen: 4922       level: 2
                backup_total_bytes:     1500312748032
                backup_bytes_used:      537228206080
                backup_num_devices:     2
        backup 2:
                backup_tree_root:       714188554240    gen: 4923       level: 1
                backup_chunk_root:      715141414912    gen: 4918       level: 1
                backup_extent_root:     714188505088    gen: 4923       level: 2
                backup_fs_root:         714188488704    gen: 4923       level: 0
backup_dev_root: 715082776576 gen: 4921 level: 1 backup_csum_root: 714188668928 gen: 4923 level: 2 backup_total_bytes: 1500312748032 backup_bytes_used: 537228206080 backup_num_devices: 2 backup 3: backup_tree_root: 809898442752 gen: 4920 level: 1 backup_chunk_root: 715141414912 gen: 4918 level: 1 backup_extent_root: 809898459136 gen: 4920 level: 2 backup_fs_root: 810253713408 gen: 4805 level: 0 backup_dev_root: 809896886272 gen: 4918 level: 1 backup_csum_root: 809898557440 gen: 4920 level: 2 backup_total_bytes: 1500312748032 backup_bytes_used: 537228206080 backup_num_devices: 2 superblock: bytenr=67108864, device=/dev/sdb --------------------------------------------------------- csum 0x9305ce76 [match] bytenr 67108864 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 197606b2-9f4a-4742-8824-7fc93285c29c label verb generation 4923 root 714188554240 sys_array_size 129 chunk_root_generation 4918 root_level 1 chunk_root 715141414912 chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level 0 total_bytes 1500312748032 bytes_used 537228206080 sectorsize 4096 nodesize 16384 leafsize 16384 stripesize 4096 root_dir 6 num_devices 2 compat_flags 0x0 compat_ro_flags 0x0 incompat_flags 0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA ) csum_type 0 csum_size 4 cache_generation 4923 uuid_tree_generation 4923 dev_item.uuid f98143e4-24a2-4a2a-8dbf-2871c75f7b78 dev_item.fsid 197606b2-9f4a-4742-8824-7fc93285c29c [match] dev_item.type 0 dev_item.total_bytes 750156374016 dev_item.bytes_used 541199433728 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size 4096 dev_item.devid 2 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 sys_chunk_array[2048]: item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 715141414912) chunk length 33554432 owner 2 stripe_len 65536 type SYSTEM|RAID1 num_stripes 2 stripe 0 devid 2 offset 357557075968 dev uuid: f98143e4-24a2-4a2a-8dbf-2871c75f7b78 stripe 1 devid 1 offset 2185232384 dev uuid: 
94c62352-2568-4abe-8a58-828d1766719c backup_roots[4]: backup 0: backup_tree_root: 714616012800 gen: 4921 level: 1 backup_chunk_root: 715141414912 gen: 4918 level: 1 backup_extent_root: 714190635008 gen: 4921 level: 2 backup_fs_root: 714186096640 gen: 4921 level: 0 backup_dev_root: 715082776576 gen: 4921 level: 1 backup_csum_root: 714186326016 gen: 4921 level: 2 backup_total_bytes: 1500312748032 backup_bytes_used: 537228206080 backup_num_devices: 2 backup 1: backup_tree_root: 714186997760 gen: 4922 level: 1 backup_chunk_root: 715141414912 gen: 4918 level: 1 backup_extent_root: 714187014144 gen: 4922 level: 2 backup_fs_root: 714186096640 gen: 4921 level: 0 backup_dev_root: 715082776576 gen: 4921 level: 1 backup_csum_root: 714187505664 gen: 4922 level: 2 backup_total_bytes: 1500312748032 backup_bytes_used: 537228206080 backup_num_devices: 2 backup 2: backup_tree_root: 714188554240 gen: 4923 level: 1 backup_chunk_root: 715141414912 gen: 4918 level: 1 backup_extent_root: 714188505088 gen: 4923 level: 2 backup_fs_root: 714188488704 gen: 4923 level: 0 backup_dev_root: 715082776576 gen: 4921 level: 1 backup_csum_root: 714188668928 gen: 4923 level: 2 backup_total_bytes: 1500312748032 backup_bytes_used: 537228206080 backup_num_devices: 2 backup 3: backup_tree_root: 809898442752 gen: 4920 level: 1 backup_chunk_root: 715141414912 gen: 4918 level: 1 backup_extent_root: 809898459136 gen: 4920 level: 2 backup_fs_root: 810253713408 gen: 4805 level: 0 backup_dev_root: 809896886272 gen: 4918 level: 1 backup_csum_root: 809898557440 gen: 4920 level: 2 backup_total_bytes: 1500312748032 backup_bytes_used: 537228206080 backup_num_devices: 2 superblock: bytenr=274877906944, device=/dev/sdb --------------------------------------------------------- csum 0x6e829847 [match] bytenr 274877906944 flags 0x1 ( WRITTEN ) magic _BHRfS_M [match] fsid 197606b2-9f4a-4742-8824-7fc93285c29c label verb generation 4923 root 714188554240 sys_array_size 129 chunk_root_generation 4918 root_level 1 chunk_root 
715141414912 chunk_root_level 1 log_root 0 log_root_transid 0 log_root_level 0 total_bytes 1500312748032 bytes_used 537228206080 sectorsize 4096 nodesize 16384 leafsize 16384 stripesize 4096 root_dir 6 num_devices 2 compat_flags 0x0 compat_ro_flags 0x0 incompat_flags 0x161 ( MIXED_BACKREF | BIG_METADATA | EXTENDED_IREF | SKINNY_METADATA ) csum_type 0 csum_size 4 cache_generation 4923 uuid_tree_generation 4923 dev_item.uuid f98143e4-24a2-4a2a-8dbf-2871c75f7b78 dev_item.fsid 197606b2-9f4a-4742-8824-7fc93285c29c [match] dev_item.type 0 dev_item.total_bytes 750156374016 dev_item.bytes_used 541199433728 dev_item.io_align 4096 dev_item.io_width 4096 dev_item.sector_size 4096 dev_item.devid 2 dev_item.dev_group 0 dev_item.seek_speed 0 dev_item.bandwidth 0 dev_item.generation 0 sys_chunk_array[2048]: item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 715141414912) chunk length 33554432 owner 2 stripe_len 65536 type SYSTEM|RAID1 num_stripes 2 stripe 0 devid 2 offset 357557075968 dev uuid: f98143e4-24a2-4a2a-8dbf-2871c75f7b78 stripe 1 devid 1 offset 2185232384 dev uuid: 94c62352-2568-4abe-8a58-828d1766719c backup_roots[4]: backup 0: backup_tree_root: 714616012800 gen: 4921 level: 1 backup_chunk_root: 715141414912 gen: 4918 level: 1 backup_extent_root: 714190635008 gen: 4921 level: 2 backup_fs_root: 714186096640 gen: 4921 level: 0 backup_dev_root: 715082776576 gen: 4921 level: 1 backup_csum_root: 714186326016 gen: 4921 level: 2 backup_total_bytes: 1500312748032 backup_bytes_used: 537228206080 backup_num_devices: 2 backup 1: backup_tree_root: 714186997760 gen: 4922 level: 1 backup_chunk_root: 715141414912 gen: 4918 level: 1 backup_extent_root: 714187014144 gen: 4922 level: 2 backup_fs_root: 714186096640 gen: 4921 level: 0 backup_dev_root: 715082776576 gen: 4921 level: 1 backup_csum_root: 714187505664 gen: 4922 level: 2 backup_total_bytes: 1500312748032 backup_bytes_used: 537228206080 backup_num_devices: 2 backup 2: backup_tree_root: 714188554240 gen: 4923 level: 1 backup_chunk_root: 
715141414912 gen: 4918 level: 1 backup_extent_root: 714188505088 gen: 4923 level: 2 backup_fs_root: 714188488704 gen: 4923 level: 0 backup_dev_root: 715082776576 gen: 4921 level: 1 backup_csum_root: 714188668928 gen: 4923 level: 2 backup_total_bytes: 1500312748032 backup_bytes_used: 537228206080 backup_num_devices: 2 backup 3: backup_tree_root: 809898442752 gen: 4920 level: 1 backup_chunk_root: 715141414912 gen: 4918 level: 1 backup_extent_root: 809898459136 gen: 4920 level: 2 backup_fs_root: 810253713408 gen: 4805 level: 0 backup_dev_root: 809896886272 gen: 4918 level: 1 backup_csum_root: 809898557440 gen: 4920 level: 2 backup_total_bytes: 1500312748032 backup_bytes_used: 537228206080 backup_num_devices: 2 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs check inconsistency with raid1, part 1 2015-12-21 2:12 ` Chris Murphy @ 2015-12-21 2:23 ` Qu Wenruo 2015-12-21 2:46 ` Chris Murphy 2015-12-22 1:05 ` Kai Krakow 0 siblings, 2 replies; 18+ messages in thread From: Qu Wenruo @ 2015-12-21 2:23 UTC (permalink / raw) To: Chris Murphy; +Cc: Btrfs BTRFS Chris Murphy wrote on 2015/12/20 19:12 -0700: > On Sun, Dec 20, 2015 at 6:43 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote: >> >> >> Chris Murphy wrote on 2015/12/20 15:31 -0700: > >>> I think the cause is related to bus power with buggy USB 3 LPM >>> firmware (these enclosures are cheap, maybe $6). I've found some >>> threads about this being a problem, but it's not expected to cause any >>> corruption. So the fact that Btrfs picks up on some problems might prove >>> that (somewhat) incorrect. >> >> >> Seems possible. Maybe some metadata just failed to reach the disk. >> BTW, did I ask for btrfs-show-super output? > > Nope. I will attach it to this email below for both devices. > >> If that's the case, the superblock on device 2 may be older than the superblock on >> device 1. > > Yes, it looks like devid 1 is at transid 4924 and devid 2 at transid 4923. And > it's devid 2 that had device resets and write errors when it vanished > and reappeared as a different block device. > Now the whole problem is explained. You should be fine to mount it rw, as RAID1 will handle all the problems. Then you can use scrub to fix all the generation mismatches on dev2. Although I would prefer to wipe dev2, mount dev1 degraded, and replace the missing dev2 with a good device/USB port. Thanks, Qu ^ permalink raw reply [flat|nested] 18+ messages in thread
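Qu's scrub suggestion works because the copy that never reached dev2 fails its checksum, so scrub knows which side to rewrite. A minimal, hypothetical Python sketch of that repair decision (btrfs actually uses crc32c over 4KiB blocks; zlib.crc32 merely stands in here):

```python
import zlib

def scrub_pair(copy1, copy2, expected_csum):
    """Toy model of btrfs scrub on one RAID1 extent pair.

    expected_csum plays the role of the checksum recorded in the
    csum tree; the stale/unwritten copy fails verification and is
    rewritten from the good one.
    """
    def ok(data):
        return zlib.crc32(data) == expected_csum

    c1_ok, c2_ok = ok(copy1), ok(copy2)
    if c1_ok and not c2_ok:
        return copy1, copy1  # repair dev2 from dev1
    if c2_ok and not c1_ok:
        return copy2, copy2  # repair dev1 from dev2
    if c1_ok and c2_ok:
        return copy1, copy2  # both fine, nothing to do
    raise IOError("both copies bad: unrecoverable")

good = b"tree block committed at generation 4924"
stale = b"\x00" * len(good)  # the write that never reached dev2
fixed1, fixed2 = scrub_pair(good, stale, zlib.crc32(good))
assert fixed1 == fixed2 == good
```

After the rewrite both devices hold the same bytes, which is exactly why a plain scrub resolves this kind of one-device-stale situation.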
* Re: btrfs check inconsistency with raid1, part 1 2015-12-21 2:23 ` Qu Wenruo @ 2015-12-21 2:46 ` Chris Murphy 2015-12-22 1:05 ` Kai Krakow 1 sibling, 0 replies; 18+ messages in thread From: Chris Murphy @ 2015-12-21 2:46 UTC (permalink / raw) To: Qu Wenruo; +Cc: Chris Murphy, Btrfs BTRFS On Sun, Dec 20, 2015 at 7:23 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> wrote: > > > Chris Murphy wrote on 2015/12/20 19:12 -0700: >> >> On Sun, Dec 20, 2015 at 6:43 PM, Qu Wenruo <quwenruo@cn.fujitsu.com> >> wrote: >>> >>> >>> >>> Chris Murphy wrote on 2015/12/20 15:31 -0700: >> >> >>>> I think the cause is related to bus power with buggy USB 3 LPM >>>> firmware (these enclosures are cheap, maybe $6). I've found some >>>> threads about this being a problem, but it's not expected to cause any >>>> corruption. So the fact that Btrfs picks up on some problems might prove >>>> that (somewhat) incorrect. >>> >>> >>> >>> Seems possible. Maybe some metadata just failed to reach the disk. >>> BTW, did I ask for btrfs-show-super output? >> >> >> Nope. I will attach it to this email below for both devices. >> >>> If that's the case, the superblock on device 2 may be older than >>> the superblock on device 1. >> >> >> Yes, it looks like devid 1 is at transid 4924 and devid 2 at transid 4923. And >> it's devid 2 that had device resets and write errors when it vanished >> and reappeared as a different block device. >> > > Now the whole problem is explained. > > You should be fine to mount it rw, as RAID1 will handle all the problems. > Then you can use scrub to fix all the generation mismatches on dev2. > > Although I would prefer to wipe dev2, mount dev1 degraded, and replace the > missing dev2 with a good device/USB port. Yeah. The best info I have right now is that this particular make/model of USB 3.0 enclosure is common and sometimes has this reset-and-vanish problem with only certain controllers. In my case all four of the same kind of enclosure do this, but only with 900mA ports. There's never a problem with 1.5A ports.
I think it's just a slightly out-of-spec product. But usb-storage kernel developers said the warnings shouldn't result in corruption. Another user with the same enclosure reported the problem only happens on Linux, not Windows, on the same host hardware. So it could also be some Linux SCSI layer error handling that's not working around a pre-existing issue when the device is flaky. Thanks! -- Chris Murphy ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs check inconsistency with raid1, part 1 2015-12-21 2:23 ` Qu Wenruo 2015-12-21 2:46 ` Chris Murphy @ 2015-12-22 1:05 ` Kai Krakow 2015-12-22 1:22 ` Qu Wenruo 1 sibling, 1 reply; 18+ messages in thread From: Kai Krakow @ 2015-12-22 1:05 UTC (permalink / raw) To: linux-btrfs Am Mon, 21 Dec 2015 10:23:31 +0800 schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>: > > > Chris Murphy wrote on 2015/12/20 19:12 -0700: > > On Sun, Dec 20, 2015 at 6:43 PM, Qu Wenruo > > <quwenruo@cn.fujitsu.com> wrote: > >> > >> > >> Chris Murphy wrote on 2015/12/20 15:31 -0700: > > > >>> I think the cause is related to bus power with buggy USB 3 LPM > >>> firmware (these enclosures are cheap, maybe $6). I've found some > >>> threads about this being a problem, but it's not expected to > >>> cause any corruption. So the fact that Btrfs picks up on some > >>> problems might prove that (somewhat) incorrect. > >> > >> > >> Seems possible. Maybe some metadata just failed to reach the disk. > >> BTW, did I ask for btrfs-show-super output? > > > > Nope. I will attach it to this email below for both devices. > > > >> If that's the case, the superblock on device 2 may be older than > >> the superblock on device 1. > > > > Yes, it looks like devid 1 is at transid 4924 and devid 2 at transid 4923. And > > it's devid 2 that had device resets and write errors when it vanished > > and reappeared as a different block device. > > > > Now the whole problem is explained. > > You should be fine to mount it rw, as RAID1 will handle all the > problems. How should RAID1 handle this if both copies have valid checksums (as I would assume here unless shown otherwise)? This is an even bigger problem with block-based RAID1, which has no checksums at all. Luckily, btrfs works differently here. > Then you can use scrub to fix all the > generation mismatches on dev2. Now I better understand why this could fix the problem... > Although I would prefer to wipe dev2, mount dev1 degraded, and > replace the missing dev2 with a good device/USB port.
Given the assumption above I'd do that, too (but check if the "original" has no block errors before discarding the mirror). -- Regards, Kai Replies to list-only preferred. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs check inconsistency with raid1, part 1 2015-12-22 1:05 ` Kai Krakow @ 2015-12-22 1:22 ` Qu Wenruo 2015-12-22 1:48 ` Kai Krakow 0 siblings, 1 reply; 18+ messages in thread From: Qu Wenruo @ 2015-12-22 1:22 UTC (permalink / raw) To: Kai Krakow, linux-btrfs Kai Krakow wrote on 2015/12/22 02:05 +0100: > Am Mon, 21 Dec 2015 10:23:31 +0800 > schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>: > >> >> >> Chris Murphy wrote on 2015/12/20 19:12 -0700: >>> On Sun, Dec 20, 2015 at 6:43 PM, Qu Wenruo >>> <quwenruo@cn.fujitsu.com> wrote: >>>> >>>> >>>> Chris Murphy wrote on 2015/12/20 15:31 -0700: >>> >>>>> I think the cause is related to bus power with buggy USB 3 LPM >>>>> firmware (these enclosures are cheap maybe $6). I've found some >>>>> threads about this being a problem, but it's not expected to >>>>> cause any corruptions. So, the fact Btrfs picks up one some >>>>> problems might prove that (somewhat) incorrect. >>>> >>>> >>>> Seems possible. Maybe some metadata just failed to reach disk. >>>> BTW, did I asked for a btrfs-show-super output? >>> >>> Nope. I will attach to this email below for both devices. >>> >>>> If that's the case, superblock on device 2 maybe older than >>>> superblock on device 1. >>> >>> Yes, looks iike devid 1 transid 4924, and devid 2 transid 4923. And >>> it's devid 2 that had device reset and write errors when it vanished >>> and reappeared as a different block device. >>> >> >> Now all the problem is explained. >> >> You should be good to mount it rw, as RAID1 will handle all the >> problem. > > How should RAID1 handle this if both copies have valid checksums (as I > would assume here unless shown otherwise)? This is an even bigger > problem with block based RAID1 which does not have checksums at all. > Luckily, btrfs works different here. No, these two devices don't have the same generation, which means they point to *different* bytenr. 
Like the following: Super of Dev1: gen: X + 1 root bytenr: A (Btrfs logical) logical A is mapped to A1 on dev1 and A2 on dev2. Super of Dev2: gen: X root bytenr: B We don't need to bother with bytenr B here, though. Due to the power bug, A2 and the super of dev2 were never written to dev2. So you should see the problem now. A1 on dev1 contains a *valid* tree block, but A2 on dev2 doesn't (empty data only). So your assumption that "both have valid copies" is wrong. Check all four attachments in the previous mail. > >> Then you can use scrub to fix all the >> generation mismatches on dev2. > > Now I better understand why this could fix the problem... Why not? The tree block/data copy on dev1 is valid, but the copy on dev2 is empty (not written), so btrfs detects the csum error, and scrub will rewrite it. After the rewrite, the copies on dev1 and dev2 will match, fixing the problem. Thanks, Qu > >> Although I would prefer to wipe dev2, mount dev1 degraded, and >> replace the missing dev2 with a good device/USB port. > > Given the assumption above I'd do that, too (but check if the > "original" has no block errors before discarding the mirror). > ^ permalink raw reply [flat|nested] 18+ messages in thread
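The superblock situation Qu describes can be modeled in a few lines of Python (a simplified, hypothetical model, not btrfs's actual C code): at mount time the newest superblock that passes its checksum wins, so dev1's gen X+1 super is used and dev2's older root B is never consulted.

```python
def pick_super(supers):
    """Simplified model of mount-time superblock selection:
    the highest-generation super that passes its csum wins."""
    valid = [s for s in supers if s["csum_ok"]]
    return max(valid, key=lambda s: s["generation"])

# Dev1 got the last commit; dev2's super is one generation behind.
dev1_super = {"generation": 4924, "root_bytenr": "A", "csum_ok": True}
dev2_super = {"generation": 4923, "root_bytenr": "B", "csum_ok": True}

best = pick_super([dev1_super, dev2_super])
assert best["generation"] == 4924 and best["root_bytenr"] == "A"
```

This is why the combined mount uses root A, whose mirror copy A2 on dev2 was never written, producing the "checksum verify failed ... wanted 00000000" noise in btrfs check.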
* Re: btrfs check inconsistency with raid1, part 1 2015-12-22 1:22 ` Qu Wenruo @ 2015-12-22 1:48 ` Kai Krakow 2015-12-22 2:15 ` Qu Wenruo 2015-12-22 10:23 ` Duncan 0 siblings, 2 replies; 18+ messages in thread From: Kai Krakow @ 2015-12-22 1:48 UTC (permalink / raw) To: linux-btrfs Am Tue, 22 Dec 2015 09:22:20 +0800 schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>: > > > Kai Krakow wrote on 2015/12/22 02:05 +0100: > > Am Mon, 21 Dec 2015 10:23:31 +0800 > > schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>: > > > >> > >> > >> Chris Murphy wrote on 2015/12/20 19:12 -0700: > >>> On Sun, Dec 20, 2015 at 6:43 PM, Qu Wenruo > >>> <quwenruo@cn.fujitsu.com> wrote: > >>>> > >>>> > >>>> Chris Murphy wrote on 2015/12/20 15:31 -0700: > >>> > >>>>> I think the cause is related to bus power with buggy USB 3 LPM > >>>>> firmware (these enclosures are cheap maybe $6). I've found some > >>>>> threads about this being a problem, but it's not expected to > >>>>> cause any corruptions. So, the fact Btrfs picks up one some > >>>>> problems might prove that (somewhat) incorrect. > >>>> > >>>> > >>>> Seems possible. Maybe some metadata just failed to reach disk. > >>>> BTW, did I asked for a btrfs-show-super output? > >>> > >>> Nope. I will attach to this email below for both devices. > >>> > >>>> If that's the case, superblock on device 2 maybe older than > >>>> superblock on device 1. > >>> > >>> Yes, looks iike devid 1 transid 4924, and devid 2 transid 4923. > >>> And it's devid 2 that had device reset and write errors when it > >>> vanished and reappeared as a different block device. > >>> > >> > >> Now all the problem is explained. > >> > >> You should be good to mount it rw, as RAID1 will handle all the > >> problem. > > > > How should RAID1 handle this if both copies have valid checksums > > (as I would assume here unless shown otherwise)? This is an even > > bigger problem with block based RAID1 which does not have checksums > > at all. Luckily, btrfs works different here. 
> No, these two devices don't have the same generation, which means > they point to *different* bytenr. > > Like the following: > > Super of Dev1: > gen: X + 1 > root bytenr: A (Btrfs logical) > logical A is mapped to A1 on dev1 and A2 on dev2. > > Super of Dev2: > gen: X > root bytenr: B > We don't need to bother with bytenr B here, though. > > Due to the power bug, A2 and the super of dev2 were never written to dev2. > > So you should see the problem now. > A1 on dev1 contains a *valid* tree block, but A2 on dev2 doesn't (empty > data only). > > So your assumption that "both have valid copies" is wrong. > > Check all four attachments in the previous mail. I only saw those attachments at a second glance. Sorry. Primarily I just wanted to note that RAID1 per se doesn't mean anything more than: we have two readable copies, but we don't know which one is correct. As in: let the admin think twice before blindly following a guide. This is why I pointed out btrfs csums, which make this a little better and which in turn have the further consequences you describe (for the tree block). In contrast to block-level RAID, btrfs usually knows which block is correct and which is not. I just wondered if btrfs allows for the case where both stripes could have valid checksums despite btrfs RAID - just because a failure occurred right on the spot. Is this possible? What happens then? If yes, it would mean not to blindly trust the RAID without doing the homework. > >> Then you can use scrub to fix all the >> generation mismatches on dev2. > > > > Now I better understand why this could fix the problem... > > Why not? > > The tree block/data copy on dev1 is valid, but the copy on > dev2 is empty (not written), so btrfs detects the csum error, and > scrub will rewrite it. > > After the rewrite, the copies on dev1 and dev2 will match, fixing the > problem. Exactly. ;-) Didn't say anything against it. -- Regards, Kai Replies to list-only preferred.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs check inconsistency with raid1, part 1 2015-12-22 1:48 ` Kai Krakow @ 2015-12-22 2:15 ` Qu Wenruo 2015-12-22 4:21 ` Chris Murphy 2015-12-22 10:23 ` Duncan 1 sibling, 1 reply; 18+ messages in thread From: Qu Wenruo @ 2015-12-22 2:15 UTC (permalink / raw) To: Kai Krakow, linux-btrfs Kai Krakow wrote on 2015/12/22 02:48 +0100: > Am Tue, 22 Dec 2015 09:22:20 +0800 > schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>: > >> >> >> Kai Krakow wrote on 2015/12/22 02:05 +0100: >>> Am Mon, 21 Dec 2015 10:23:31 +0800 >>> schrieb Qu Wenruo <quwenruo@cn.fujitsu.com>: >>> >>>> >>>> >>>> Chris Murphy wrote on 2015/12/20 19:12 -0700: >>>>> On Sun, Dec 20, 2015 at 6:43 PM, Qu Wenruo >>>>> <quwenruo@cn.fujitsu.com> wrote: >>>>>> >>>>>> >>>>>> Chris Murphy wrote on 2015/12/20 15:31 -0700: >>>>> >>>>>>> I think the cause is related to bus power with buggy USB 3 LPM >>>>>>> firmware (these enclosures are cheap maybe $6). I've found some >>>>>>> threads about this being a problem, but it's not expected to >>>>>>> cause any corruptions. So, the fact Btrfs picks up one some >>>>>>> problems might prove that (somewhat) incorrect. >>>>>> >>>>>> >>>>>> Seems possible. Maybe some metadata just failed to reach disk. >>>>>> BTW, did I asked for a btrfs-show-super output? >>>>> >>>>> Nope. I will attach to this email below for both devices. >>>>> >>>>>> If that's the case, superblock on device 2 maybe older than >>>>>> superblock on device 1. >>>>> >>>>> Yes, looks iike devid 1 transid 4924, and devid 2 transid 4923. >>>>> And it's devid 2 that had device reset and write errors when it >>>>> vanished and reappeared as a different block device. >>>>> >>>> >>>> Now all the problem is explained. >>>> >>>> You should be good to mount it rw, as RAID1 will handle all the >>>> problem. >>> >>> How should RAID1 handle this if both copies have valid checksums >>> (as I would assume here unless shown otherwise)? 
This is an even >>> bigger problem with block-based RAID1, which has no checksums >>> at all. Luckily, btrfs works differently here. >> >> No, these two devices don't have the same generation, which means >> they point to *different* bytenr. >> >> Like the following: >> >> Super of Dev1: >> gen: X + 1 >> root bytenr: A (Btrfs logical) >> logical A is mapped to A1 on dev1 and A2 on dev2. >> >> Super of Dev2: >> gen: X >> root bytenr: B >> We don't need to bother with bytenr B here, though. >> >> Due to the power bug, A2 and the super of dev2 were never written to dev2. >> >> So you should see the problem now. >> A1 on dev1 contains a *valid* tree block, but A2 on dev2 doesn't (empty >> data only). >> >> So your assumption that "both have valid copies" is wrong. >> >> Check all four attachments in the previous mail. > > I only saw those attachments at a second glance. Sorry. > > Primarily I just wanted to note that RAID1 per se doesn't mean anything > more than: we have two readable copies, but we don't know which one is > correct. As in: let the admin think twice before blindly > following a guide. > > This is why I pointed out btrfs csums, which make this a little better > and which in turn have the further consequences you describe (for the > tree block). > > In contrast to block-level RAID, btrfs usually knows which > block is correct and which is not. > > I just wondered if btrfs allows for the case where both stripes could > have valid checksums despite btrfs RAID - just because a failure > occurred right on the spot. > > Is this possible? What happens then? If yes, it would mean not to > blindly trust the RAID without doing the homework. Very interesting question. Although btrfs goes a little beyond your expectations for block-based RAID1: 1) Yes, it is possible. 2) Btrfs still detects it as a transid error and won't trust the metadata (kernel behavior). And since it's raid1, it will try the next copy and go on.
The trick here is that btrfs metadata doesn't only record the bytenr of each child tree block, but also the transid (generation) of that tree block. So even if such a case happens, the transid won't match, which causes btrfs to detect the error. Thanks, Qu > >>>> Then you can use scrub to fix all the >>>> generation mismatches on dev2. >>> >>> Now I better understand why this could fix the problem... >> >> Why not? >> >> The tree block/data copy on dev1 is valid, but the copy on >> dev2 is empty (not written), so btrfs detects the csum error, and >> scrub will rewrite it. >> >> After the rewrite, the copies on dev1 and dev2 will match, fixing the >> problem. > > Exactly. ;-) Didn't say anything against it. > > ^ permalink raw reply [flat|nested] 18+ messages in thread
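Qu's point about the parent-recorded transid can be sketched as follows (a hypothetical Python model, not btrfs's actual C code): the parent node stores the expected generation of each child, so a mirror copy carrying an older header generation is rejected and the other RAID1 copy is tried.

```python
class TransidMismatch(Exception):
    pass

def read_tree_block(mirror_copies, expected_gen):
    """Toy model of parent-transid verification: accept the first
    RAID1 mirror whose header generation matches the generation
    recorded in the parent pointer."""
    for copy in mirror_copies:
        if copy["generation"] == expected_gen:
            return copy
    raise TransidMismatch("parent transid verify failed on all mirrors")

# dev2 dropped the write, so its copy still carries the old generation.
dev1_copy = {"generation": 4924, "data": "current tree block"}
dev2_copy = {"generation": 4923, "data": "stale tree block"}

blk = read_tree_block([dev2_copy, dev1_copy], expected_gen=4924)
assert blk["data"] == "current tree block"
```

So even when both copies have internally valid checksums, the generation recorded in the parent still distinguishes current from stale.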
* Re: btrfs check inconsistency with raid1, part 1 2015-12-22 2:15 ` Qu Wenruo @ 2015-12-22 4:21 ` Chris Murphy 0 siblings, 0 replies; 18+ messages in thread From: Chris Murphy @ 2015-12-22 4:21 UTC (permalink / raw) To: Qu Wenruo; +Cc: Kai Krakow, Btrfs BTRFS Latest update. 4.4.0-0.rc6.git0.1.fc24.x86_64 btrfs-progs v4.3.1 Mounted the volume normally with both devices available, no mount options, so it is a rw mount. And it mounts with only the normal kernel messages: [ 9458.290778] BTRFS info (device sdc): disk space caching is enabled [ 9458.290788] BTRFS: has skinny extents I left the volume alone for 20 minutes. After that time, btrfs-show-super still showed different generation numbers for the two devids. Then I did an ls -l at the top level of the fs, and btrfs-show-super now shows the same generation numbers and backup_roots information for both devids. Next, I read the most recently modified files; they all read OK, with no kernel messages and no missing files. Last, I umounted the volume and did a btrfs check, and it comes up completely clean, no errors. No scrub done yet, no (user space) writes done yet. But going back to the original btrfs check with all the errors, it really doesn't give a user/admin of the volume any useful information about what the problem is. After the fact, it's relatively clear that devid 1 has generation 4924 and devid 2 has generation 4923, and that's what the btrfs check complaints are about: just a generation mismatch and the associated missing metadata on one device. By all measures it checks out and behaves as completely healthy. So I'm going to play with some fire and treat it normally for a few days, including making snapshots and writing files. I'll do a scrub in a few days and report back. Chris Murphy ^ permalink raw reply [flat|nested] 18+ messages in thread
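Chris's observation that the supers converged after the first access is consistent with a transaction commit: the next commit writes a new superblock at generation+1 to every device, erasing a one-generation divergence. A hypothetical sketch of that convergence (simplified; a real commit also rewrites the trees, not just the super):

```python
def commit(devices):
    """Toy model of a btrfs transaction commit: every device
    receives the new superblock, so devices that were one
    generation apart converge on the first rw commit."""
    new_gen = max(d["generation"] for d in devices) + 1
    for d in devices:
        d["generation"] = new_gen
    return new_gen

devs = [{"generation": 4924}, {"generation": 4923}]
commit(devs)
assert devs[0]["generation"] == devs[1]["generation"] == 4925
```

Note this only makes the superblocks agree; the stale metadata mirrors from the missed generation still need a scrub to be rewritten.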
* Re: btrfs check inconsistency with raid1, part 1 2015-12-22 1:48 ` Kai Krakow 2015-12-22 2:15 ` Qu Wenruo @ 2015-12-22 10:23 ` Duncan 2015-12-22 15:44 ` Austin S. Hemmelgarn 1 sibling, 1 reply; 18+ messages in thread From: Duncan @ 2015-12-22 10:23 UTC (permalink / raw) To: linux-btrfs Kai Krakow posted on Tue, 22 Dec 2015 02:48:04 +0100 as excerpted: > I just wondered if btrfs allows for the case where both stripes could > have valid checksums despite btrfs RAID - just because a failure > occurred right on the spot. > > Is this possible? What happens then? If yes, it would mean not to > blindly trust the RAID without doing the homework. The one case where btrfs could get things wrong that I know of is the one I discovered in my initial pre-btrfs-raid1-deployment testing... 1) Create a two-device btrfs raid1 (data and metadata) and put some data on it, including a test file with some content to be modified later. Sync and unmount normally. 2) Remove one of the two devices. 3) Mount the remaining device degraded-writable (it shouldn't allow mounting without degraded) and modify that test file. Sync and unmount. 4) Switch devices and repeat, modifying that test file in some other incompatible way. Sync and unmount. To this point, everything should be fine, except that you now have two incompatible versions of the test file, potentially with the same separate-but-equal generation numbers after the separate degraded-writable mount, modify, unmount cycles. 5) Plug both devices in and mount normally. Unless this has changed since my tests, btrfs will neither complain in dmesg nor otherwise provide any hint that anything is wrong. If you read the file, it'll give you one of the versions, still not complaining or providing any hint that something's wrong. Again unmount, without writing anything to the test file this time.
6) Try separately mounting each device individually again (without the other one available, so degraded; writable or read-only both work this time) and check the file. Each incompatible copy should remain in place on its respective device. Reading the one copy (randomly chosen, or more precisely chosen based on PID even/odd, as that's what the btrfs raid1 read-scheduler uses to decide which copy to read) didn't change the other one -- btrfs remained oblivious to the incompatible versions. Again unmount.

7) Plug both devices in and mount the combined filesystem writable once again. Scrub.

Back when I did my testing, I stopped at step 6, as I didn't understand that scrub was what I should use to resolve the problem. However, based on quite a bit of later experience keeping a failing device around in raid1 mode for a while (more and more sectors replaced with spares; it turns out at least the SSD I was working with had way more spares than I would have expected, and even after several months, when I finally gave up and replaced it, I was only down to about 85% of spares left, 15% used), this should *NORMALLY* not be a problem. As long as the generations differ, btrfs scrub can sort things out and catch up the "behind" device, resolving all differences to the latest-generation copy.

8) But what if both generations happen to be the same, having both been mounted separately and written so they diverged, but ending up at the same generation when recombined?

From all I know, and from everything others told me when I asked at the time, which copy you get then is entirely unpredictable, and worse yet, you might get btrfs acting on divergent metadata when writing to the other device.

The caution, therefore, is to do your best not to ever let the two copies be both mounted degraded-writable, separately. If only one copy is written to, then its generation will be higher than the other one, and scrub should have no problem resolving things.
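[Editor's note] The PID-based mirror selection mentioned in step 6 can be sketched in a few lines of Python. This is a toy illustration only, not kernel code; the function name `pick_mirror` is invented:

```python
# Toy model of the btrfs raid1 read-scheduler heuristic described above:
# the reading process's PID parity picks which of the two copies is read.
# Illustration only -- the real logic lives in the kernel's raid1 read path.

def pick_mirror(pid: int, num_copies: int = 2) -> int:
    """Return the index of the raid1 copy a reader with this PID gets."""
    return pid % num_copies

# An even-PID reader gets copy 0, an odd-PID reader gets copy 1,
# regardless of which copy is "newer" -- which is why divergent copies
# can be served interchangeably with no complaint.
print(pick_mirror(1000))  # 0
print(pick_mirror(1001))  # 1
```

This is also why repeated reads of the same file from different processes can return different contents when the copies have diverged.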
Even if both copies are separately written to incompatibly, in most real-world cases one is going to have more generations written than the other, and scrub should reliably and predictably resolve differences in favor of that one. The problem only appears if they actually happen to have the same generation number, which is relatively unlikely except under controlled test conditions, but has the potential to be a *BIG* problem should it actually occur.

So if for some reason you MUST mount both copies degraded-writable separately, the following are your options:

a) Don't ever recombine them; do a device replace missing with a third device instead (or a convert to single/dup). Use one of the options below if you do need to recombine. Or...

b) Manually verify (using btrfs-show-super or the like) that the supers on each don't have the same generation before attempting a recombine. Or...

c) Wipe the one device and treat it as a new device add, so btrfs can't get mixed up with differing versions at the same generation number. Or...

d) Simply take your chances and hope that the generation numbers don't match.

(d should in practice be "good enough" if one copy was only mounted writable a very short time while the other was written to over a rather longer period, such that the latter almost certainly had far more intervening commits, and thus generations, than the other.)

-- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman

^ permalink raw reply [flat|nested] 18+ messages in thread
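[Editor's note] The scrub-resolution rule and the equal-generation hazard described in the message above can be modeled as a tiny Python sketch. The function and return values are invented for illustration; real scrub compares superblock generations per device:

```python
# Toy model of how scrub can (and cannot) resolve divergent raid1 copies:
# the copy with the higher superblock generation wins, while a tie between
# divergent copies is unresolvable -- the dangerous case in step 8 above.

def resolve(gen_a: int, gen_b: int):
    """Return which copy scrub would favor, or None on a dangerous tie."""
    if gen_a > gen_b:
        return "copy_a"
    if gen_b > gen_a:
        return "copy_b"
    return None  # same generation, possibly divergent content: unpredictable

print(resolve(4924, 4923))  # copy_a -- the usual off-by-one case in this thread
print(resolve(4923, 4923))  # None   -- the tie that option (b) checks for
```

Option (b)'s manual btrfs-show-super comparison is essentially a by-hand version of the `None` check above.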
* Re: btrfs check inconsistency with raid1, part 1 2015-12-22 10:23 ` Duncan @ 2015-12-22 15:44 ` Austin S. Hemmelgarn 2015-12-29 21:33 ` Chris Murphy 0 siblings, 1 reply; 18+ messages in thread From: Austin S. Hemmelgarn @ 2015-12-22 15:44 UTC (permalink / raw) To: Duncan, linux-btrfs

On 2015-12-22 05:23, Duncan wrote:
> Kai Krakow posted on Tue, 22 Dec 2015 02:48:04 +0100 as excerpted:
>
>> I just wondered if btrfs allows for the case where both stripes could
>> have valid checksums despite of btrfs-RAID - just because a failure
>> occurred right on the spot.
>>
>> Is this possible? What happens then? If yes, it would mean not to
>> blindly trust the RAID without doing the homeworks.
>
> The one case where btrfs could get things wrong that I know of is as I
> discovered in my initial pre-btrfs-raid1-deployment testing...

I've had exactly one case where I got _really_ unlucky and had a bunch of media errors on a BTRFS raid1 setup that happened to result in something similar to this. Things happened such that one copy of a block (we'll call this one copy 1) had correct data, and the other (we'll call this one copy 2) had incorrect data, except that one copy of the metadata had the correct checksum for copy 2, and the other metadata copy had a correct checksum for copy 1, and, due to a hash collision, the checksum for the metadata block itself was correct for both copies. As a result, I ended up getting a read error about 25% of the time (which then forced a re-read of the data), correct data about 37.5% of the time, and incorrect data the remaining 37.5% of the time.
I actually ran the numbers on how likely this was to happen (more than a dozen errors on different disks in blocks that happened to reference each other, plus a hash collision involving a 4 byte difference between two 16k blocks of data), and it's a statistical impossibility (it's more likely that one of Amazon's or Google's data centers goes offline due to hardware failures than that this will happen again). Obviously it did happen, but I would say it's such an unrealistic edge case that you probably don't need to worry about it (although I learned _a lot_ about the internals of BTRFS in trying to figure out what was going on).

> [...snip...]
>
> From all I know and from everything others told me when I asked at the
> time, which copy you get then is entirely unpredictable, and worse yet,
> you might get btrfs acting on divergent metadata when writing to the
> other device.
>

This is indeed the case. Because of how BTRFS verifies checksums, there's a roughly 50% chance that the first read attempt will pick a mismatched checksum and data block, which triggers a re-read with an independent 50% chance of again picking a mismatch, resulting in a 25% chance that any read that actually goes to the device returns a read error. The remaining 75% of the time, you'll get either one block or the other. These numbers of course get skewed by the VFS cache. In my case above, the affected file was one that is almost never in cache when it gets accessed, so I saw numbers relatively close to what you would get without the cache.

^ permalink raw reply [flat|nested] 18+ messages in thread
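[Editor's note] The percentages in the message above follow directly from two independent 50% picks. A quick arithmetic check, assuming exactly two copies and a single re-read attempt as described:

```python
# Reproduce the read-outcome probabilities described above:
# the first read picks a mismatched checksum/data pair with p = 0.5,
# triggering one re-read with an independent p = 0.5 of mismatching again.

p_mismatch = 0.5

p_error = p_mismatch * p_mismatch  # both attempts mismatch -> read error
p_success = 1 - p_error            # some copy is returned to the caller
p_correct = p_success / 2          # the returned copy is right half the time
p_incorrect = p_success / 2        # ...and wrong the other half

print(p_error)      # 0.25
print(p_correct)    # 0.375
print(p_incorrect)  # 0.375
```

This matches the observed roughly 25% read errors and the 37.5%/37.5% split between correct and incorrect data.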
* Re: btrfs check inconsistency with raid1, part 1 2015-12-22 15:44 ` Austin S. Hemmelgarn @ 2015-12-29 21:33 ` Chris Murphy 0 siblings, 0 replies; 18+ messages in thread From: Chris Murphy @ 2015-12-29 21:33 UTC (permalink / raw) Cc: Btrfs BTRFS

Latest update on this thread. btrfs check (4.3.1) reports no problems. The volume mounts with kernel 4.2.8 with no errors. And I just did a scrub and there were no errors, not even any fix-up messages. And dev stats are all zero.

So... it appears it was a minor enough problem, and still consistent enough, that it fixed itself. Granted, there was no writing occurring at the time, just heavy reading, or perhaps this would be a different story.

Chris Murphy

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: btrfs check inconsistency with raid1, part 1 2015-12-14 7:24 ` Chris Murphy 2015-12-14 8:04 ` Qu Wenruo @ 2015-12-14 11:51 ` Duncan 1 sibling, 0 replies; 18+ messages in thread From: Duncan @ 2015-12-14 11:51 UTC (permalink / raw) To: linux-btrfs

Chris Murphy posted on Mon, 14 Dec 2015 00:24:21 -0700 as excerpted:

>> Personally speaking, it may be a false alert from btrfsck.
>> So in this case, I can't provide much help.
>>
>> If you're brave enough, mount it rw to see what will happen(although it
>> may mount just OK).
>
> I'm brave enough. I'll give it a try tomorrow unless there's another
> request for more info before then.

Given the off-by-one generations and my own btrfs raid1 experience, I'm guessing the likely result is either a good mount and no problems, or a good initial mount but a lockup once you try actually doing too much with the filesystem (like actually reading the affected blocks). It looks like a normal generation-out-of-sync condition, common after forced unsynced/not-remounted-ro shutdowns. If so, btrfs should redirect reads to the device with the updated, current generation, but you'll need to do a scrub to get everything 100% back in sync.

The catch I found, at least when I still had the then-failing (but not yet failed; it was just finding more and more sectors that needed to be redirected to spares) SSD in my raid1, along with an on-boot service that read a rather large dir into cache, was that after so many errors from the failing device, instead of continuing to redirect reads to the good device, btrfs just gives up, which resulted in a system crash here.
But when there weren't that many errors on the failing device, or when I intercepted the boot process and mounted everything but didn't run the normal post-mount services (systemd emergency target instead of my usual default multi-user), so the service that cached that dir didn't have a chance to run and all those errors weren't triggered, I could still mount normally. From there I could run scrub, which took care of the problem without triggering the usual too-many-errors crash, and after the scrub I could invoke normal multi-user mode, start all services including the caching service, and go about my usual business.

So if I'm correct: mount normally and scrub, and you should be fine, though you may have to abort a normal boot if it accesses too many bad files, in order to be able to finish the scrub before a crash.

-- Duncan - List replies preferred. No HTML msgs. "Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman

^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2015-12-29 21:33 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page --
2015-12-14  4:16 btrfs check inconsistency with raid1, part 1 Chris Murphy
2015-12-14  5:48 ` Qu Wenruo
2015-12-14  7:24   ` Chris Murphy
2015-12-14  8:04     ` Qu Wenruo
2015-12-14 17:59       ` Chris Murphy
2015-12-20 22:32         ` Chris Murphy
     [not found]           ` <CAJCQCtSEx_wYPkfazik0bcpQwXxJCA=O5f0o6RbxON4jjB4q7A@mail.gmail.com>
     [not found]             ` <5677592F.5000202@cn.fujitsu.com>
2015-12-21  2:12               ` Chris Murphy
2015-12-21  2:23                 ` Qu Wenruo
2015-12-21  2:46                   ` Chris Murphy
2015-12-22  1:05                     ` Kai Krakow
2015-12-22  1:22                       ` Qu Wenruo
2015-12-22  1:48                         ` Kai Krakow
2015-12-22  2:15                           ` Qu Wenruo
2015-12-22  4:21                             ` Chris Murphy
2015-12-22 10:23                           ` Duncan
2015-12-22 15:44                             ` Austin S. Hemmelgarn
2015-12-29 21:33                               ` Chris Murphy
2015-12-14 11:51   ` Duncan