* btrfs check "Couldn't open file system" after error in transaction.c
@ 2016-08-28 10:04 Hendrik Friedel
  2016-08-28 17:18 ` Hendrik Friedel
  2016-08-28 17:33 ` Chris Murphy
  0 siblings, 2 replies; 9+ messages in thread
From: Hendrik Friedel @ 2016-08-28 10:04 UTC (permalink / raw)
  To: Btrfs BTRFS

Hello,

I have a filesystem (three disks with no raid) that I can still mount 
ro, but I cannot check or scrub it.
In dmesg I see:
[So Aug 28 11:33:22 2016] BTRFS error (device sde): parent transid 
verify failed on 22168481054720 wanted 1826943 found 1828546
[So Aug 28 11:33:22 2016] BTRFS warning (device sde): Skipping commit of 
aborted transaction.
(more complete at the end of this mail)

What I did up to now in order to recover:
- mount ro,recovery 
(http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html)
that works.
btrfs check will lead to  "Couldn't open file system"

root@homeserver:~# btrfs scrub start /mnt/test
scrub started on /mnt/test, fsid a8af3832-48c7-4568-861f-e80380dd7e0b 
(pid=18953)
root@homeserver:~# btrfs scrub status /mnt/test
scrub status for a8af3832-48c7-4568-861f-e80380dd7e0b
        scrub started at Sun Aug 28 12:02:46 2016 and was aborted after 
00:00:00
        total bytes scrubbed: 0.00B with 0 errors

First thing to do now is probably to check the backups. But is there a 
way to repair this filesystem?

Besides this: What could be the reason for this error?
Scrubs were regular and good. There was no power loss and also smartctl 
looks fine on all three drives.

Greetings,
Hendrik


root@homeserver:~# btrfs fi show
Label: 'BigStorage'  uuid: a8af3832-48c7-4568-861f-e80380dd7e0b
        Total devices 3 FS bytes used 7.66TiB
        devid    1 size 2.73TiB used 2.72TiB path /dev/sde
        devid    2 size 2.73TiB used 2.72TiB path /dev/sdc
        devid    3 size 2.73TiB used 2.73TiB path /dev/sdd


[   98.534830] BTRFS error (device sde): parent transid verify failed on 
22168481054720 wanted 1826943 found 1828546
[   98.534866] BTRFS error (device sde): parent transid verify failed on 
22168481054720 wanted 1826943 found 1828546
[   98.534891] BTRFS error (device sde): parent transid verify failed on 
22168481054720 wanted 1826943 found 1828546
[   98.534920] BTRFS warning (device sde): Skipping commit of aborted 
transaction.
[   98.534921] ------------[ cut here ]------------
[   98.534939] WARNING: CPU: 1 PID: 3643 at 
/home/zumbi/linux-4.6.4/fs/btrfs/transaction.c:1771 
cleanup_transaction+0x96/0x300 [btrfs]
[   98.534940] BTRFS: Transaction aborted (error -5)
[   98.534940] Modules linked in: xt_nat(E) xt_tcpudp(E) veth(E) 
ftdi_sio(E) usbserial(E) ipt_MASQUERADE(E) nf_nat_masquerade_ipv4(E) 
xfrm_user(E) xfrm_algo(E) iptable_nat(E) nf_conntrack_ipv4(E) 
nf_defrag_ipv4(E) nf_nat_ipv4(E) xt_addrtype(E) iptable_filter(E) 
ip_tables(E) xt_conntrack(E) x_tables(E) nf_nat(E) nf_conntrack(E) 
br_netfilter(E) bridge(E) stp(E) llc(E) cpufreq_stats(E) 
cpufreq_userspace(E) cpufreq_conservative(E) cpufreq_powersave(E) 
binfmt_misc(E) nfsd(E) auth_rpcgss(E) nfs_acl(E) nfs(E) lockd(E) 
grace(E) fscache(E) sunrpc(E) snd_hda_codec_hdmi(E) iTCO_wdt(E) 
iTCO_vendor_support(E) stv6110x(E) lnbp21(E) intel_rapl(E) 
x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) 
kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) 
ghash_clmulni_intel(E) cryptd(E) pcspkr(E) serio_raw(E)
[   98.534963]  snd_hda_codec_realtek(E) snd_hda_codec_generic(E) 
i2c_i801(E) i915(E) stv090x(E) snd_hda_intel(E) snd_hda_codec(E) 
snd_hda_core(E) snd_hwdep(E) snd_pcm(E) drm_kms_helper(E) ngene(E) 
ddbridge(E) snd_timer(E) snd(E) lpc_ich(E) mei_me(E) dvb_core(E) 
mfd_core(E) drm(E) soundcore(E) mei(E) i2c_algo_bit(E) shpchp(E) 
evdev(E) battery(E) tpm_tis(E) video(E) tpm(E) processor(E) button(E) 
fuse(E) autofs4(E) btrfs(E) xor(E) raid6_pq(E) dm_mod(E) md_mod(E) 
hid_generic(E) usbhid(E) hid(E) sg(E) sd_mod(E) ahci(E) libahci(E) 
crc32c_intel(E) libata(E) psmouse(E) scsi_mod(E) fan(E) thermal(E) 
xhci_pci(E) xhci_hcd(E) fjes(E) e1000e(E) ptp(E) pps_core(E) ehci_pci(E) 
ehci_hcd(E) usbcore(E) usb_common(E)
[   98.534988] CPU: 1 PID: 3643 Comm: btrfs-transacti Tainted: G         
    E   4.6.0-0.bpo.1-amd64 #1 Debian 4.6.4-1~bpo8+1
[   98.534989] Hardware name:                  /DH87RL, BIOS 
RLH8710H.86A.0325.2014.0417.1800 04/17/2014
[   98.534990]  0000000000000286 00000000007b2061 ffffffff813124c5 
ffff8804a15f3d40
[   98.534992]  0000000000000000 ffffffff8107af94 ffff8804e884ac10 
ffff8804a15f3d98
[   98.534993]  ffff8804b9bbc500 00000000fffffffb ffff8804e884ac10 
00000000fffffffb
[   98.534995] Call Trace:
[   98.535000]  [<ffffffff813124c5>] ? dump_stack+0x5c/0x77
[   98.535003]  [<ffffffff8107af94>] ? __warn+0xc4/0xe0
[   98.535005]  [<ffffffff8107b00f>] ? warn_slowpath_fmt+0x5f/0x80
[   98.535014]  [<ffffffffc02c28b6>] ? cleanup_transaction+0x96/0x300 
[btrfs]
[   98.535017]  [<ffffffff810bb6c0>] ? wait_woken+0x90/0x90
[   98.535026]  [<ffffffffc02c3663>] ? 
btrfs_commit_transaction+0x2b3/0xa30 [btrfs]
[   98.535028]  [<ffffffff810bb6c0>] ? wait_woken+0x90/0x90
[   98.535036]  [<ffffffffc02be76e>] ? transaction_kthread+0x1ce/0x1f0 
[btrfs]
[   98.535043]  [<ffffffffc02be5a0>] ? 
btrfs_cleanup_transaction+0x590/0x590 [btrfs]
[   98.535045]  [<ffffffff81099ecf>] ? kthread+0xdf/0x100
[   98.535048]  [<ffffffff815c8772>] ? ret_from_fork+0x22/0x40
[   98.535049]  [<ffffffff81099df0>] ? kthread_park+0x50/0x50
[   98.535050] ---[ end trace 91a2f65df3d53d48 ]---
[   98.535055] BTRFS: error (device sde) in cleanup_transaction:1771: 
errno=-5 IO failure
[   98.535059] BTRFS: error (device sde) in btrfs_drop_snapshot:9061: 
errno=-5 IO failure
[   98.535060] BTRFS info (device sde): forced readonly
[   98.535571] BTRFS info (device sde): delayed_refs has NO entry
[  689.672724] BTRFS info (device sde): disk space caching is enabled
[  689.672727] BTRFS error (device sde): Remounting read-write after 
error is not allowed
[ 1124.707300] BTRFS info (device sde): disk space caching is enabled
[ 1124.707303] BTRFS error (device sde): Remounting read-write after 
error is not allowed


---
This email was checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs check "Couldn't open file system" after error in transaction.c
  2016-08-28 10:04 btrfs check "Couldn't open file system" after error in transaction.c Hendrik Friedel
@ 2016-08-28 17:18 ` Hendrik Friedel
  2016-08-28 17:33 ` Chris Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Hendrik Friedel @ 2016-08-28 17:18 UTC (permalink / raw)
  To: Btrfs BTRFS

Hello,

some more info:
The system is Debian jessie with kernel 4.6.0 and btrfs-tools 4.6.

I went through the recovery steps from the wiki:
- btrfs scrub to detect issues on live filesystems
See my original mail: the scrub is aborted immediately.

- look at btrfs detected errors in syslog (look at Marc's blog above on 
how to use sec.pl to do this)
See my original mail.

- mount -o ro,recovery to mount a filesystem with issues
This works, but I get many errors like this one:
[  325.360115] BTRFS info (device sdd): no csum found for inode 1703 
start 2072977408

- btrfs-zero-log might help in specific cases. Go read Btrfs-zero-log
I would like to get your OK and instructions on this first.

- btrfs restore will help you copy data off a broken btrfs filesystem. 
See its page: Restore
See above: recovering the data does seem to work with ro,recovery.
By the way: can I somehow be sure that the data is correct when I read 
it this way, despite the "no csum found for inode" messages?

- btrfs check --repair, aka btrfsck is your last option if the ones above 
have not worked.
This does not work:
"Couldn't open file system"

I also went through Marc's page:
http://marc.merlins.org/perso/btrfs/post_2014-03-19_Btrfs-Tips_-Btrfs-Scrub-and-Btrfs-Filesystem-Repair.html
but found no further hints.

So I now have these objectives:
- If possible, repair the filesystem
- Understand the reason behind the issue and prevent it in the future
- If it is not possible to repair the filesystem:
    - understand whether the data that I read from the drive is valid or 
corrupted

I'd appreciate your help on this.

Greetings,
Hendrik



* Re: btrfs check "Couldn't open file system" after error in transaction.c
  2016-08-28 10:04 btrfs check "Couldn't open file system" after error in transaction.c Hendrik Friedel
  2016-08-28 17:18 ` Hendrik Friedel
@ 2016-08-28 17:33 ` Chris Murphy
  2016-08-28 18:04   ` Re[2]: " Hendrik Friedel
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Murphy @ 2016-08-28 17:33 UTC (permalink / raw)
  To: Hendrik Friedel; +Cc: Btrfs BTRFS

On Sun, Aug 28, 2016 at 4:04 AM, Hendrik Friedel <hendrik@friedels.name> wrote:
> Hello,
>
> I have a filesystem (three disks with no raid)

So it's data single *and* metadata single?


> btrfs check will lead to  "Couldn't open file system"

Try the most recent btrfs-progs releases to see if it's any
different: 4.5.3, 4.6.1, 4.7 (or 4.7.1 if you can get it, it's days
old).


>
>
> [   98.534830] BTRFS error (device sde): parent transid verify failed on
> 22168481054720 wanted 1826943 found 1828546


That's pretty weird. It wants a LOWER generation number than what it
found? By quite a bit. It's about 1600 generations different. I don't
know what can cause this kind of confusion or how to fix it. Maybe
take advantage of the fact it does read only and recreate it. You
could take a btrfs-image and btrfs-debug-tree first, because there's
some bug somewhere: somehow it became inconsistent, and can't be fixed
at mount time or even with btrfs check.
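
To make the size of that gap concrete, here is a minimal Python sketch that parses the kernel message quoted above and computes the difference between the two generations. The regular expression and the function name parse_transid_error are illustrative assumptions, not part of any btrfs tool, and the message wording may differ between kernel versions.

```python
import re

# Illustrative pattern for the kernel message quoted in this thread
# (an assumption; the exact wording may vary between kernel versions).
TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)

def parse_transid_error(line):
    """Return (bytenr, wanted, found) as ints, or None if the line does not match."""
    match = TRANSID_RE.search(line)
    if match is None:
        return None
    return tuple(int(match.group(name)) for name in ("bytenr", "wanted", "found"))

line = ("BTRFS error (device sde): parent transid verify failed on "
        "22168481054720 wanted 1826943 found 1828546")
bytenr, wanted, found = parse_transid_error(line)
gap = found - wanted  # positive: the block on disk is NEWER than expected
print(f"block {bytenr}: found generation is {gap} ahead of the wanted one")
```

A positive gap means the tree block on disk carries a newer generation than its parent expects, which is the unusual direction described above.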



-- 
Chris Murphy


* Re[2]: btrfs check "Couldn't open file system" after error in transaction.c
  2016-08-28 17:33 ` Chris Murphy
@ 2016-08-28 18:04   ` Hendrik Friedel
  2016-09-04 18:51     ` Re[3]: " Hendrik Friedel
  2016-09-04 21:57     ` Re[2]: " Chris Murphy
  0 siblings, 2 replies; 9+ messages in thread
From: Hendrik Friedel @ 2016-08-28 18:04 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Hi Chris,

thanks for your reply -especially on a Sunday.
>>  I have a filesystem (three disks with no raid)
>
>So it's data single *and* metadata single?
>
No:
Data, single: total=8.14TiB, used=7.64TiB
System, RAID1: total=32.00MiB, used=912.00KiB
Metadata, RAID1: total=18.00GiB, used=16.45GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
>>  btrfs check will lead to  "Couldn't open file system"
>
>Try the most recent btrfs-progs releases to see if it's any
>different: 4.5.3, 4.6.1, 4.7 (or 4.7.1 if you can get it, it's days
>old).
Ok, I will try.
>>  [   98.534830] BTRFS error (device sde): parent transid verify failed 
>>on
>>  22168481054720 wanted 1826943 found 1828546
>
>
>That's pretty weird. It wants a LOWER generation number than what it
>found? By quite a bit. It's about 1600 generations different. I don't
>know what can cause this kind of confusion or how to fix it.
Ok, time to get the data off it (I do have backups, but of course they 
are some weeks old). That answers my first objective:
- If possible, repair the filesystem
NO

>  Maybe
>take advantage of the fact it does read only and recreate it. You
>could take a btrfs-image and btrfs-debug-tree first,
And what do I do with it?

>because there's
>some bug somewhere: somehow it became inconsistent, and can't be fixed
>at mount time or even with btrfs check.
Ok, so is there any way I can help you find this bug?
Coming back to my objectives:
-Understand the reason behind the issue and prevent it in future
Finding the bug would help with the above

-If not possible to repair the filesystem:
    -understand if the data that I read from the drive is valid or 
corrupted
Can you answer this?

As mentioned: I do have a backup, a month old. The data does not change 
so regularly, so most should be ok.
Now I have two sources of data:
the backup and the current degraded filesystem.
If data differs, which one do I take? Is it safe to use the more recent 
one from the degraded filesystem?

Greetings,
Hendrik


* Re[3]: btrfs check "Couldn't open file system" after error in transaction.c
  2016-08-28 18:04   ` Re[2]: " Hendrik Friedel
@ 2016-09-04 18:51     ` Hendrik Friedel
  2016-09-04 19:23       ` Re[4]: " Hendrik Friedel
  2016-09-04 22:01       ` Re[3]: " Chris Murphy
  2016-09-04 21:57     ` Re[2]: " Chris Murphy
  1 sibling, 2 replies; 9+ messages in thread
From: Hendrik Friedel @ 2016-09-04 18:51 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Hello again,

before overwriting the filesystem, some last questions:

>>  Maybe
>>take advantage of the fact it does read only and recreate it. You
>>could take a btrfs-image and btrfs-debug-tree first,
>And what do I do with it?
>
>>because there's
>>some bug somewhere: somehow it became inconsistent, and can't be fixed
>>at mount time or even with btrfs check.
>Ok, so is there any way to help you finding this bug?
Anything I can do here?

>Coming back to my objectives:
>-Understand the reason behind the issue and prevent it in future
>Finding the bug would help with the above
>
>-If not possible to repair the filesystem:
>    -understand if the data that I read from the drive is valid or 
>corrupted
>Can you answer this?
>
>As mentioned: I do have a backup, a month old. The data does not change 
>so regularly, so most should be ok.
>Now I have two sources of data:
>the backup and the current degraded filesystem.
>If data differs, which one do I take? Is it safe to use the more recent 
>one from the degraded filesystem?
>
And can you help me on these points?

FYI, I did a
btrfsck --init-csum-tree /dev/sdd
btrfs rescue zero-log btrfs-zero-log
btrfsck /dev/sdd

now. The last command is still running. It seems to be working. Is there 
a way to be sure that the data is all OK again?

Regards,
Hendrik



* Re[4]: btrfs check "Couldn't open file system" after error in transaction.c
  2016-09-04 18:51     ` Re[3]: " Hendrik Friedel
@ 2016-09-04 19:23       ` Hendrik Friedel
  2016-09-04 22:04         ` Chris Murphy
  2016-09-04 22:01       ` Re[3]: " Chris Murphy
  1 sibling, 1 reply; 9+ messages in thread
From: Hendrik Friedel @ 2016-09-04 19:23 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

Hello,

Here is the output of btrfsck:
Checking filesystem on /dev/sdd
UUID: a8af3832-48c7-4568-861f-e80380dd7e0b
checking extents
checking free space cache
checking fs root
checking csums
checking root refs
checking quota groups
Ignoring qgroup relation key 24544
Ignoring qgroup relation key 24610
Ignoring qgroup relation key 24611
Ignoring qgroup relation key 25933
Ignoring qgroup relation key 25934
Ignoring qgroup relation key 25935
Ignoring qgroup relation key 25936
Ignoring qgroup relation key 25937
Ignoring qgroup relation key 25938
Ignoring qgroup relation key 25939
Ignoring qgroup relation key 25939
Ignoring qgroup relation key 25941
Ignoring qgroup relation key 25942
Ignoring qgroup relation key 25958
Ignoring qgroup relation key 25959
Ignoring qgroup relation key 25960
Ignoring qgroup relation key 25961
Ignoring qgroup relation key 25962
Ignoring qgroup relation key 25963
Ignoring qgroup relation key 25964
Ignoring qgroup relation key 25965
Ignoring qgroup relation key 25966
Ignoring qgroup relation key 25966
Ignoring qgroup relation key 25968
Ignoring qgroup relation key 25970
Ignoring qgroup relation key 25971
Ignoring qgroup relation key 25972
Ignoring qgroup relation key 25975
Ignoring qgroup relation key 25976
Ignoring qgroup relation key 25976
Ignoring qgroup relation key 25976
Ignoring qgroup relation key 567172078071971871
Ignoring qgroup relation key 567172078071971872
Ignoring qgroup relation key 567172078071971882
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971885
Ignoring qgroup relation key 567172078071971886
Ignoring qgroup relation key 567172078071971886
Ignoring qgroup relation key 567172078071971886
Ignoring qgroup relation key 567172078071971886
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Ignoring qgroup relation key 567172078071971892
Qgroup is already inconsistent before checking
Counts for qgroup id: 3102 are different
our:            referenced 174829252608 referenced compressed 
174829252608
disk:           referenced 174829252608 referenced compressed 
174829252608
our:            exclusive 2899968 exclusive compressed 2899968
disk:           exclusive 2916352 exclusive compressed 2916352
diff:           exclusive -16384 exclusive compressed -16384
Counts for qgroup id: 25977 are different
our:            referenced 47249391616 referenced compressed 47249391616
disk:           referenced 47249391616 referenced compressed 47249391616
our:            exclusive 90222592 exclusive compressed 90222592
disk:           exclusive 90238976 exclusive compressed 90238976
diff:           exclusive -16384 exclusive compressed -16384
Counts for qgroup id: 25978 are different
our:            referenced 174829252608 referenced compressed 
174829252608
disk:           referenced 174829252608 referenced compressed 
174829252608
our:            exclusive 1064960 exclusive compressed 1064960
disk:           exclusive 1081344 exclusive compressed 1081344
diff:           exclusive -16384 exclusive compressed -16384
Counts for qgroup id: 26162 are different
our:            referenced 65940500480 referenced compressed 65940500480
disk:           referenced 65866997760 referenced compressed 65866997760
diff:           referenced 73502720 referenced compressed 73502720
our:            exclusive 3991326720 exclusive compressed 3991326720
disk:           exclusive 3960582144 exclusive compressed 3960582144
diff:           exclusive 30744576 exclusive compressed 30744576
found 8423479726080 bytes used err is 1
total csum bytes: 8206766844
total tree bytes: 17669144576
total fs tree bytes: 7271251968
total extent tree bytes: 683851776
total csum bytes: 8206766844
total tree bytes: 17669144576
total fs tree bytes: 7271251968
total extent tree bytes: 683851776
btree space waste bytes: 2859469730
file data blocks allocated: 16171232772096
referenced 13512171663360

What does that tell us?

Greetings,
Hendrik



* Re: Re[2]: btrfs check "Couldn't open file system" after error in transaction.c
  2016-08-28 18:04   ` Re[2]: " Hendrik Friedel
  2016-09-04 18:51     ` Re[3]: " Hendrik Friedel
@ 2016-09-04 21:57     ` Chris Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Murphy @ 2016-09-04 21:57 UTC (permalink / raw)
  To: Btrfs BTRFS

Lost track of this...sorry.

On Sun, Aug 28, 2016 at 12:04 PM, Hendrik Friedel <hendrik@friedels.name> wrote:
> Hi Chris,
>
> thanks for your reply -especially on a Sunday.
>>>
>>>  I have a filesystem (three disks with no raid)
>>
>>
>> So it's data single *and* metadata single?
>>
> No:
> Data, single: total=8.14TiB, used=7.64TiB
> System, RAID1: total=32.00MiB, used=912.00KiB
> Metadata, RAID1: total=18.00GiB, used=16.45GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>>
>>
>>
>>>  btrfs check will lead to  "Couldn't open file system"

That's a bug worth filing. That bug report will need a URL for where
you put the btrfs-image file.


>>  Maybe
>> take advantage of the fact it does read only and recreate it. You
>> could take a btrfs-image and btrfs-debug-tree first,
>
> And what do I do with it?

Put it somewhere it can live a while, it might be months before a dev
gets around to looking at it. I usually put them on google drive in
the public folder, and then post the URL (get shareable link) in the
bug report.



>
>> because there's
>> some bug somewhere: somehow it became inconsistent, and can't be fixed
>> at mount time or even with btrfs check.
>
> Ok, so is there any way to help you finding this bug?
> Coming back to my objectives:
> -Understand the reason behind the issue and prevent it in future
> Finding the bug would help with the above

No idea.

>
> -If not possible to repair the filesystem:
>    -understand if the data that I read from the drive is valid or corrupted
> Can you answer this?

Other than nocow files which do not have csums, Btrfs will spit back
an I/O error and path to the bad file rather than hand over data it
thinks is corrupt (doesn't match csum). So data read from the volume
should be valid.
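
One way to use this behaviour during recovery is to read every file once from the read-only mount and note which reads fail. The following is a minimal Python sketch, assuming the filesystem is mounted at some path such as /mnt/test; the function name find_unreadable is made up for this example. It only finds files for which the kernel returns an error, and it says nothing about nocow files, which have no csums.

```python
import os

def find_unreadable(root, chunk_size=1 << 20):
    """Read every regular file under root; return [(path, error)] for failed reads."""
    failures = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes and the like
            try:
                with open(path, "rb") as handle:
                    while handle.read(chunk_size):
                        pass  # the data is discarded; only read errors matter
            except OSError as exc:
                # A csum mismatch is expected to surface as an I/O error here.
                failures.append((path, exc))
    return failures
```

Calling find_unreadable("/mnt/test") and printing each (path, error) pair gives a list of files to restore from backup instead.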


>
> As mentioned: I do have a backup, a month old. The data does not change so
> regularly, so most should be ok.
> Now I have two sources of data:
> the backup and the current degraded filesystem.
> If data differs, which one do I take? Is it safe to use the more recent one
> from the degraded filesystem?

If data differs you have to figure out a way to inspect the file to
determine which one is correct. Databases have their own consistency
checks, for example. If it's an image, open it in a viewer: big
problems will be visible, but a small problem might be just one wrong
pixel and you may not even notice it.
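
Before inspecting files by hand, it helps to know which ones differ at all. Here is a minimal Python sketch, assuming the backup and the data copied from the read-only mount are available as two directory trees; the names sha256_of and differing_files are invented for this example. It only tells you that two copies differ, not which copy is the correct one.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def differing_files(backup_root, recovered_root):
    """Return sorted relative paths present in both trees whose contents differ."""
    differing = []
    for dirpath, _dirnames, filenames in os.walk(backup_root):
        for name in filenames:
            backup_path = os.path.join(dirpath, name)
            rel = os.path.relpath(backup_path, backup_root)
            recovered_path = os.path.join(recovered_root, rel)
            if not (os.path.isfile(backup_path) and os.path.isfile(recovered_path)):
                continue  # only compare files that exist in both trees
            if sha256_of(backup_path) != sha256_of(recovered_path):
                differing.append(rel)
    return sorted(differing)
```

Files that exist in only one of the two trees are skipped here on purpose; listing them would be a separate step.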


-- 
Chris Murphy


* Re: Re[3]: btrfs check "Couldn't open file system" after error in transaction.c
  2016-09-04 18:51     ` Re[3]: " Hendrik Friedel
  2016-09-04 19:23       ` Re[4]: " Hendrik Friedel
@ 2016-09-04 22:01       ` Chris Murphy
  1 sibling, 0 replies; 9+ messages in thread
From: Chris Murphy @ 2016-09-04 22:01 UTC (permalink / raw)
  To: Hendrik Friedel; +Cc: Chris Murphy, Btrfs BTRFS

On Sun, Sep 4, 2016 at 12:51 PM, Hendrik Friedel <hendrik@friedels.name> wrote:
> Hello again,
>
> before overwriting the filesystem, some last questions:
>
>>>  Maybe
>>> take advantage of the fact it does read only and recreate it. You
>>> could take a btrfs-image and btrfs-debug-tree first,
>>
>> And what do I do with it?
>>
>>> because there's
>>> some bug somewhere: somehow it became inconsistent, and can't be fixed
>>> at mount time or even with btrfs check.
>>
>> Ok, so is there any way to help you finding this bug?
>
> Anything, I can do here?
>
>> Coming back to my objectives:
>> -Understand the reason behind the issue and prevent it in future
>> Finding the bug would help with the above
>>
>> -If not possible to repair the filesystem:
>>    -understand if the data that I read from the drive is valid or
>> corrupted
>> Can you answer this?
>>
>> As mentioned: I do have a backup, a month old. The data does not change so
>> regularly, so most should be ok.
>> Now I have two sources of data:
>> the backup and the current degraded filesystem.
>> If data differs, which one do I take? Is it safe to use the more recent
>> one from the degraded filesystem?
>>
> And can you help me on these points?
>
> FYI, I did a
> btrfsck --init-csum-tree /dev/sdd
> btrfs rescue zero-log btrfs-zero-log
> btrfsck /dev/sdd

Curious that this is fixing a parent transid problem... not sure why.
Only a developer working on btrfsck could answer this. They'd need a
btrfs-image taken before these things were done to see what's wrong
with the file system that causes check to fail. Changing anything
changes the evidence of what was wrong.

>
> now. The last command is still running. It seems to be working; Is there a
> way to be sure, that the data is all ok again?

Not by Btrfs. The problem now is that init-csum-tree recomputed the
csums for everything. If any files were corrupt, they now have csums
based on that corruption, so Btrfs will read them as OK. That's the
problem with init-csum-tree. So now you need a different way to
confirm or deny whether the files are really good.
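
One such different way, for file formats that carry their own
checksums, is to let the format verify itself. The sketch below uses
only the Python standard library and covers just zip and gzip files as
examples; the helper names are hypothetical, and a file format without
an internal checksum cannot be validated like this.

```python
import gzip
import zipfile
import zlib


def check_zip(path):
    """Return None if every member passes its CRC, else a description."""
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() reads every member and returns the first bad name.
            bad = archive.testzip()
    except (zipfile.BadZipFile, zlib.error, OSError) as exc:
        return f"unreadable archive: {exc}"
    return None if bad is None else f"bad member: {bad}"


def check_gzip(path, chunk_size=1 << 20):
    """Return None if the stream decompresses and its CRC matches."""
    try:
        with gzip.open(path, "rb") as stream:
            # Reading to the end makes the gzip module verify the trailer CRC.
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error) as exc:
        return f"corrupt gzip stream: {exc}"
    return None
```

A clean result only says the container is internally consistent. It is
independent of the Btrfs csums, which is the point after init-csum-tree.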



-- 
Chris Murphy


* Re: Re[4]: btrfs check "Couldn't open file system" after error in transaction.c
  2016-09-04 19:23       ` Re[4]: " Hendrik Friedel
@ 2016-09-04 22:04         ` Chris Murphy
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Murphy @ 2016-09-04 22:04 UTC (permalink / raw)
  To: Hendrik Friedel; +Cc: Chris Murphy, Btrfs BTRFS

On Sun, Sep 4, 2016 at 1:23 PM, Hendrik Friedel <hendrik@friedels.name> wrote:
> Hello,
>
> here the output of btrfsck:
> Checking filesystem on /dev/sdd
> UUID: a8af3832-48c7-4568-861f-e80380dd7e0b
> checking extents
> checking free space cache
> checking fs root
> checking csums
> checking root refs
> checking quota groups
> Ignoring qgroup relation key 24544
> Ignoring qgroup relation key 24610
> Ignoring qgroup relation key 24611
> Ignoring qgroup relation key 25933
> Ignoring qgroup relation key 25934
> Ignoring qgroup relation key 25935
> Ignoring qgroup relation key 25936
> Ignoring qgroup relation key 25937
> Ignoring qgroup relation key 25938
> Ignoring qgroup relation key 25939
> Ignoring qgroup relation key 25939
> Ignoring qgroup relation key 25941
> Ignoring qgroup relation key 25942
> Ignoring qgroup relation key 25958
> Ignoring qgroup relation key 25959
> Ignoring qgroup relation key 25960
> Ignoring qgroup relation key 25961
> Ignoring qgroup relation key 25962
> Ignoring qgroup relation key 25963
> Ignoring qgroup relation key 25964
> Ignoring qgroup relation key 25965
> Ignoring qgroup relation key 25966
> Ignoring qgroup relation key 25966
> Ignoring qgroup relation key 25968
> Ignoring qgroup relation key 25970
> Ignoring qgroup relation key 25971
> Ignoring qgroup relation key 25972
> Ignoring qgroup relation key 25975
> Ignoring qgroup relation key 25976
> Ignoring qgroup relation key 25976
> Ignoring qgroup relation key 25976
> Ignoring qgroup relation key 567172078071971871
> Ignoring qgroup relation key 567172078071971872
> Ignoring qgroup relation key 567172078071971882
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971885
> Ignoring qgroup relation key 567172078071971886
> Ignoring qgroup relation key 567172078071971886
> Ignoring qgroup relation key 567172078071971886
> Ignoring qgroup relation key 567172078071971886
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Ignoring qgroup relation key 567172078071971892
> Qgroup is already inconsistent before checking
> Counts for qgroup id: 3102 are different
> our:            referenced 174829252608 referenced compressed 174829252608
> disk:           referenced 174829252608 referenced compressed 174829252608
> our:            exclusive 2899968 exclusive compressed 2899968
> disk:           exclusive 2916352 exclusive compressed 2916352
> diff:           exclusive -16384 exclusive compressed -16384
> Counts for qgroup id: 25977 are different
> our:            referenced 47249391616 referenced compressed 47249391616
> disk:           referenced 47249391616 referenced compressed 47249391616
> our:            exclusive 90222592 exclusive compressed 90222592
> disk:           exclusive 90238976 exclusive compressed 90238976
> diff:           exclusive -16384 exclusive compressed -16384
> Counts for qgroup id: 25978 are different
> our:            referenced 174829252608 referenced compressed 174829252608
> disk:           referenced 174829252608 referenced compressed 174829252608
> our:            exclusive 1064960 exclusive compressed 1064960
> disk:           exclusive 1081344 exclusive compressed 1081344
> diff:           exclusive -16384 exclusive compressed -16384
> Counts for qgroup id: 26162 are different
> our:            referenced 65940500480 referenced compressed 65940500480
> disk:           referenced 65866997760 referenced compressed 65866997760
> diff:           referenced 73502720 referenced compressed 73502720
> our:            exclusive 3991326720 exclusive compressed 3991326720
> disk:           exclusive 3960582144 exclusive compressed 3960582144
> diff:           exclusive 30744576 exclusive compressed 30744576
> found 8423479726080 bytes used err is 1
> total csum bytes: 8206766844
> total tree bytes: 17669144576
> total fs tree bytes: 7271251968
> total extent tree bytes: 683851776
> total csum bytes: 8206766844
> total tree bytes: 17669144576
> total fs tree bytes: 7271251968
> total extent tree bytes: 683851776
> btree space waste bytes: 2859469730
> file data blocks allocated: 16171232772096
> referenced 13512171663360
>
> What does that tell us?
>

Qu might have an idea; I don't. Looks like quotas are enabled, as well
as compression, but I have no idea if they're related to the problem.
I'd expect Qu would still rather have had a btrfs-image from before
init-csum-tree and zero-log, but I'll leave that up to him to answer.
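
For anyone who wants the qgroup mismatches from the quoted output in a
more compact form, here is a rough sketch that tabulates "our" minus
"disk" per counter. It assumes the exact line layout shown in this
message, which may differ in other btrfs-progs versions, and the
function name is hypothetical. It says nothing about the cause.

```python
import re

# Matches e.g. "our:   exclusive 2899968 exclusive compressed 2899968".
COUNT_LINE = re.compile(
    r"^(our|disk):\s+(referenced|exclusive) (\d+) \2 compressed (\d+)$"
)


def qgroup_diffs(lines):
    """Map (qgroup_id, kind) to our - disk for each mismatching counter."""
    counts = {}
    qgroup_id = None
    for raw in lines:
        # Drop mail quoting ("> ") and surrounding whitespace.
        line = raw.lstrip("> ").strip()
        if line.startswith("Counts for qgroup id:"):
            qgroup_id = int(line.split(":")[1].split()[0])
            continue
        match = COUNT_LINE.match(line)
        if match and qgroup_id is not None:
            side, kind, value, _compressed = match.groups()
            counts.setdefault((qgroup_id, kind), {})[side] = int(value)
    return {
        key: sides["our"] - sides["disk"]
        for key, sides in counts.items()
        if "our" in sides and "disk" in sides and sides["our"] != sides["disk"]
    }
```

Fed the quoted output, qgroups 3102, 25977 and 25978 each come out with
an exclusive difference of -16384 bytes, matching the "diff:" lines.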


-- 
Chris Murphy

