linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
@ 2019-06-24 21:19 Chris Murphy
  2019-06-24 22:07 ` Chris Murphy
  2019-06-24 23:48 ` Qu Wenruo
  0 siblings, 2 replies; 14+ messages in thread
From: Chris Murphy @ 2019-06-24 21:19 UTC (permalink / raw)
  To: Btrfs BTRFS

Short version:
sysroot is Btrfs. I made a rw snapshot of an ro snapshot in order to
do a rollback, and then rebooted. The umount was clean, but during
startup I noticed many services had failed, but couldn't tell why and
I had no shell or remote service to get into the system. I don't know
if a rw mount happened, journald is set to volatile journal so there's
no log of this failed boot. I waited about 5 minutes in this state
then forced a poweroff, rebooted from USB stick and btrfs check finds
errors and collect  information. There are many btrfs check errors in
lowmem mode. It does mount ro without complaint using kernel 5.0.7.

This system has used Btrfs for years with zero corruptions until now.
This system was clean installed with Fedora Rawhide ~ 2 weeks ago,
with a newly created file system.

Hardware:
Apple Macbook Pro (2011)
Samsung SSD 840 EVO 250GB, Firmware Version: EXT0DB6Q

Detail version:
only btrfs-progs 5.1 used
mix of kernel 5.2rc2, rc3, and rc5, at the time of the snapshot and
reboot with scary failures, it was using rc5+, specifically git
bed3c0d84e7e

mkfs.btrfs -d single -m single -n 32K /dev/sda6
mount options: noatime,ssd,space_cache=v2,compress=zstd

btrfs check (normal)
https://drive.google.com/open?id=1ZkIPq01lz1BxjOjxA22SUh0kSQaEeRex
btrfs check lowmem
https://drive.google.com/open?id=11INxc1OQbdpmqrVRlk0mNuLwmgSnnB1K
btrfs super
https://drive.google.com/open?id=13cNGSvjPTiYgG0smEFWecP3fk9P2Tn5G
btrfs debug tree, 97MB zstd compressed
https://drive.google.com/open?id=1TaxKikn9gKDjnkQpOrqWzdGYw17maCpt
btrfs image, 31MB
https://drive.google.com/open?id=1is_prjUhu8IBjq9oadugOi_YBYbGYGRA


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-24 21:19 5.2rc5 corruption, many "is compressed but inode flag doesn't allow" Chris Murphy
@ 2019-06-24 22:07 ` Chris Murphy
  2019-06-24 22:46   ` Chris Murphy
  2019-06-24 23:48 ` Qu Wenruo
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2019-06-24 22:07 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Btrfs BTRFS

This is with kernel 5.0.7, still ro mounted with blockdev --setro as well

[root@localhost-live ~]# btrfs scrub start -BrR /mnt
scrub done for 72df6d5b-26d1-47ff-a9ab-33f6a0b2c4cf
    scrub started at Mon Jun 24 17:59:57 2019 and finished after 00:00:15
    data_extents_scrubbed: 231002
    tree_extents_scrubbed: 3657
    data_bytes_scrubbed: 7518072832
    tree_bytes_scrubbed: 239665152
    read_errors: 0
    csum_errors: 0
    verify_errors: 0
    no_csum: 25134
    csum_discards: 0
    super_errors: 0
    malloc_errors: 0
    uncorrectable_errors: 0
    unverified_errors: 0
    corrected_errors: 0
    last_physical: 93437558784
[root@localhost-live ~]#

I'm not sure what's no_csum on this file system, are those in bytes?
The only file I can think of that's nocow is a qcow2 I made with
nocow=true (chattr +C) option.
-rw-r--r--. 1 root root 194K Jun 16 20:48 untitled.qcow2

At the same time as this scrub, the kernel complained:

[ 5255.219838] BTRFS info (device sda6): using free space tree
[ 5255.219903] BTRFS info (device sda6): has skinny extents
[ 5255.236065] BTRFS info (device sda6): enabling ssd optimizations
[ 5396.839752] BTRFS warning (device sda6): Skipping commit of aborted
transaction.
[ 5396.839853] ------------[ cut here ]------------
[ 5396.839910] BTRFS: Transaction aborted (error -28)
[ 5396.840081] WARNING: CPU: 0 PID: 4583 at
fs/btrfs/transaction.c:1831 cleanup_transaction+0x5d/0xa0 [btrfs]
[ 5396.840190] Modules linked in: devlink rfcomm fuse
nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter
ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ip6table_nat
nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat
nf_nat_ipv4 nf_nat iptable_mangle iptable_raw iptable_security
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink
ebtable_filter ebtables ip6table_filter ip6_tables bnep rpcrdma
ib_isert iscsi_target_mod ib_iser ib_srpt target_core_mod ib_srp
scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad iw_cxgb4
rdma_cm iw_cm ib_cm iw_cxgb3 ib_core snd_hda_codec_cirrus
snd_hda_codec_generic ledtrig_audio b43 cordic uvcvideo mac80211
videobuf2_vmalloc i915 btusb videobuf2_memops videobuf2_v4l2 btrtl
videobuf2_common btbcm btintel intel_rapl videodev bluetooth
snd_hda_codec_hdmi kvmgt cfg80211 x86_pkg_temp_thermal snd_hda_intel
intel_powerclamp media vfio_mdev snd_hda_codec mdev coretemp
vfio_iommu_type1 kvm_intel snd_hda_core ecdh_generic
[ 5396.840256]  ssb vfio bcm5974 snd_hwdep kvm rfkill snd_seq joydev
snd_seq_device ttm snd_pcm i2c_algo_bit irqbypass drm_kms_helper
intel_cstate snd_timer drm bcma snd intel_uncore iTCO_wdt mei_me
acpi_als iTCO_vendor_support soundcore mei intel_rapl_perf kfifo_buf
applesmc apple_gmux lpc_ich industrialio i2c_i801 input_polldev
pcc_cpufreq sbs sbshc apple_bl nls_utf8 isofs squashfs dm_multipath
btrfs xor zstd_compress raid6_pq crct10dif_pclmul 8021q libcrc32c
crc32_pclmul garp crc32c_intel zstd_decompress sdhci_pci mrp stp llc
cqhci sdhci ghash_clmulni_intel firewire_ohci tg3 mmc_core thunderbolt
firewire_core crc_itu_t video uas usb_storage hid_apple hid_appleir
sunrpc be2iscsi bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi
libcxgb qla4xxx iscsi_boot_sysfs iscsi_tcp libiscsi_tcp libiscsi
scsi_transport_iscsi loop
[ 5396.841981] CPU: 0 PID: 4583 Comm: btrfs-transacti Not tainted
5.0.7-300.fc30.x86_64 #1
[ 5396.842069] Hardware name: Apple Inc.
MacBookPro8,2/Mac-94245A3940C91C80, BIOS
MBP81.88Z.0050.B00.1804101331 04/10/18
[ 5396.842241] RIP: 0010:cleanup_transaction+0x5d/0xa0 [btrfs]
[ 5396.842308] Code: 83 f8 01 77 63 f0 49 0f ba ad 38 23 00 00 02 72
1b 41 83 fc fb 0f 84 97 8d 08 00 44 89 e6 48 c7 c7 58 72 5c c0 e8 d2
37 bb f7 <0f> 0b 44 89 e1 ba 27 07 00 00 4d 8d 66 28 48 89 ef 48 c7 c6
90 ae
[ 5396.842507] RSP: 0018:ffffb81a81c8fde8 EFLAGS: 00010282
[ 5396.842569] RAX: 0000000000000000 RBX: ffff8d2be70ecc20 RCX: ffff8d2c678168c0
[ 5396.842649] RDX: 0000000000000007 RSI: 0000000000000006 RDI: ffff8d2c678168c0
[ 5396.842729] RBP: ffff8d2be70ecbc8 R08: 0000000000000001 R09: 00000000000004a6
[ 5396.842812] R10: 000000000001a804 R11: 0000000000000007 R12: 00000000ffffffe4
[ 5396.842891] R13: ffff8d2b97488000 R14: ffff8d2c56db7200 R15: ffff8d2be70ecb00
[ 5396.842972] FS:  0000000000000000(0000) GS:ffff8d2c67800000(0000)
knlGS:0000000000000000
[ 5396.848211] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5396.853447] CR2: 00007fea0109510c CR3: 00000001bb20e001 CR4: 00000000000606f0
[ 5396.858717] Call Trace:
[ 5396.864015]  btrfs_commit_transaction+0x83f/0x930 [btrfs]
[ 5396.869353]  ? start_transaction+0x9b/0x470 [btrfs]
[ 5396.874657]  ? __next_timer_interrupt+0x70/0xd0
[ 5396.879998]  transaction_kthread+0x144/0x180 [btrfs]
[ 5396.885310]  kthread+0xfb/0x130
[ 5396.890599]  ? btrfs_cleanup_transaction+0x540/0x540 [btrfs]
[ 5396.895841]  ? kthread_park+0x90/0x90
[ 5396.901059]  ret_from_fork+0x35/0x40
[ 5396.906268] ---[ end trace c42b154d80664636 ]---
[ 5396.911560] BTRFS warning (device sda6): cleanup_transaction:1831:
Aborting unused transaction(No space left).
[ 5396.913679] BTRFS info (device sda6): delayed_refs has NO entry
[root@localhost-live ~]#


I don't know. There's no log tree and no transactions have happened,
so I don't know what transaction it wants to clean up.


Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-24 22:07 ` Chris Murphy
@ 2019-06-24 22:46   ` Chris Murphy
  2019-06-24 23:29     ` Holger Hoffstätte
  0 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2019-06-24 22:46 UTC (permalink / raw)
  To: Btrfs BTRFS

Same call trace on ro mount and ro scrub with 5.2.0-rc2+, but also an
additional call trace related to zstd. As this is a zstd compressed
file system, it might be related.

[  366.319583] ================================
[  366.325036] WARNING: inconsistent lock state
[  366.330615] 5.2.0-0.rc2.git1.2.fc31.x86_64 #1 Tainted: G        W
[  366.336202] --------------------------------
[  366.341788] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[  366.347423] swapper/4/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[  366.353042] 000000006070e818 (&(&wsm.lock)->rlock){+.?.}, at:
zstd_reclaim_timer_fn+0x26/0x170 [btrfs]

full dmesg for that is  here:
https://drive.google.com/open?id=1GJlSExS2xQcplVRyg-hjI47KWR7Vs7iS

I'm going to try to boot off the damaged volume, and use
systemd.debug-shell=1 to see if I can gain access to the system at the
crash point. I'm gonna guess it's this same message, failure to clean
up a transaction due to some prior corruption, and then probably goes
read only.


Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-24 22:46   ` Chris Murphy
@ 2019-06-24 23:29     ` Holger Hoffstätte
  2019-06-25  0:53       ` Chris Murphy
  0 siblings, 1 reply; 14+ messages in thread
From: Holger Hoffstätte @ 2019-06-24 23:29 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS

On 6/25/19 12:46 AM, Chris Murphy wrote:
> Same call trace on ro mount and ro scrub with 5.2.0-rc2+, but also an
> additional call trace related to zstd. As this is a zstd compressed
> file system, it might be related.
> 
> [  366.319583] ================================
> [  366.325036] WARNING: inconsistent lock state
> [  366.330615] 5.2.0-0.rc2.git1.2.fc31.x86_64 #1 Tainted: G        W
> [  366.336202] --------------------------------
> [  366.341788] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> [  366.347423] swapper/4/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
> [  366.353042] 000000006070e818 (&(&wsm.lock)->rlock){+.?.}, at:
> zstd_reclaim_timer_fn+0x26/0x170 [btrfs]

Seeing that this is apparently *rc2* (which is insane to ship) the explanation
for the above is that 5.2-rc2 was tagged on 2019-05-26, but the patch for the
above problem [1] was committed *after* that, on 2019-05-28.

That may or may not explain the reason for the original error and its
corruption, but whatever you're doing, stop running that kernel..

-h

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fee13fe96529523a709d1fff487f14a5e0d56d34

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-24 21:19 5.2rc5 corruption, many "is compressed but inode flag doesn't allow" Chris Murphy
  2019-06-24 22:07 ` Chris Murphy
@ 2019-06-24 23:48 ` Qu Wenruo
  2019-06-25  1:01   ` Chris Murphy
  2019-06-25  2:49   ` Chris Murphy
  1 sibling, 2 replies; 14+ messages in thread
From: Qu Wenruo @ 2019-06-24 23:48 UTC (permalink / raw)
  To: Chris Murphy, Btrfs BTRFS


[-- Attachment #1.1: Type: text/plain, Size: 2614 bytes --]



On 2019/6/25 上午5:19, Chris Murphy wrote:
> Short version:
> sysroot is Btrfs. I made a rw snapshot of an ro snapshot in order to
> do a rollback, and then rebooted. The umount was clean, but during
> startup I noticed many services had failed, but couldn't tell why and
> I had no shell or remote service to get into the system. I don't know
> if a rw mount happened, journald is set to volatile journal so there's
> no log of this failed boot. I waited about 5 minutes in this state
> then forced a poweroff, rebooted from USB stick and btrfs check finds
> errors and collect  information. There are many btrfs check errors in
> lowmem mode. It does mount ro without complaint using kernel 5.0.7.

The errors from lowmem and original mode are matching each other.
Just lowmem mode prints every missing csum it hits while original mode
report in an inode basis.

The problem is not complex, one inode which shouldn't go through
compression (maybe nodatacow or nodatasum set) has go through compression.

This leads to missing csum while still compressed.

This could cause read time error.

The solution should be pretty easy, find inode 4665463 in root 492, and
remove it.

But before deleting it, would you please provide the filename and the
following output?

# btrfs ins dump-tree -t 492 | grep -C20 "4665463 INODE_ITEM"

Normally we shouldn't have any nodatacow/nodatasum inode goes through
compression, maybe zstd compression code doesn't follow that rule?

Thanks,
Qu
> 
> This system has used Btrfs for years with zero corruptions until now.
> This system was clean installed with Fedora Rawhide ~ 2 weeks ago,
> with a newly created file system.
> 
> Hardware:
> Apple Macbook Pro (2011)
> Samsung SSD 840 EVO 250GB, Firmware Version: EXT0DB6Q
> 
> Detail version:
> only btrfs-progs 5.1 used
> mix of kernel 5.2rc2, rc3, and rc5, at the time of the snapshot and
> reboot with scary failures, it was using rc5+, specifically git
> bed3c0d84e7e
> 
> mkfs.btrfs -d single -m single -n 32K /dev/sda6
> mount options: noatime,ssd,space_cache=v2,compress=zstd
> 
> btrfs check (normal)
> https://drive.google.com/open?id=1ZkIPq01lz1BxjOjxA22SUh0kSQaEeRex
> btrfs check lowmem
> https://drive.google.com/open?id=11INxc1OQbdpmqrVRlk0mNuLwmgSnnB1K
> btrfs super
> https://drive.google.com/open?id=13cNGSvjPTiYgG0smEFWecP3fk9P2Tn5G
> btrfs debug tree, 97MB zstd compressed
> https://drive.google.com/open?id=1TaxKikn9gKDjnkQpOrqWzdGYw17maCpt
> btrfs image, 31MB
> https://drive.google.com/open?id=1is_prjUhu8IBjq9oadugOi_YBYbGYGRA
> 
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-24 23:29     ` Holger Hoffstätte
@ 2019-06-25  0:53       ` Chris Murphy
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2019-06-25  0:53 UTC (permalink / raw)
  To: Holger Hoffstätte; +Cc: Chris Murphy, Btrfs BTRFS

On Mon, Jun 24, 2019 at 5:29 PM Holger Hoffstätte
<holger@applied-asynchrony.com> wrote:
>
> On 6/25/19 12:46 AM, Chris Murphy wrote:
> > Same call trace on ro mount and ro scrub with 5.2.0-rc2+, but also an
> > additional call trace related to zstd. As this is a zstd compressed
> > file system, it might be related.
> >
> > [  366.319583] ================================
> > [  366.325036] WARNING: inconsistent lock state
> > [  366.330615] 5.2.0-0.rc2.git1.2.fc31.x86_64 #1 Tainted: G        W
> > [  366.336202] --------------------------------
> > [  366.341788] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> > [  366.347423] swapper/4/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
> > [  366.353042] 000000006070e818 (&(&wsm.lock)->rlock){+.?.}, at:
> > zstd_reclaim_timer_fn+0x26/0x170 [btrfs]
>
> Seeing that this is apparently *rc2* (which is insane to ship) the explanation
> for the above is that 5.2-rc2 was tagged on 2019-05-26, but the patch for the
> above problem [1] was committed *after* that, on 2019-05-28.

a. Please relax. I said from the outset it's Fedora Rawhide, they
build mainline rc kernels on Monday, and each weekday there's a plus
version built from whatever's in Linus's tree prior to the next rc.
It's been this way for a very long time. No one puts Rawhide into
production including myself, it's a test machine.

b. That rc2+ kernel above was used today for read-only inspection of
the target Btrfs file system, just because it's what I already had on
hand. That I've been using the rc5+ kernel I cited in the first email
for the past three days, so I'm not sure why the problem starts
suddenly. The problem is with a stale systemd journal.



-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-24 23:48 ` Qu Wenruo
@ 2019-06-25  1:01   ` Chris Murphy
  2019-06-25  1:33     ` Chris Murphy
  2019-06-25  2:49   ` Chris Murphy
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2019-06-25  1:01 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Chris Murphy, Btrfs BTRFS

On Mon, Jun 24, 2019 at 5:48 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
> The problem is not complex, one inode which shouldn't go through
> compression (maybe nodatacow or nodatasum set) has go through compression.
>
> This leads to missing csum while still compressed.
>
> This could cause read time error.
>
> The solution should be pretty easy, find inode 4665463 in root 492, and
> remove it.

They are systemd journals. Based on the ctime/mtime they were clearly
modified today, which I don't understand because I've had volatile
journal set for a few days now. But you can see they are set to nocow,
and yet there are zstd compressed extents. We've discussed the case
before, where defragmenting a nocow file while the file system is
mounted with compress option, permits compression. I think that's
what's going on here. I've seen it with zlib and lzo too, but I
haven't used them in years so I don't know if that's still the case. I
know systemd-journald does by default set /var/log/journal with chattr
+C and also defragments journals upon rotation.


    item 106 key (4665463 INODE_ITEM 0) itemoff 49169 itemsize 160
        generation 57085 transid 59069 size 33554432 nbytes 33554432
        block group 0 mode 100640 links 1 uid 0 gid 190 rdev 0
        sequence 6143 flags 0x13(NODATASUM|NODATACOW|PREALLOC)
        atime 1561330326.52881457 (2019-06-23 22:52:06)
        ctime 1561390782.569639513 (2019-06-24 15:39:42)
        mtime 1561390782.443639516 (2019-06-24 15:39:42)
        otime 1561330326.52881457 (2019-06-23 22:52:06)
    item 107 key (4665463 INODE_REF 169326) itemoff 49110 itemsize 59
        index 52 namelen 49 name:
system@00058c139ef35bc6-a1c70e9eca5616e6.journal~
    item 108 key (4665463 XATTR_ITEM 843765919) itemoff 49056 itemsize 54
        location key (0 UNKNOWN.0 0) type XATTR
        transid 57085 data_len 8 name_len 16
        name: user.crtime_usec
    item 109 key (4665463 XATTR_ITEM 2038346239) itemoff 48951 itemsize 105
        location key (0 UNKNOWN.0 0) type XATTR
        transid 57085 data_len 52 name_len 23
        name: system.posix_acl_access
        data
    item 110 key (4665463 XATTR_ITEM 3817753667) itemoff 48874 itemsize 77
        location key (0 UNKNOWN.0 0) type XATTR
        transid 57085 data_len 31 name_len 16
        name: security.selinux
        data system_u:object_r:var_log_t:s0
    item 111 key (4665463 EXTENT_DATA 0) itemoff 48821 itemsize 53
        generation 59069 type 1 (regular)
        extent data disk byte 1115086848 nr 16384
        extent data offset 0 nr 131072 ram 131072
        extent compression 3 (zstd)


Anyway the problem looks minor, the file system itself is fine. But it
does stop the startup process.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-25  1:01   ` Chris Murphy
@ 2019-06-25  1:33     ` Chris Murphy
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2019-06-25  1:33 UTC (permalink / raw)
  Cc: Qu Wenruo, Btrfs BTRFS

On Mon, Jun 24, 2019 at 7:01 PM Chris Murphy <lists@colorremedies.com> wrote:
>
> Anyway the problem looks minor, the file system itself is fine. But it
> does stop the startup process.

Ahhhha! I was on "autopilot" and made the root read-only. And that's
why the startup fails. It has nothing to do with the compressed nocow
journal files. Hilarious.


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-24 23:48 ` Qu Wenruo
  2019-06-25  1:01   ` Chris Murphy
@ 2019-06-25  2:49   ` Chris Murphy
  2019-06-25  6:58     ` Nikolay Borisov
  1 sibling, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2019-06-25  2:49 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS

False alarm, not a new issue at all!

I have a different system on kernel 5.1.11 using Btrfs as root with
persistent systemd-journald storage, and compress=zstd. And I never
have problems with it, so I never run btrfs check on it, until now.
And yep, same problem. All the journals that have been rotated, are
zstd compressed, are nocow, and btrfs check complains about them the
same way.

root 257 inode 62526 errors 1040, bad file extent, some csum missing
root 257 inode 62734 errors 1040, bad file extent, some csum missing

Turns out this is just btrfs check noise. There's definitely no
corruption. These files still pass journalctl --verify which is
checking its own internal checksumming in the journal file.

I don't know what's best practice. I think on any kind of flash media,
I'd rather not have +C by default, so that the logs compress on the
fly, and also rather not have defragment on rotate because that also
just increases wear by rewriting everything. Yes the journals are
heavily fragmented if they are allowed to COW, but I don't think I
really care about legacy files being a little slower on flash. *shrug*


Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-25  2:49   ` Chris Murphy
@ 2019-06-25  6:58     ` Nikolay Borisov
  2019-06-25 15:31       ` Chris Murphy
                         ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: Nikolay Borisov @ 2019-06-25  6:58 UTC (permalink / raw)
  To: Chris Murphy, Qu Wenruo; +Cc: Btrfs BTRFS



On 25.06.19 г. 5:49 ч., Chris Murphy wrote:
> False alarm, not a new issue at all!
> 
> I have a different system on kernel 5.1.11 using Btrfs as root with
> persistent systemd-journald storage, and compress=zstd. And I never
> have problems with it, so I never run btrfs check on it, until now.
> And yep, same problem. All the journals that have been rotated, are
> zstd compressed, are nocow, and btrfs check complains about them the
> same way.
> 
> root 257 inode 62526 errors 1040, bad file extent, some csum missing
> root 257 inode 62734 errors 1040, bad file extent, some csum missing
> 
> Turns out this is just btrfs check noise. There's definitely no
> corruption. These files still pass journalctl --verify which is
> checking its own internal checksumming in the journal file.
> 
> I don't know what's best practice. I think on any kind of flash media,
> I'd rather not have +C by default, so that the logs compress on the
> fly, and also rather not have defragment on rotate because that also
> just increases wear by rewriting everything. Yes the journals are
> heavily fragmented if they are allowed to COW, but I don't think I
> really care about legacy files being a little slower on flash. *shrug*

But why are your nocompress files being compressed? I just tested latest
misc-next branch and a mounted fs with -ocomopress=zstd correctly skips
compression on a file where chattr +C has been set?

> 
> 
> Chris Murphy
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-25  6:58     ` Nikolay Borisov
@ 2019-06-25 15:31       ` Chris Murphy
  2019-06-25 16:19       ` Chris Murphy
  2019-06-26 18:47       ` Chris Murphy
  2 siblings, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2019-06-25 15:31 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Chris Murphy, Qu Wenruo, Btrfs BTRFS

On Tue, Jun 25, 2019 at 12:58 AM Nikolay Borisov <nborisov@suse.com> wrote:
>
>
>
> On 25.06.19 г. 5:49 ч., Chris Murphy wrote:
> > False alarm, not a new issue at all!
> >
> > I have a different system on kernel 5.1.11 using Btrfs as root with
> > persistent systemd-journald storage, and compress=zstd. And I never
> > have problems with it, so I never run btrfs check on it, until now.
> > And yep, same problem. All the journals that have been rotated, are
> > zstd compressed, are nocow, and btrfs check complains about them the
> > same way.
> >
> > root 257 inode 62526 errors 1040, bad file extent, some csum missing
> > root 257 inode 62734 errors 1040, bad file extent, some csum missing
> >
> > Turns out this is just btrfs check noise. There's definitely no
> > corruption. These files still pass journalctl --verify which is
> > checking its own internal checksumming in the journal file.
> >
> > I don't know what's best practice. I think on any kind of flash media,
> > I'd rather not have +C by default, so that the logs compress on the
> > fly, and also rather not have defragment on rotate because that also
> > just increases wear by rewriting everything. Yes the journals are
> > heavily fragmented if they are allowed to COW, but I don't think I
> > really care about legacy files being a little slower on flash. *shrug*
>
> But why are your nocompress files being compressed? I just tested latest
> misc-next branch and a mounted fs with -ocomopress=zstd correctly skips
> compression on a file where chattr +C has been set?

test.journal has +C set and filefrag shows it's not encoded, single extent.

$ sudo btrfs fi def -vf test.journal

Result? No compression still.

$ sudo btrfs fi def -vf -czstd test.journal

Result? It's compressed.

What is systemd doing? I can't really read the code, but in the
comments they say they are unsetting +C and then defragmenting?

https://github.com/systemd/systemd/blob/master/src/journal/journal-file.c

line 371:
/* Be friendly to btrfs: turn COW back on again now,
* and defragment the file. We won't write to the file
* ever again, hence remove all fragmentation, and
* reenable all the good bits COW usually provides
* (such as data checksumming). */

Except all of the logs in /var/log/journal/<machineid>/ have +C set.
So I don't understand that comment at all. They might think they're
unsetting it, but it has no effect because the new file inherits the
+C set on the enclosing directory the journal files are in:
/var/log/journal/<machineid>

There's more btrfs defragment stuff later in this same journal-file.c
code but I don't understand how it relates to "turn COW back on
again", just that they are definitely submitting files for
defragmentation at log rotate time. But possibly they way they submit
it is different than 'btrfs fi defrag' default, and somehow it gets
compressed.

Anyway this isn't new, all three of my cleanly installed systems on
Btrfs behave this way (all have compress=zstd set). Years ago I
mentioned this same behavior on this list with respect to zlib. The
difference back then I think is that btrfs check never complained.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-25  6:58     ` Nikolay Borisov
  2019-06-25 15:31       ` Chris Murphy
@ 2019-06-25 16:19       ` Chris Murphy
  2019-06-25 16:58         ` Chris Murphy
  2019-06-26 18:47       ` Chris Murphy
  2 siblings, 1 reply; 14+ messages in thread
From: Chris Murphy @ 2019-06-25 16:19 UTC (permalink / raw)
  To: Nikolay Borisov; +Cc: Chris Murphy, Qu Wenruo, Btrfs BTRFS

On Tue, Jun 25, 2019 at 12:58 AM Nikolay Borisov <nborisov@suse.com> wrote:
>
> But why are your nocompress files being compressed?

$ sudo strace -p 624 -k -t -o sysdjourndstrace.txt
https://drive.google.com/open?id=1IspAjQ6b9dVizjqrX4E6ZErzUKH0CQBl

Maybe there's a better way to see what's going on. I do see chattr but
I don't know the value or the file. I wonder if the original file has
+C unset, then it's defragmented which somehow permits btrfs to
compress even though unsetting +C on a non-zero length file shouldn't
succeed, but then because it's written in a directory with +C the new
defragmented file inherits +C and gets no csums even though it's
compressed?

I'm throwing spaghetti at a wall though. (I know that's not a good way
to test spaghetti, so the analogy works perfectly.)


-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-25 16:19       ` Chris Murphy
@ 2019-06-25 16:58         ` Chris Murphy
  0 siblings, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2019-06-25 16:58 UTC (permalink / raw)
  To: Chris Murphy; +Cc: Nikolay Borisov, Qu Wenruo, Btrfs BTRFS

The order I see is:


rename() currentsyslog archivesyslog
openat() currentsyslog
fsync() currentsyslog
fsync() dir of currentsyslog
fallocate() currentsyslog

repeats for the user log

Where I get lost is at

10:04:18 ioctl(103, FS_IOC_SETFLAGS, 0x7ffc6429954c) = 0
 > /usr/lib64/libc-2.29.9000.so(ioctl+0xb) [0xf736b]
 > /usr/lib/systemd/libsystemd-shared-242.so(chattr_fd+0xe0) [0x11da20]

I don't know if that's doing 'chattr -C' on the renamed, now archived,
log. But right after that is

10:04:18 ioctl(103, BTRFS_IOC_DEFRAG)   = 0
 > /usr/lib64/libc-2.29.9000.so(ioctl+0xb) [0xf736b]
 > /usr/lib/systemd/libsystemd-shared-242.so(btrfs_defrag_fd+0x23) [0x130803]

So even if -C can't really succeed on a non-empty file, maybe it
somehow allows defrag to compress on a filesystem that's mounted with
compress option?

Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: 5.2rc5 corruption, many "is compressed but inode flag doesn't allow"
  2019-06-25  6:58     ` Nikolay Borisov
  2019-06-25 15:31       ` Chris Murphy
  2019-06-25 16:19       ` Chris Murphy
@ 2019-06-26 18:47       ` Chris Murphy
  2 siblings, 0 replies; 14+ messages in thread
From: Chris Murphy @ 2019-06-26 18:47 UTC (permalink / raw)
  To: Btrfs BTRFS; +Cc: Qu Wenruo, Nikolay Borisov

On Tue, Jun 25, 2019 at 12:58 AM Nikolay Borisov <nborisov@suse.com> wrote:
>
> But why are your nocompress files being compressed? I just tested latest
> misc-next branch and a mounted fs with -ocomopress=zstd correctly skips
> compression on a file where chattr +C has been set?
>

This is reproducible with zlib and lzo. Upon rotate, the nocow journal
is compressed. And 'btrfs check' complains about them too with "bad
file extent, some csum missing'

compress=zlib set in fstab
journalctl --rotate

    item 98 key (4731175 INODE_ITEM 0) itemoff 54181 itemsize 160
        generation 61345 transid 62912 size 33554432 nbytes 33554432
        block group 0 mode 100640 links 1 uid 0 gid 190 rdev 0
        sequence 12617 flags 0x1b(NODATASUM|NODATACOW|NOCOMPRESS|PREALLOC)
        atime 1561478658.193673946 (2019-06-25 10:04:18)
        ctime 1561574037.650029396 (2019-06-26 12:33:57)
        mtime 1561574028.883189998 (2019-06-26 12:33:48)
        otime 1561478658.193673946 (2019-06-25 10:04:18)
    item 99 key (4731175 INODE_REF 169326) itemoff 54090 itemsize 91
        index 62 namelen 81 name:
system@bf2624eef8134e64a38931ef85b2311e-0000000000012978-00058c2814bec367.journal
    item 100 key (4731175 XATTR_ITEM 843765919) itemoff 54036 itemsize 54
        location key (0 UNKNOWN.0 0) type XATTR
        transid 61345 data_len 8 name_len 16
        name: user.crtime_usec
    item 101 key (4731175 XATTR_ITEM 2038346239) itemoff 53931 itemsize 105
        location key (0 UNKNOWN.0 0) type XATTR
        transid 61345 data_len 52 name_len 23
        name: system.posix_acl_access
        data
    item 102 key (4731175 XATTR_ITEM 3817753667) itemoff 53854 itemsize 77
        location key (0 UNKNOWN.0 0) type XATTR
        transid 61345 data_len 31 name_len 16
        name: security.selinux
        data system_u:object_r:var_log_t:s0
    item 103 key (4731175 EXTENT_DATA 0) itemoff 53801 itemsize 53
        generation 62912 type 1 (regular)
        extent data disk byte 1125711872 nr 24576
        extent data offset 0 nr 131072 ram 131072
        extent compression 1 (zlib)
    item 104 key (4731175 EXTENT_DATA 131072) itemoff 53748 itemsize 53
        generation 62912 type 1 (regular)
        extent data disk byte 1125736448 nr 24576
        extent data offset 0 nr 131072 ram 131072
        extent compression 1 (zlib)


reboot with compress=lzo in fstab
journalctl --rotate

    item 31 key (4736226 INODE_ITEM 0) itemoff 62096 itemsize 160
        generation 62931 transid 62959 size 16777216 nbytes 16777216
        block group 0 mode 100640 links 1 uid 0 gid 190 rdev 0
        sequence 47 flags 0x13(NODATASUM|NODATACOW|PREALLOC)
        atime 1561574634.315474762 (2019-06-26 12:43:54)
        ctime 1561574674.904503161 (2019-06-26 12:44:34)
        mtime 1561574674.902503198 (2019-06-26 12:44:34)
        otime 1561574634.315474762 (2019-06-26 12:43:54)
    item 32 key (4736226 INODE_REF 169326) itemoff 62005 itemsize 91
        index 4 namelen 81 name:
system@316504f5e5cb4e6eab2ad1d018bc4d2c-0000000000000001-00058c3e6d0ff5a3.journal
    item 33 key (4736226 XATTR_ITEM 843765919) itemoff 61951 itemsize 54
        location key (0 UNKNOWN.0 0) type XATTR
        transid 62931 data_len 8 name_len 16
        name: user.crtime_usec
    item 34 key (4736226 XATTR_ITEM 2038346239) itemoff 61846 itemsize 105
        location key (0 UNKNOWN.0 0) type XATTR
        transid 62931 data_len 52 name_len 23
        name: system.posix_acl_access
        data
    item 35 key (4736226 XATTR_ITEM 3817753667) itemoff 61769 itemsize 77
        location key (0 UNKNOWN.0 0) type XATTR
        transid 62931 data_len 31 name_len 16
        name: security.selinux
        data system_u:object_r:var_log_t:s0
    item 36 key (4736226 EXTENT_DATA 0) itemoff 61716 itemsize 53
        generation 62959 type 1 (regular)
        extent data disk byte 1112309760 nr 12288
        extent data offset 0 nr 131072 ram 131072
        extent compression 2 (lzo)
    item 37 key (4736226 EXTENT_DATA 131072) itemoff 61663 itemsize 53
        generation 62959 type 1 (regular)
        extent data disk byte 1114071040 nr 8192
        extent data offset 0 nr 131072 ram 131072
        extent compression 2 (lzo)


Super easy to reproduce.

-- 
Chris Murphy

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2019-06-26 18:48 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-24 21:19 5.2rc5 corruption, many "is compressed but inode flag doesn't allow" Chris Murphy
2019-06-24 22:07 ` Chris Murphy
2019-06-24 22:46   ` Chris Murphy
2019-06-24 23:29     ` Holger Hoffstätte
2019-06-25  0:53       ` Chris Murphy
2019-06-24 23:48 ` Qu Wenruo
2019-06-25  1:01   ` Chris Murphy
2019-06-25  1:33     ` Chris Murphy
2019-06-25  2:49   ` Chris Murphy
2019-06-25  6:58     ` Nikolay Borisov
2019-06-25 15:31       ` Chris Murphy
2019-06-25 16:19       ` Chris Murphy
2019-06-25 16:58         ` Chris Murphy
2019-06-26 18:47       ` Chris Murphy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).