* fatal database corruption with btrfs "out of space" with ~50 GB left
@ 2018-02-14 14:19 Tomasz Chmielewski
  2018-02-15  1:25 ` Duncan
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Tomasz Chmielewski @ 2018-02-14 14:19 UTC (permalink / raw)
  To: Btrfs BTRFS

Just FYI, how dangerous running btrfs can be - we had a fatal, 
unrecoverable MySQL corruption when btrfs decided to do one of these "I 
have ~50 GB left, so let's do out of space (and corrupt some files at 
the same time, ha ha!)".

Running btrfs RAID-1 with kernel 4.14.



Tomasz Chmielewski
https://lxadm.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-14 14:19 fatal database corruption with btrfs "out of space" with ~50 GB left Tomasz Chmielewski
@ 2018-02-15  1:25 ` Duncan
  2018-02-15  1:47 ` Qu Wenruo
  2018-02-19  4:29 ` Anand Jain
  2 siblings, 0 replies; 12+ messages in thread
From: Duncan @ 2018-02-15  1:25 UTC (permalink / raw)
  To: linux-btrfs

Tomasz Chmielewski posted on Wed, 14 Feb 2018 23:19:20 +0900 as excerpted:

> Just FYI, how dangerous running btrfs can be - we had a fatal,
> unrecoverable MySQL corruption when btrfs decided to do one of these "I
> have ~50 GB left, so let's do out of space (and corrupt some files at
> the same time, ha ha!)".

Ouch!

> Running btrfs RAID-1 with kernel 4.14.

Kernel 4.14... quite current... good.  But 4.14.0 first release, 4.14.x 
current stable, or somewhere (where?) in between?

And please post the output of btrfs fi usage for that filesystem.  
Without that (or fi sh and fi df, the pre-usage method of getting nearly 
the same info), it's hard to say where or what the problem was.

Meanwhile, FWIW there was a recent metadata over-reserve bug that should 
be fixed in 4.15 and the latest 4.14 stable, but IDR whether it affected 
4.14.0 original or only the 4.13 series and early 4.14-rcs and was fixed 
by 4.14.0.  The bug seemed to trigger most frequently when doing balances 
or other major writes to the filesystem, on middle to large sized 
filesystems.  (My own btrfs filesystems, all under a quarter TB each, 
didn't appear to be affected.)

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-14 14:19 fatal database corruption with btrfs "out of space" with ~50 GB left Tomasz Chmielewski
  2018-02-15  1:25 ` Duncan
@ 2018-02-15  1:47 ` Qu Wenruo
  2018-02-15  4:19   ` Tomasz Chmielewski
  2018-02-19  4:29 ` Anand Jain
  2 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2018-02-15  1:47 UTC (permalink / raw)
  To: Tomasz Chmielewski, Btrfs BTRFS





On 2018-02-14 22:19, Tomasz Chmielewski wrote:
> Just FYI, how dangerous running btrfs can be - we had a fatal,
> unrecoverable MySQL corruption when btrfs decided to do one of these "I
> have ~50 GB left, so let's do out of space (and corrupt some files at
> the same time, ha ha!)".

I've recently been looking into unexpected corruption problems in btrfs.

Would you please provide some extra info about how the corruption happened?

1) Was there a power reset?
   Btrfs should be bulletproof against power loss, but in practice it
   isn't, so I'm here to gather clues.

2) Are the MySQL files set with nodatacow?
   If so, data corruption is more or less expected, but it should be
   handled by MySQL's checkpointing.

3) Is the filesystem metadata corrupted? (i.e. does btrfs check report
   errors?)
   If so, that should be the problem I'm looking into.

4) What is the metadata/data ratio?
   "btrfs fi usage" gives a quite good picture of it.
   "btrfs fi df" also helps.
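For (2), the NOCOW status can be checked from the file attributes - a quick sketch, assuming the MySQL datadir lives at /var/lib/mysql (the path is an assumption; adjust it):

```shell
# On btrfs, nodatacow files and directories show the 'C' flag in lsattr.
# Note: the flag only affects files created after it was set on the
# directory (or files on a filesystem mounted with -o nodatacow).
lsattr -d /var/lib/mysql
lsattr /var/lib/mysql 2>/dev/null | head
```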

Thanks,
Qu

> 
> Running btrfs RAID-1 with kernel 4.14.
> 
> 
> 
> Tomasz Chmielewski
> https://lxadm.com




* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-15  1:47 ` Qu Wenruo
@ 2018-02-15  4:19   ` Tomasz Chmielewski
  2018-02-15  4:32     ` Qu Wenruo
  0 siblings, 1 reply; 12+ messages in thread
From: Tomasz Chmielewski @ 2018-02-15  4:19 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS

On 2018-02-15 10:47, Qu Wenruo wrote:
> On 2018-02-14 22:19, Tomasz Chmielewski wrote:
>> Just FYI, how dangerous running btrfs can be - we had a fatal,
>> unrecoverable MySQL corruption when btrfs decided to do one of these 
>> "I
>> have ~50 GB left, so let's do out of space (and corrupt some files at
>> the same time, ha ha!)".
> 
> I've recently been looking into unexpected corruption problems in btrfs.
> 
> Would you please provide some extra info about how the corruption 
> happened?
> 
> 1) Was there a power reset?
>    Btrfs should be bulletproof against power loss, but in practice it
>    isn't, so I'm here to gather clues.

No power reset.


> 2) Are the MySQL files set with nodatacow?
>    If so, data corruption is more or less expected, but it should be
>    handled by MySQL's checkpointing.

Yes, MySQL files were using "nodatacow".

I've seen many cases of "filesystem full" with ext4, but none led to 
database corruption (i.e. the database would always recover after 
releasing some space).

On the other hand, I've seen a handful of "out of space" errors on btrfs 
with gigabytes of free space left, which led to light, heavy or 
unrecoverable MySQL or MongoDB corruption.


Could it be because of how "predictable" out-of-space situations are 
with btrfs compared to other filesystems?

- in short, ext4 will report out of space when there are 0 bytes left 
(perhaps slightly earlier for non-root users) - the application trying 
to write data will see "out of space" at some point, and it can stay 
like that for hours (i.e. until some data is removed manually)

- on the other hand, btrfs can report out of space when there are still 
10, 50 or 100 GB left, meaning any capacity planning is close to 
impossible; also, the application trying to write data can see the fs 
transitioning between "out of space" and "data written successfully" 
many times per minute or even per second


> 3) Is the filesystem metadata corrupted? (i.e. does btrfs check report
> errors?)
>    If so, that should be the problem I'm looking into.

I don't think so - there's nothing scary in dmesg. However, I didn't 
unmount the filesystem to run btrfs check.


> 4) What is the metadata/data ratio?
>    "btrfs fi usage" gives a quite good picture of it.
>    "btrfs fi df" also helps.

Here it is - however, that's after removing some 80 GB of data, so it 
most likely doesn't reflect the state when the failure happened.

# btrfs fi usage /var/lib/lxd
Overall:
     Device size:                 846.25GiB
     Device allocated:            840.05GiB
     Device unallocated:            6.20GiB
     Device missing:                  0.00B
     Used:                        498.26GiB
     Free (estimated):            167.96GiB      (min: 167.96GiB)
     Data ratio:                       2.00
     Metadata ratio:                   2.00
     Global reserve:              512.00MiB      (used: 0.00B)

Data,RAID1: Size:411.00GiB, Used:246.14GiB
    /dev/sda3     411.00GiB
    /dev/sdb3     411.00GiB

Metadata,RAID1: Size:9.00GiB, Used:2.99GiB
    /dev/sda3       9.00GiB
    /dev/sdb3       9.00GiB

System,RAID1: Size:32.00MiB, Used:80.00KiB
    /dev/sda3      32.00MiB
    /dev/sdb3      32.00MiB

Unallocated:
    /dev/sda3       3.10GiB
    /dev/sdb3       3.10GiB



# btrfs fi df /var/lib/lxd
Data, RAID1: total=411.00GiB, used=246.15GiB
System, RAID1: total=32.00MiB, used=80.00KiB
Metadata, RAID1: total=9.00GiB, used=2.99GiB
GlobalReserve, single: total=512.00MiB, used=0.00B



# btrfs fi show /var/lib/lxd
Label: 'btrfs'  uuid: f5f30428-ec5b-4497-82de-6e20065e6f61
         Total devices 2 FS bytes used 249.15GiB
         devid    1 size 423.13GiB used 420.03GiB path /dev/sda3
         devid    2 size 423.13GiB used 420.03GiB path /dev/sdb3
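As a quick sanity check, the "Free (estimated)" figure is consistent with the chunk numbers above: free space inside the allocated data chunks, plus the unallocated space halved because RAID1 stores every byte twice. A small sketch with the numbers hard-coded from the output:

```shell
awk 'BEGIN {
  data_total = 411.00   # GiB, size of the allocated data chunks (logical)
  data_used  = 246.14   # GiB, data actually used inside those chunks
  unalloc    = 6.20     # GiB, raw unallocated space across both devices
  # RAID1: raw unallocated space provides half as much logical capacity.
  printf "Free (estimated): %.2f GiB\n", (data_total - data_used) + unalloc / 2
}'
# prints: Free (estimated): 167.96 GiB
```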



Tomasz Chmielewski
https://lxadm.com


* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-15  4:19   ` Tomasz Chmielewski
@ 2018-02-15  4:32     ` Qu Wenruo
  2018-02-15  7:02       ` Tomasz Chmielewski
  0 siblings, 1 reply; 12+ messages in thread
From: Qu Wenruo @ 2018-02-15  4:32 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Btrfs BTRFS





On 2018-02-15 12:19, Tomasz Chmielewski wrote:
> On 2018-02-15 10:47, Qu Wenruo wrote:
>> On 2018-02-14 22:19, Tomasz Chmielewski wrote:
>>> Just FYI, how dangerous running btrfs can be - we had a fatal,
>>> unrecoverable MySQL corruption when btrfs decided to do one of these "I
>>> have ~50 GB left, so let's do out of space (and corrupt some files at
>>> the same time, ha ha!)".
>>
>> I've recently been looking into unexpected corruption problems in btrfs.
>>
>> Would you please provide some extra info about how the corruption
>> happened?
>>
>> 1) Was there a power reset?
>>    Btrfs should be bulletproof against power loss, but in practice it
>>    isn't, so I'm here to gather clues.
> 
> No power reset.
> 
> 
>> 2) Are the MySQL files set with nodatacow?
>>    If so, data corruption is more or less expected, but it should be
>>    handled by MySQL's checkpointing.
> 
> Yes, MySQL files were using "nodatacow".
> 
> I've seen many cases of "filesystem full" with ext4, but none led to
> database corruption (i.e. the database would always recover after
> releasing some space).
> 
> On the other hand, I've seen a handful of "out of space" errors on btrfs
> with gigabytes of free space left, which led to light, heavy or
> unrecoverable MySQL or MongoDB corruption.

Are there any kernel messages, like a kernel warning or backtrace?

> 
> 
> Could it be because of how "predictable" out-of-space situations are
> with btrfs compared to other filesystems?

It's possible.
Other filesystems don't really allocate their metadata space dynamically 
(ext4/xfs just have a fixed number of inodes), while btrfs strictly 
splits metadata and data allocation, so it's possible to still have 
gigabytes of free data space while running out of metadata space.
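To illustrate (with representative sample numbers hard-coded, since the field layout of btrfs-progs output can vary between versions), the metadata headroom can be read straight off a "btrfs fi df" line:

```shell
# Headroom = total - used on the Metadata line; the sample line is
# embedded here so the pipeline can be tried on any machine.
echo "Metadata, RAID1: total=9.00GiB, used=2.99GiB" |
  awk -F'[=,]' '{ gsub(/GiB/, ""); printf "metadata headroom: %.2f GiB\n", $3 - $5 }'
# prints: metadata headroom: 6.01 GiB
```

When that headroom approaches zero while data space is still plentiful, the "ENOSPC with gigabytes free" symptom described above appears.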

> 
> - in short, ext4 will report out of space when there are 0 bytes left
> (perhaps slightly earlier for non-root users) - the application trying
> to write data will see "out of space" at some point, and it can stay
> like that for hours (i.e. until some data is removed manually)
> 
> - on the other hand, btrfs can report out of space when there are still
> 10, 50 or 100 GB left, meaning any capacity planning is close to
> impossible; also, the application trying to write data can see the
> fs transitioning between "out of space" and "data written
> successfully" many times per minute or even per second
> 
> 
>> 3) Is the filesystem metadata corrupted? (i.e. does btrfs check report
>> errors?)
>>    If so, that should be the problem I'm looking into.
> 
> I don't think so - there's nothing scary in dmesg. However, I didn't
> unmount the filesystem to run btrfs check.

If there are no scary kernel warnings, it may be less serious.

One of my assumptions is that snapshots are used in your btrfs setup 
(snapshots being, more or less, one of the two main selling points of 
btrfs), and that even though your DB is set nodatacow, snapshots still 
force data CoW.

And when the DB fails, due to ENOSPC, to write some critical data, 
maybe a write-ahead log or something similar, it causes inconsistency.

The problem here is that most DBs assume the filesystem will, by 
default, overwrite data in place successfully, which is not always true 
for btrfs.

If that's the case, would you please remove all snapshots of your DB 
subvolume? Or just put the DB into its own subvolume and never snapshot 
it? Then nodatacow should work as expected, and behavior would be more 
similar to ext4/xfs. (Although still slower than ext4/xfs)
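A sketch of that layout (subvolume name and paths are hypothetical; nodatacow only takes effect for files created after the flag is set, so the directory must start out empty):

```shell
# Dedicated, never-snapshotted subvolume for the database:
btrfs subvolume create /var/lib/lxd/mysql-data
chattr +C /var/lib/lxd/mysql-data   # mark the (empty) directory NOCOW
lsattr -d /var/lib/lxd/mysql-data   # verify the 'C' attribute
# ...then copy/restore the DB files into it, and keep this subvolume
# out of any snapshot rotation.
```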

> 
> 
>> 4) What is the metadata/data ratio?
>>    "btrfs fi usage" gives a quite good picture of it.
>>    "btrfs fi df" also helps.
> 
> Here it is - however, that's after removing some 80 GB of data, so it
> most likely doesn't reflect the state when the failure happened.
> 
> # btrfs fi usage /var/lib/lxd
> Overall:
>     Device size:                 846.25GiB
>     Device allocated:            840.05GiB
>     Device unallocated:            6.20GiB

That should prevent further ENOSPC, as long as this number stays above 1 GiB.

>     Device missing:                  0.00B
>     Used:                        498.26GiB
>     Free (estimated):            167.96GiB      (min: 167.96GiB)
>     Data ratio:                       2.00
>     Metadata ratio:                   2.00
>     Global reserve:              512.00MiB      (used: 0.00B)
> 
> Data,RAID1: Size:411.00GiB, Used:246.14GiB
>    /dev/sda3     411.00GiB
>    /dev/sdb3     411.00GiB
> 
> Metadata,RAID1: Size:9.00GiB, Used:2.99GiB
>    /dev/sda3       9.00GiB
>    /dev/sdb3       9.00GiB
> 
> System,RAID1: Size:32.00MiB, Used:80.00KiB
>    /dev/sda3      32.00MiB
>    /dev/sdb3      32.00MiB
> 
> Unallocated:
>    /dev/sda3       3.10GiB
>    /dev/sdb3       3.10GiB
> 
> 
> 
> # btrfs fi df /var/lib/lxd
> Data, RAID1: total=411.00GiB, used=246.15GiB
> System, RAID1: total=32.00MiB, used=80.00KiB
> Metadata, RAID1: total=9.00GiB, used=2.99GiB

Not sure if the removal of 80G has anything to do with this, but it 
seems that your metadata (along with data) is quite scattered.

It's really recommended to keep some unallocated device space, and one 
way to do that is to use balance to free such scattered space from 
data/metadata chunks.

That's why a routine balance is recommended for btrfs.

Thanks,
Qu

> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> 
> 
> # btrfs fi show /var/lib/lxd
> Label: 'btrfs'  uuid: f5f30428-ec5b-4497-82de-6e20065e6f61
>         Total devices 2 FS bytes used 249.15GiB
>         devid    1 size 423.13GiB used 420.03GiB path /dev/sda3
>         devid    2 size 423.13GiB used 420.03GiB path /dev/sdb3
> 
> 
> 
> Tomasz Chmielewski
> https://lxadm.com




* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-15  4:32     ` Qu Wenruo
@ 2018-02-15  7:02       ` Tomasz Chmielewski
  2018-02-15  7:17         ` Tomasz Chmielewski
                           ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Tomasz Chmielewski @ 2018-02-15  7:02 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS

On 2018-02-15 13:32, Qu Wenruo wrote:

> Is there any kernel message like kernel warning or backtrace?

I see there was this one:

Feb 13 13:53:32 lxd01 kernel: [9351710.878404] ------------[ cut here 
]------------
Feb 13 13:53:32 lxd01 kernel: [9351710.878430] WARNING: CPU: 9 PID: 7780 
at /home/kernel/COD/linux/fs/btrfs/tree-log.c:3361 
log_dir_items+0x54b/0x560 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878431] Modules linked in: 
nfnetlink_queue bluetooth ecdh_generic xt_nat xt_REDIRECT 
nf_nat_redirect sunrpc cfg80211 tcp_diag inet_diag xt_NFLOG 
nfnetlink_log nfnetlink xt_conntrack ipt_REJECT nf_reject_ipv4 
binfmt_misc veth ebtable_filter ebtables ip6t_MASQUERADE 
nf_nat_masquerade_ipv6 ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 
nf_nat_ipv6 xt_comment nf_log_ipv4 nf_log_common xt_LOG ipt_MASQUERADE 
nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 
nf_nat_ipv4 nf_nat ip_vs nf_conntrack ip6table_filter ip6_tables 
iptable_filter xt_CHECKSUM xt_tcpudp iptable_mangle ip_tables x_tables 
intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm 
irqbypass btrfs bridge stp llc crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel pcbc zstd_compress aesni_intel aes_x86_64
Feb 13 13:53:32 lxd01 kernel: [9351710.878460]  crypto_simd glue_helper 
cryptd input_leds intel_cstate ipmi_ssif intel_rapl_perf serio_raw 
lpc_ich shpchp ipmi_devintf ipmi_msghandler tpm_infineon acpi_pad 
mac_hid autofs4 raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops igb drm 
dca ahci ptp pps_core libahci i2c_algo_bit wmi
Feb 13 13:53:32 lxd01 kernel: [9351710.878484] CPU: 9 PID: 7780 Comm: 
TaskSchedulerBa Tainted: G        W       4.14.0-041400rc6-generic 
#201710230731
Feb 13 13:53:32 lxd01 kernel: [9351710.878485] Hardware name: ASUSTeK 
COMPUTER INC. Z10PA-U8 Series/Z10PA-U8 Series, BIOS 0601 06/26/2015
Feb 13 13:53:32 lxd01 kernel: [9351710.878486] task: ffff9454227d1700 
task.stack: ffffabc6a810c000
Feb 13 13:53:32 lxd01 kernel: [9351710.878502] RIP: 
0010:log_dir_items+0x54b/0x560 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878502] RSP: 
0018:ffffabc6a810f980 EFLAGS: 00010202
Feb 13 13:53:32 lxd01 kernel: [9351710.878503] RAX: 0000000000000001 
RBX: 000000000008b771 RCX: 0000000000000000
Feb 13 13:53:32 lxd01 kernel: [9351710.878504] RDX: 0000000000000000 
RSI: 0000000000000000 RDI: 0000000000000000
Feb 13 13:53:32 lxd01 kernel: [9351710.878505] RBP: ffffabc6a810fa28 
R08: ffff9491a8f05540 R09: 0000000000000008
Feb 13 13:53:32 lxd01 kernel: [9351710.878506] R10: 0000000000000000 
R11: ffffabc6a810f934 R12: ffffabc6a810fe50
Feb 13 13:53:32 lxd01 kernel: [9351710.878506] R13: ffff94666d426000 
R14: ffff9491a8f05540 R15: 0000000000000054
Feb 13 13:53:32 lxd01 kernel: [9351710.878508] FS:  
00007f9936e22700(0000) GS:ffff9491bf440000(0000) knlGS:0000000000000000
Feb 13 13:53:32 lxd01 kernel: [9351710.878508] CS:  0010 DS: 0000 ES: 
0000 CR0: 0000000080050033
Feb 13 13:53:32 lxd01 kernel: [9351710.878509] CR2: 00007f6abef4d7b0 
CR3: 00000023ecaf7006 CR4: 00000000001606e0
Feb 13 13:53:32 lxd01 kernel: [9351710.878510] Call Trace:
Feb 13 13:53:32 lxd01 kernel: [9351710.878524]  ? 
btrfs_search_slot+0x81b/0x9c0 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878538]  
log_directory_changes+0x83/0xd0 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878551]  
btrfs_log_inode+0xa24/0x11a0 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878563]  ? 
generic_bin_search.constprop.37+0xe7/0x1f0 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878565]  ? find_inode+0x59/0xb0
Feb 13 13:53:32 lxd01 kernel: [9351710.878567]  ? 
iget5_locked+0x9e/0x1e0
Feb 13 13:53:32 lxd01 kernel: [9351710.878582]  
log_new_dir_dentries+0x203/0x4a7 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878595]  
btrfs_log_inode_parent+0x6c2/0xa10 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878598]  ? 
pagevec_lookup_tag+0x21/0x30
Feb 13 13:53:32 lxd01 kernel: [9351710.878599]  ? 
__filemap_fdatawait_range+0x9a/0x170
Feb 13 13:53:32 lxd01 kernel: [9351710.878614]  ? 
wait_current_trans+0x33/0x110 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878627]  ? 
join_transaction+0x27/0x420 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878639]  
btrfs_log_dentry_safe+0x60/0x80 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878658]  
btrfs_sync_file+0x2d1/0x410 [btrfs]
Feb 13 13:53:32 lxd01 kernel: [9351710.878661]  
vfs_fsync_range+0x4b/0xb0
Feb 13 13:53:32 lxd01 kernel: [9351710.878663]  do_fsync+0x3d/0x70
Feb 13 13:53:32 lxd01 kernel: [9351710.878668]  SyS_fdatasync+0x13/0x20
Feb 13 13:53:32 lxd01 kernel: [9351710.878670]  do_syscall_64+0x61/0x120
Feb 13 13:53:32 lxd01 kernel: [9351710.878673]  
entry_SYSCALL64_slow_path+0x25/0x25
Feb 13 13:53:32 lxd01 kernel: [9351710.878674] RIP: 0033:0x7f99461437dd
Feb 13 13:53:32 lxd01 kernel: [9351710.878675] RSP: 
002b:00007f9936e20f10 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
Feb 13 13:53:32 lxd01 kernel: [9351710.878676] RAX: ffffffffffffffda 
RBX: 0000307d6f5d1070 RCX: 00007f99461437dd
Feb 13 13:53:32 lxd01 kernel: [9351710.878677] RDX: 000000000000005c 
RSI: 0000000000080000 RDI: 000000000000005c
Feb 13 13:53:32 lxd01 kernel: [9351710.878678] RBP: 0000000000000000 
R08: 0000000000000000 R09: 0000000000000000
Feb 13 13:53:32 lxd01 kernel: [9351710.878679] R10: 00000000ffffffff 
R11: 0000000000000293 R12: 0000000000001000
Feb 13 13:53:32 lxd01 kernel: [9351710.878679] R13: 0000307d6f550b00 
R14: 0000000000000000 R15: 0000000000001000
Feb 13 13:53:32 lxd01 kernel: [9351710.878681] Code: 89 85 6c ff ff ff 
4c 8b 95 70 ff ff ff 74 23 4c 89 f7 e8 a9 dc f8 ff 48 8b 7d 88 e8 a0 dc 
f8 ff 8b 85 6c ff ff ff e9 d8 fb ff ff <0f> ff e9 35 fe ff ff 4c 89 55 
18 e9 56 fc ff ff e8 60 65 61 eb
Feb 13 13:53:32 lxd01 kernel: [9351710.878707] ---[ end trace 
81aeb3fb0c68ce00 ]---


BTW we've updated to the latest 4.15 kernel after that.


> Not sure if the removal of 80G has anything to do with this, but it
> seems that your metadata (along with data) is quite scattered.
> 
> It's really recommended to keep some unallocated device space, and one
> way to do that is to use balance to free such scattered space from
> data/metadata chunks.
> 
> That's why a routine balance is recommended for btrfs.

The balance might work on that server - it has less than 0.5 TB of SSD 
storage.

However, on multi-terabyte servers with terabytes of data on HDDs, 
running a balance is not realistic. We have some servers where a balance 
had been running for around 2 months and was not even 50% done, and the 
IO load it added slowed things down a lot.


Tomasz Chmielewski
https://lxadm.com


* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-15  7:02       ` Tomasz Chmielewski
@ 2018-02-15  7:17         ` Tomasz Chmielewski
  2018-02-15  9:06           ` Nikolay Borisov
  2018-02-15  7:38         ` Qu Wenruo
  2018-02-15  7:50         ` Duncan
  2 siblings, 1 reply; 12+ messages in thread
From: Tomasz Chmielewski @ 2018-02-15  7:17 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Btrfs BTRFS

On 2018-02-15 16:02, Tomasz Chmielewski wrote:
> On 2018-02-15 13:32, Qu Wenruo wrote:
> 
>> Are there any kernel messages, like a kernel warning or backtrace?
> 
> I see there was this one:
> 
> Feb 13 13:53:32 lxd01 kernel: [9351710.878404] ------------[ cut here
> ]------------
> Feb 13 13:53:32 lxd01 kernel: [9351710.878430] WARNING: CPU: 9 PID:
> 7780 at /home/kernel/COD/linux/fs/btrfs/tree-log.c:3361
> log_dir_items+0x54b/0x560 [btrfs]
(...)
> Feb 13 13:53:32 lxd01 kernel: [9351710.878707] ---[ end trace
> 81aeb3fb0c68ce00 ]---
> 
> 
> BTW we've updated to the latest 4.15 kernel after that.

Also, we were just running a balance there and it printed this in dmesg.

Not sure if it's something serious or not. The filesystem still runs 
correctly.

[60082.349447] WARNING: CPU: 2 PID: 780 at 
/home/kernel/COD/linux/fs/btrfs/extent-tree.c:124 
btrfs_put_block_group+0x4f/0x60 [btrfs]
[60082.349449] Modules linked in: xt_nat xt_REDIRECT nf_nat_redirect 
tcp_diag inet_diag sunrpc xt_NFLOG cfg80211 xt_conntrack nfnetlink_log 
nfnetlink ipt_REJECT nf_reject_ipv4 binfmt_misc veth ebtable_filter 
ebtables ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat 
nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 xt_comment nf_log_ipv4 
nf_log_common xt_LOG ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat 
nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_vs nf_conntrack 
ip6table_filter ip6_tables iptable_filter xt_CHECKSUM xt_tcpudp 
iptable_mangle ip_tables x_tables intel_rapl sb_edac 
x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs bridge pcbc 
zstd_compress stp llc aesni_intel aes_x86_64 crypto_simd glue_helper 
input_leds
[60082.349471]  cryptd intel_cstate intel_rapl_perf serio_raw shpchp 
lpc_ich acpi_pad mac_hid autofs4 raid10 raid456 async_raid6_recov 
async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 
raid0 multipath linear ttm drm_kms_helper syscopyarea sysfillrect 
sysimgblt fb_sys_fops igb drm dca ahci ptp libahci pps_core i2c_algo_bit 
wmi
[60082.349488] CPU: 2 PID: 780 Comm: btrfs-cleaner Tainted: G        W   
      4.15.3-041503-generic #201802120730
[60082.349489] Hardware name: ASUSTeK COMPUTER INC. Z10PA-U8 
Series/Z10PA-U8 Series, BIOS 0601 06/26/2015
[60082.349497] RIP: 0010:btrfs_put_block_group+0x4f/0x60 [btrfs]
[60082.349497] RSP: 0018:ffffc1468d3afe48 EFLAGS: 00010206
[60082.349498] RAX: 0000000000000000 RBX: ffff9d072f740000 RCX: 
0000000000000000
[60082.349499] RDX: 0000000000000001 RSI: 0000000000000246 RDI: 
ffff9d072bcb8c00
[60082.349499] RBP: ffffc1468d3afe50 R08: ffff9d072bcbfc00 R09: 
0000000180200019
[60082.349500] R10: ffffc1468d3afe38 R11: 0000000000000100 R12: 
ffff9d072b87ce00
[60082.349500] R13: ffff9d072bcb8c00 R14: ffff9d072b87ced0 R15: 
ffff9d072bcb8d20
[60082.349501] FS:  0000000000000000(0000) GS:ffff9d073f280000(0000) 
knlGS:0000000000000000
[60082.349502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[60082.349502] CR2: 00007f9221375000 CR3: 00000019a660a005 CR4: 
00000000001606e0
[60082.349503] Call Trace:
[60082.349513]  btrfs_delete_unused_bgs+0x243/0x3c0 [btrfs]
[60082.349521]  cleaner_kthread+0x159/0x170 [btrfs]
[60082.349524]  kthread+0x121/0x140
[60082.349531]  ? __btree_submit_bio_start+0x20/0x20 [btrfs]
[60082.349533]  ? kthread_create_worker_on_cpu+0x70/0x70
[60082.349535]  ret_from_fork+0x35/0x40
[60082.349536] Code: e8 01 00 00 48 85 c0 75 1a 48 89 fb 48 8b bf d8 00 
00 00 e8 14 3a 66 d7 48 89 df e8 0c 3a 66 d7 5b 5d c3 0f ff eb e2 0f ff 
eb cb <0f> ff eb ce 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[60082.349554] ---[ end trace 9492ee1b902c858d ]---


Tomasz Chmielewski
https://lxadm.com


* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-15  7:02       ` Tomasz Chmielewski
  2018-02-15  7:17         ` Tomasz Chmielewski
@ 2018-02-15  7:38         ` Qu Wenruo
  2018-02-15  7:50         ` Duncan
  2 siblings, 0 replies; 12+ messages in thread
From: Qu Wenruo @ 2018-02-15  7:38 UTC (permalink / raw)
  To: Tomasz Chmielewski; +Cc: Btrfs BTRFS





On 2018-02-15 15:02, Tomasz Chmielewski wrote:
> On 2018-02-15 13:32, Qu Wenruo wrote:
> 
>> Are there any kernel messages, like a kernel warning or backtrace?
> 
> I see there was this one:
> 
> Feb 13 13:53:32 lxd01 kernel: [9351710.878404] ------------[ cut
> here ]------------ Feb 13 13:53:32 lxd01 kernel: [9351710.878430]
> WARNING: CPU: 9 PID: 7780 at
> /home/kernel/COD/linux/fs/btrfs/tree-log.c:3361

Something in the tree log (used by fsync) is not running as expected, 
and it looks like a big problem: the code shows that btrfs failed to 
find the expected key in the log tree.

No wonder MySQL reported errors, since fsync was not executed correctly.

I strongly recommend running an offline btrfs check to make sure the 
metadata is not corrupted, before it causes more problems.
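Concretely, the offline check would look something like this (device and mountpoint are assumptions taken from the "btrfs fi show" output earlier in the thread; "btrfs check" without options is read-only):

```shell
umount /var/lib/lxd
btrfs check /dev/sda3   # read-only report; do not add --repair without advice
mount /dev/sda3 /var/lib/lxd
```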

> log_dir_items+0x54b/0x560 [btrfs]
(...)
> [9351710.878707] ---[ end trace 81aeb3fb0c68ce00 ]---
> 
> 
> BTW we've updated to the latest 4.15 kernel after that.
> 
> 
>> Not sure if the removal of 80G has anything to do with this, but
>> this seems that your metadata (along with data) is quite
>> scattered.
>> 
>> It's really recommended to keep some unallocated device space, and
>> one of the methods to do that is to use balance to free such
>> scattered space from data/metadata usage.
>> 
>> And that's why a balance routine is recommended for btrfs.
> 
> The balance might work on that server - it's less than 0.5 TB SSD
> disks.
> 
> However, on multi-terabyte servers with terabytes of data on HDD
> disks, running balance is not realistic. We have some servers where
> balance was taking 2 months or so, and was not even 50% done. And the
> IO load the balance was adding was slowing things down a lot.

How did you do the balance?

Btrfs is perfectly capable of relocating only those chunks which meet a
certain condition.

For example, relocating only chunks whose used space is below 15%.

If you're relocating the whole fs, no wonder it takes a long, long time.

Furthermore, btrfs is super fast at creating snapshots, but at the cost
of dramatically slowing down balance and snapshot deletion.

So abusing snapshots is not a good idea on btrfs, especially when it
comes to balance.

Thanks,
Qu

> 
> 
> Tomasz Chmielewski https://lxadm.com



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-15  7:02       ` Tomasz Chmielewski
  2018-02-15  7:17         ` Tomasz Chmielewski
  2018-02-15  7:38         ` Qu Wenruo
@ 2018-02-15  7:50         ` Duncan
  2 siblings, 0 replies; 12+ messages in thread
From: Duncan @ 2018-02-15  7:50 UTC (permalink / raw)
  To: linux-btrfs

Tomasz Chmielewski posted on Thu, 15 Feb 2018 16:02:59 +0900 as excerpted:

>> Not sure if the removal of 80G has anything to do with this, but this
>> seems that your metadata (along with data) is quite scattered.
>> 
>> It's really recommended to keep some unallocated device space, and one
>> of the methods to do that is to use balance to free such scattered space
>> from data/metadata usage.
>> 
>> And that's why a balance routine is recommended for btrfs.
> 
> The balance might work on that server - it's less than 0.5 TB SSD disks.
> 
> However, on multi-terabyte servers with terabytes of data on HDD disks,
> running balance is not realistic. We have some servers where balance was
> taking 2 months or so, and was not even 50% done. And the IO load the
> balance was adding was slowing things down a lot.

Try a filtered balance.  Something along the lines of:

btrfs balance start -dusage=10 <filesystem-path>

The -dusage number, a limit on the chunk usage percentage, can start 
small, even 0, and be increased as necessary, until btrfs fi usage 
reports data size (currently 411 GiB) closer to data usage (currently 
246.14 GiB), with the freed space returning to unallocated.
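A minimal sketch of that incremental approach (the mount point and the
percentage steps are only examples; the echo makes it a dry run):

```shell
# Dry-run sketch of an incremental filtered balance: start with a low
# -dusage cutoff and raise it until "btrfs fi usage" shows enough space
# returned to unallocated.  The echo keeps this safe to paste; remove it
# to actually run the balances.  /var/lib/lxd is just an example path.
for pct in 0 5 10 20 30; do
  echo btrfs balance start -dusage="$pct" /var/lib/lxd
done
```

Between steps, check `btrfs fi usage` and stop as soon as enough space
has been freed, rather than running the higher percentages blindly.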

I'd shoot for reducing data size to under 300 GiB, thus returning over 
100 GiB to unallocated, while hopefully not requiring too high a -dusage 
percentage and thus too long a balance time.  You could get it down under 
250 gig size, but that would likely take a lot of rewriting for little 
additional gain, since with it under 300 gig size you should already have 
over 100 gig unallocated.

Balance time should be quite short for low percentages, with a big 
payback if there's quite a few chunks with little usage, because at 10%, 
the filesystem can get rid of 10 chunks while only rewriting the 
equivalent of a single full chunk.

Obviously as the chunk usage percentage goes up, the payback goes down, 
so at 50%, it can only clear two chunks while writing one, and at 66%, it 
has to write two chunks worth to clear three.  Above that (tho I tend to 
round up to 70% here) is seldom worth it until the filesystem gets quite 
full and you're really fighting to keep a few gigs of unallocated space.  
(As Qu indicated, you always want at least a gig of unallocated space, on 
at least two devices if you're doing raid1.)
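The payback arithmetic above can be sketched as a quick shell
calculation (a rough model only; it ignores that relocated data may
land in partially-filled chunks):

```shell
# Rough model of the filtered-balance payback: a chunk that is u% used
# costs u% of a chunk's worth of writing to relocate, and the rewritten
# data refills about one chunk.  So per chunk-worth of data written you
# process 100/u chunks and net-free roughly one fewer than that.
for pct in 10 50 66; do
  awk -v u="$pct" 'BEGIN {
    processed = 100 / u
    printf "usage %2d%%: ~%.1f chunks processed, ~%.1f net freed per chunk written\n",
           u, processed, processed - 1
  }'
done
```

At 10% the ratio is very favorable; by 66% you are already writing two
chunks' worth of data to net-free a single chunk.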

If you really wanted you could do the same with -musage for metadata, 
except that's not so bad, only 9 gig size, 3 gig used.  But you could 
free 5 gigs or so, if desired.


That's assuming there's no problem.  I see a followup indicating you're 
seeing problems in dmesg with a balance, however, and will let others 
deal with that.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-15  7:17         ` Tomasz Chmielewski
@ 2018-02-15  9:06           ` Nikolay Borisov
  0 siblings, 0 replies; 12+ messages in thread
From: Nikolay Borisov @ 2018-02-15  9:06 UTC (permalink / raw)
  To: Tomasz Chmielewski, Qu Wenruo; +Cc: Btrfs BTRFS



On 15.02.2018 09:17, Tomasz Chmielewski wrote:
> On 2018-02-15 16:02, Tomasz Chmielewski wrote:
>> On 2018-02-15 13:32, Qu Wenruo wrote:
>>
>>> Is there any kernel message like kernel warning or backtrace?
>>
>> I see there was this one:
>>
>> Feb 13 13:53:32 lxd01 kernel: [9351710.878404] ------------[ cut here
>> ]------------
>> Feb 13 13:53:32 lxd01 kernel: [9351710.878430] WARNING: CPU: 9 PID:
>> 7780 at /home/kernel/COD/linux/fs/btrfs/tree-log.c:3361
>> log_dir_items+0x54b/0x560 [btrfs]
> (...)
>> Feb 13 13:53:32 lxd01 kernel: [9351710.878707] ---[ end trace
>> 81aeb3fb0c68ce00 ]---
>>
>>
>> BTW we've updated to the latest 4.15 kernel after that.
> 
> Also, we were just running a balance there and it printed this in dmesg.
> 
> Not sure if it's something serious or not. The filesystem still runs
> correctly.
> 
> [60082.349447] WARNING: CPU: 2 PID: 780 at
> /home/kernel/COD/linux/fs/btrfs/extent-tree.c:124
> btrfs_put_block_group+0x4f/0x60 [btrfs]
> [60082.349449] Modules linked in: xt_nat xt_REDIRECT nf_nat_redirect
> tcp_diag inet_diag sunrpc xt_NFLOG cfg80211 xt_conntrack nfnetlink_log
> nfnetlink ipt_REJECT nf_reject_ipv4 binfmt_misc veth ebtable_filter
> ebtables ip6t_MASQUERADE nf_nat_masquerade_ipv6 ip6table_nat
> nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 xt_comment nf_log_ipv4
> nf_log_common xt_LOG ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat
> nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat ip_vs nf_conntrack
> ip6table_filter ip6_tables iptable_filter xt_CHECKSUM xt_tcpudp
> iptable_mangle ip_tables x_tables intel_rapl sb_edac
> x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass
> crct10dif_pclmul crc32_pclmul ghash_clmulni_intel btrfs bridge pcbc
> zstd_compress stp llc aesni_intel aes_x86_64 crypto_simd glue_helper
> input_leds
> [60082.349471]  cryptd intel_cstate intel_rapl_perf serio_raw shpchp
> lpc_ich acpi_pad mac_hid autofs4 raid10 raid456 async_raid6_recov
> async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1
> raid0 multipath linear ttm drm_kms_helper syscopyarea sysfillrect
> sysimgblt fb_sys_fops igb drm dca ahci ptp libahci pps_core i2c_algo_bit
> wmi
> [60082.349488] CPU: 2 PID: 780 Comm: btrfs-cleaner Tainted: G        W  
>      4.15.3-041503-generic #201802120730
> [60082.349489] Hardware name: ASUSTeK COMPUTER INC. Z10PA-U8
> Series/Z10PA-U8 Series, BIOS 0601 06/26/2015
> [60082.349497] RIP: 0010:btrfs_put_block_group+0x4f/0x60 [btrfs]
> [60082.349497] RSP: 0018:ffffc1468d3afe48 EFLAGS: 00010206
> [60082.349498] RAX: 0000000000000000 RBX: ffff9d072f740000 RCX:
> 0000000000000000
> [60082.349499] RDX: 0000000000000001 RSI: 0000000000000246 RDI:
> ffff9d072bcb8c00
> [60082.349499] RBP: ffffc1468d3afe50 R08: ffff9d072bcbfc00 R09:
> 0000000180200019
> [60082.349500] R10: ffffc1468d3afe38 R11: 0000000000000100 R12:
> ffff9d072b87ce00
> [60082.349500] R13: ffff9d072bcb8c00 R14: ffff9d072b87ced0 R15:
> ffff9d072bcb8d20
> [60082.349501] FS:  0000000000000000(0000) GS:ffff9d073f280000(0000)
> knlGS:0000000000000000
> [60082.349502] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [60082.349502] CR2: 00007f9221375000 CR3: 00000019a660a005 CR4:
> 00000000001606e0
> [60082.349503] Call Trace:
> [60082.349513]  btrfs_delete_unused_bgs+0x243/0x3c0 [btrfs]
> [60082.349521]  cleaner_kthread+0x159/0x170 [btrfs]
> [60082.349524]  kthread+0x121/0x140
> [60082.349531]  ? __btree_submit_bio_start+0x20/0x20 [btrfs]
> [60082.349533]  ? kthread_create_worker_on_cpu+0x70/0x70
> [60082.349535]  ret_from_fork+0x35/0x40
> [60082.349536] Code: e8 01 00 00 48 85 c0 75 1a 48 89 fb 48 8b bf d8 00
> 00 00 e8 14 3a 66 d7 48 89 df e8 0c 3a 66 d7 5b 5d c3 0f ff eb e2 0f ff
> eb cb <0f> ff eb ce 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
> [60082.349554] ---[ end trace 9492ee1b902c858d ]---
> 

If you have the vmlinux for that particular kernel, can you run the 
following command and provide the output: 

./scripts/faddr2line <path-to-vmlinux> btrfs_delete_unused_bgs+0x243/0x3c0

./scripts/faddr2line is in the kernel source directory. This 
WARN_ON indicates we are freeing an unused block group (one which 
shouldn't have any space reserved in it), yet the warning triggers 
because space is still reserved for that block group. Quite strange.

> 
> Tomasz Chmielewski
> https://lxadm.com
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-14 14:19 fatal database corruption with btrfs "out of space" with ~50 GB left Tomasz Chmielewski
  2018-02-15  1:25 ` Duncan
  2018-02-15  1:47 ` Qu Wenruo
@ 2018-02-19  4:29 ` Anand Jain
  2018-02-19  8:30   ` Tomasz Chmielewski
  2 siblings, 1 reply; 12+ messages in thread
From: Anand Jain @ 2018-02-19  4:29 UTC (permalink / raw)
  To: Tomasz Chmielewski, Btrfs BTRFS



On 02/14/2018 10:19 PM, Tomasz Chmielewski wrote:
> Just FYI, how dangerous running btrfs can be - we had a fatal, 
> unrecoverable MySQL corruption when btrfs decided to do one of these "I 
> have ~50 GB left, so let's do out of space (and corrupt some files at 
> the same time, ha ha!)".

  Thanks for reporting.

> Running btrfs RAID-1 with kernel 4.14.

  Can you please let us know:
  1. Which tool/CLI reported or identified that the data is corrupted?
  2. Disk error stats, using: btrfs dev stat <mnt>
     (dev stat is stored on disk)
  3. Whether the disk was mounted as degraded at any time before?

Thanks, Anand

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: fatal database corruption with btrfs "out of space" with ~50 GB left
  2018-02-19  4:29 ` Anand Jain
@ 2018-02-19  8:30   ` Tomasz Chmielewski
  0 siblings, 0 replies; 12+ messages in thread
From: Tomasz Chmielewski @ 2018-02-19  8:30 UTC (permalink / raw)
  To: Anand Jain; +Cc: Btrfs BTRFS

On 2018-02-19 13:29, Anand Jain wrote:
> On 02/14/2018 10:19 PM, Tomasz Chmielewski wrote:
>> Just FYI, how dangerous running btrfs can be - we had a fatal, 
>> unrecoverable MySQL corruption when btrfs decided to do one of these 
>> "I have ~50 GB left, so let's do out of space (and corrupt some files 
>> at the same time, ha ha!)".
> 
>  Thanks for reporting.
> 
>> Running btrfs RAID-1 with kernel 4.14.
> 
>  Can you pls let us know..
>  1. What tool cli/reported/identified that data is corrupted?

mysqld log - mysqld would refuse to start because of database 
corruption.

And, the database wouldn't start even when "innodb_force_recovery = " 
was set to a high/max value.


In the past, with lower kernel versions, we had a similar issue with 
mongod - it wouldn't start anymore due to some corruption which happened 
when we hit "out of space" (again, with dozens of GBs free space).


>  2. Disk error stat using.. btrfs dev stat <mnt>
>     (dev stat is stored on disk)

# btrfs dev stat /var/lib/lxd
[/dev/sda3].write_io_errs    0
[/dev/sda3].read_io_errs     0
[/dev/sda3].flush_io_errs    0
[/dev/sda3].corruption_errs  0
[/dev/sda3].generation_errs  0
[/dev/sdb3].write_io_errs    0
[/dev/sdb3].read_io_errs     0
[/dev/sdb3].flush_io_errs    0
[/dev/sdb3].corruption_errs  0
[/dev/sdb3].generation_errs  0


>  3. Wheather the disk was mounted as degraded any time before?

No. Everything healthy with the disks.


Tomasz Chmielewski
https://lxadm.com

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-02-19  8:30 UTC | newest]

Thread overview: 12+ messages
2018-02-14 14:19 fatal database corruption with btrfs "out of space" with ~50 GB left Tomasz Chmielewski
2018-02-15  1:25 ` Duncan
2018-02-15  1:47 ` Qu Wenruo
2018-02-15  4:19   ` Tomasz Chmielewski
2018-02-15  4:32     ` Qu Wenruo
2018-02-15  7:02       ` Tomasz Chmielewski
2018-02-15  7:17         ` Tomasz Chmielewski
2018-02-15  9:06           ` Nikolay Borisov
2018-02-15  7:38         ` Qu Wenruo
2018-02-15  7:50         ` Duncan
2018-02-19  4:29 ` Anand Jain
2018-02-19  8:30   ` Tomasz Chmielewski
