From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Asif Youssuff <yoasif@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: Filesystem goes readonly soon after mount, cannot free space or rebalance
Date: Mon, 21 Jun 2021 15:42:08 +0800
Message-ID: <1b89f8a3-42a4-3c6d-aec8-1b91a7b43713@gmx.com>
In-Reply-To: <2bb832db-3c33-d3ba-d9ae-4ebd44c1c7f3@gmail.com>



On 2021/6/19 1:16 PM, Asif Youssuff wrote:
> Hi Btrfs mailing list,
>
> I'm running into a weird situation where my filesystem goes readonly
> soon after mount. Removing snapshots or files doesn't help, and the
> changes are not persisted after the filesystem goes readonly and is
> remounted.
>
> I made the mistake of starting a large rebalance operation while using
> an expansive balance filter (I don't recall the figure, unfortunately),
> so I can't even add a new disk (given as a solution to disk full errors
> on various places on the web).
>
> Mounting with skip_balance stops the balance operation, but doesn't
> *cancel* it, so adding a new disk isn't possible.
>
> I'm pretty stuck here so any ideas on how to resolve would be great!
>
> uname -a
> Linux butter-server 5.12.11-051211-generic #202106161201 SMP Wed Jun 16
> 12:32:38 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>
> btrfs --version
> btrfs-progs v5.4.1
>
> btrfs fi show
> Label: none  uuid: c8557a6e-4b51-44f1-ba8f-75fce8c7dfcd
>      Total devices 1 FS bytes used 5.38TiB
>      devid    1 size 5.46TiB used 5.46TiB path /dev/sdh1
>
> Label: none  uuid: 48ed8a66-731d-499b-829e-dd07dd7260cc
>      Total devices 13 FS bytes used 50.79TiB
>      devid    4 size 5.46TiB used 5.46TiB path /dev/sdf
>      devid    5 size 7.28TiB used 7.28TiB path /dev/sdj
>      devid    7 size 12.73TiB used 12.73TiB path /dev/sdg
>      devid    9 size 5.46TiB used 5.46TiB path /dev/sdd
>      devid   10 size 7.28TiB used 7.28TiB path /dev/sdp
>      devid   11 size 7.28TiB used 7.28TiB path /dev/sdl
>      devid   12 size 5.46TiB used 5.46TiB path /dev/sdb
>      devid   14 size 7.28TiB used 7.28TiB path /dev/sda
>      devid   15 size 7.28TiB used 7.28TiB path /dev/sdn

All devices above are exhausted, with no unallocated space left.

>      devid   17 size 9.10TiB used 7.49TiB path /dev/sde

But this device still has about 1.6 TiB of unallocated space.

>      devid   18 size 7.28TiB used 7.28TiB path /dev/sdm
>      devid   20 size 7.28TiB used 7.28TiB path /dev/sdc
>      devid   21 size 7.28TiB used 6.42TiB path /dev/sdo

And this device also has some unallocated space.

I believe it's a bug in the metadata overcommit code, which makes btrfs
believe it can overcommit.

But in reality, metadata is RAID1C4, so a new metadata chunk needs
unallocated space on at least 4 devices, and only 2 devices have any.

This illusion lets btrfs keep overcommitting until it really runs out of
space during a critical operation and goes RO.
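
To make the device-count argument concrete, here is a minimal Python sketch.
It only counts devices with unallocated space and compares that count against
the number of devices each profile needs. The sizes are copied from the
`btrfs fi show` output quoted above. The names DEVICES, MIN_DEVICES and the two
helper functions are made up for illustration, and the check ignores how much
space each device has free, so it is a simplification of what the kernel's
chunk allocator actually does.

```python
# Sizes in TiB from the quoted `btrfs fi show` output: devid -> (size, used).
# "used" here means space already allocated to chunks on that device.
DEVICES = {
    4: (5.46, 5.46), 5: (7.28, 7.28), 7: (12.73, 12.73), 9: (5.46, 5.46),
    10: (7.28, 7.28), 11: (7.28, 7.28), 12: (5.46, 5.46), 14: (7.28, 7.28),
    15: (7.28, 7.28), 17: (9.10, 7.49), 18: (7.28, 7.28), 20: (7.28, 7.28),
    21: (7.28, 6.42),
}

# Number of devices a new chunk must span (one copy per device).
MIN_DEVICES = {"raid1": 2, "raid1c3": 3, "raid1c4": 4}


def devices_with_unallocated(devices):
    """Return the devids that still have unallocated space, sorted."""
    return sorted(devid for devid, (size, used) in devices.items() if size > used)


def can_allocate_chunk(devices, profile):
    """Simplified check: enough devices with any unallocated space at all?"""
    return len(devices_with_unallocated(devices)) >= MIN_DEVICES[profile]


print(devices_with_unallocated(DEVICES))       # devids 17 and 21
print(can_allocate_chunk(DEVICES, "raid1"))    # two devices are enough
print(can_allocate_chunk(DEVICES, "raid1c4"))  # four devices are required
```

With these numbers a RAID1 chunk could still be allocated, but a RAID1C4
metadata chunk could not, which is the mismatch described above.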


>
>
> sudo btrfs fi df /media/camino/
> Data, RAID1: total=37.59TiB, used=36.98TiB
> Data, RAID6: total=13.77TiB, used=13.75TiB
> System, RAID1C4: total=32.00MiB, used=12.97MiB
> Metadata, RAID1C4: total=66.00GiB, used=65.63GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
>
> After mount, my .allocation looks like (this is rw):
>
> grep -R . allocation/
> allocation/metadata/disk_used:281696862208
> allocation/metadata/bytes_pinned:17383424
> allocation/metadata/bytes_used:70424215552
> allocation/metadata/total_bytes_pinned:57327616
> allocation/metadata/disk_total:283467841536
> allocation/metadata/total_bytes:70866960384
> allocation/metadata/bytes_reserved:24412160
> allocation/metadata/bytes_readonly:0
> allocation/metadata/raid1c4/used_bytes:70424215552
> allocation/metadata/raid1c4/total_bytes:70866960384
> allocation/metadata/bytes_zone_unusable:0
> allocation/metadata/bytes_may_use:842219520
> allocation/metadata/flags:4
> allocation/system/disk_used:54525952
> allocation/system/bytes_pinned:0
> allocation/system/bytes_used:13631488
> allocation/system/total_bytes_pinned:114688
> allocation/system/disk_total:134217728
> allocation/system/total_bytes:33554432
> allocation/system/bytes_reserved:81920
> allocation/system/bytes_readonly:0
> allocation/system/raid1c4/used_bytes:13631488
> allocation/system/raid1c4/total_bytes:33554432
> allocation/system/bytes_zone_unusable:0
> allocation/system/bytes_may_use:0
> allocation/system/flags:2
> allocation/global_rsv_reserved:535183360
> allocation/data/raid1/used_bytes:40658677952512
> allocation/data/raid1/total_bytes:41329991548928
> allocation/data/disk_used:96436269289472
> allocation/data/bytes_pinned:65359872
> allocation/data/raid6/used_bytes:15118913384448
> allocation/data/raid6/total_bytes:15141606326272
> allocation/data/bytes_used:55777591259136
> allocation/data/total_bytes_pinned:4347092992
> allocation/data/disk_total:97801589424128
> allocation/data/total_bytes:56471597875200
> allocation/data/bytes_reserved:0
> allocation/data/bytes_readonly:2555904
> allocation/data/bytes_zone_unusable:0
> allocation/data/bytes_may_use:0
> allocation/data/flags:1
> allocation/global_rsv_size:536870912
>
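
As a rough illustration of what overcommit means for the read-write dump
above, the following Python sketch subtracts the used, pinned, reserved,
read-only and zone-unusable counters from the metadata total_bytes and
compares the remainder with bytes_may_use. The values are copied from the
quoted sysfs output. The function names are hypothetical, and this is a
simplified model, not the kernel's actual reservation logic.

```python
# Metadata counters in bytes, copied from the quoted read-write sysfs dump.
METADATA_RW = {
    "total_bytes": 70866960384,
    "bytes_used": 70424215552,
    "bytes_pinned": 17383424,
    "bytes_reserved": 24412160,
    "bytes_readonly": 0,
    "bytes_zone_unusable": 0,
    "bytes_may_use": 842219520,
}


def free_in_allocated_chunks(info):
    """Space left inside already allocated metadata chunks (simplified)."""
    return (info["total_bytes"] - info["bytes_used"] - info["bytes_pinned"]
            - info["bytes_reserved"] - info["bytes_readonly"]
            - info["bytes_zone_unusable"])


def overcommitted_bytes(info):
    """Reservations that can only be satisfied by allocating new chunks."""
    return max(0, info["bytes_may_use"] - free_in_allocated_chunks(info))


MIB = 1024 * 1024
print(f"free in chunks: {free_in_allocated_chunks(METADATA_RW) / MIB:.1f} MiB")
print(f"overcommitted:  {overcommitted_bytes(METADATA_RW) / MIB:.1f} MiB")
```

Under this model the outstanding reservations exceed the space left in the
existing metadata chunks, so they depend on a new metadata chunk being
allocatable.
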
> After the disk goes readonly, my .allocation looks like:
>
> grep -R . allocation/
> allocation/metadata/disk_used:281865486336
> allocation/metadata/bytes_pinned:0
> allocation/metadata/bytes_used:70466371584
> allocation/metadata/total_bytes_pinned:0
> allocation/metadata/disk_total:283467841536
> allocation/metadata/total_bytes:70866960384
> allocation/metadata/bytes_reserved:0
> allocation/metadata/bytes_readonly:0
> allocation/metadata/raid1c4/used_bytes:70466371584
> allocation/metadata/raid1c4/total_bytes:70866960384
> allocation/metadata/bytes_zone_unusable:0
> allocation/metadata/bytes_may_use:536870912
> allocation/metadata/flags:4
> allocation/system/disk_used:54394880
> allocation/system/bytes_pinned:0
> allocation/system/bytes_used:13598720
> allocation/system/total_bytes_pinned:0
> allocation/system/disk_total:134217728
> allocation/system/total_bytes:33554432
> allocation/system/bytes_reserved:0
> allocation/system/bytes_readonly:0
> allocation/system/raid1c4/used_bytes:13598720
> allocation/system/raid1c4/total_bytes:33554432
> allocation/system/bytes_zone_unusable:0
> allocation/system/bytes_may_use:0
> allocation/system/flags:2
> allocation/global_rsv_reserved:536870912
> allocation/data/raid1/used_bytes:40658677952512
> allocation/data/raid1/total_bytes:41329991548928
> allocation/data/disk_used:96434427817984
> allocation/data/bytes_pinned:0
> allocation/data/raid6/used_bytes:15117071912960
> allocation/data/raid6/total_bytes:15141606326272
> allocation/data/bytes_used:55775749865472
> allocation/data/total_bytes_pinned:0
> allocation/data/disk_total:97801589424128
> allocation/data/total_bytes:56471597875200
> allocation/data/bytes_reserved:0
> allocation/data/bytes_readonly:2555904
> allocation/data/bytes_zone_unusable:0
> allocation/data/bytes_may_use:0
> allocation/data/flags:1
> allocation/global_rsv_size:536870912
>
>
> dmesg error (full dmesg attached):
>
> [ 1043.994674] ------------[ cut here ]------------
> [ 1043.994676] BTRFS: Transaction aborted (error -28)
> [ 1043.994739] WARNING: CPU: 7 PID: 11673 at fs/btrfs/block-group.c:2721
> btrfs_start_dirty_block_groups+0x48c/0x4f0 [btrfs]
> [ 1043.994912] Modules linked in: xt_mark xt_nat veth
> nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype
> br_netfilter xt_CHECKSUM iptable_mangle xt_MASQUERADE iptable_nat nf_nat
> xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipt_REJECT
> nf_reject_ipv4 xt_tcpudp bridge stp llc ebtable_filter ebtables
> ip6table_filter ip6_tables iptable_filter bpfilter ppdev parport_pc
> parport vmw_vsock_vmci_transport vsock vmw_vmci overlay bluetooth
> ecdh_generic ecc msr binfmt_misc joydev input_leds ipmi_ssif dm_crypt
> intel_rapl_msr intel_rapl_common x86_pkg_temp_thermal intel_powerclamp
> coretemp kvm_intel kvm rapl intel_cstate intel_pch_thermal lpc_ich
> mei_me mei ie31200_edac acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler
> mac_hid acpi_pad sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_core
> iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables
> autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov
> async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1
> [ 1043.995121]  raid0 multipath linear dm_mirror dm_region_hash dm_log
> hid_generic usbhid hid uas usb_storage ast drm_vram_helper
> drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper crc32_pclmul
> ghash_clmulni_intel aesni_intel syscopyarea sysfillrect sysimgblt
> fb_sys_fops crypto_simd ahci cec cryptd rc_core libahci mpt3sas igb drm
> dca xhci_pci raid_class e1000e i2c_algo_bit scsi_transport_sas
> xhci_pci_renesas video
> [ 1043.995272] CPU: 7 PID: 11673 Comm: snapperd Not tainted
> 5.12.11-051211-generic #202106161201
> [ 1043.995282] Hardware name: Supermicro X10SLM-F/X10SLM-F, BIOS 3.0
> 04/24/2015
> [ 1043.995289] RIP: 0010:btrfs_start_dirty_block_groups+0x48c/0x4f0 [btrfs]
> [ 1043.995459] Code: 8b 53 50 f0 48 0f ba aa 48 0a 00 00 03 72 20 83 f8
> fb 74 46 83 f8 e2 74 41 89 c6 48 c7 c7 e0 1a 85 c0 89 45 8c e8 57 99 9c
> e9 <0f> 0b 8b 45 8c 89 c1 ba a1 0a 00 00 48 89 df 89 45 8c 48 c7 c6 00
> [ 1043.995466] RSP: 0018:ffffb8ba424c3bf8 EFLAGS: 00010282
> [ 1043.995474] RAX: 0000000000000000 RBX: ffff89af68c460d0 RCX:
> ffff89b57fdd85c8
> [ 1043.995478] RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI:
> ffff89b57fdd85c0
> [ 1043.995482] RBP: ffffb8ba424c3c70 R08: 0000000000000000 R09:
> ffffb8ba424c39d8
> [ 1043.995489] R10: ffffb8ba424c39d0 R11: ffffffffab5542e8 R12:
> ffff89af9b305000
> [ 1043.995493] R13: ffff89af9b305170 R14: ffff89af06973000 R15:
> ffff89af9b305160
> [ 1043.995498] FS:  00007f5f18556700(0000) GS:ffff89b57fdc0000(0000)
> knlGS:0000000000000000
> [ 1043.995503] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1043.995507] CR2: 00007fdb1af228a0 CR3: 000000027477a002 CR4:
> 00000000001706e0
> [ 1043.995513] Call Trace:
> [ 1043.995521]  btrfs_commit_transaction+0x7ff/0xa20 [btrfs]
> [ 1043.995640]  ? start_transaction+0xd5/0x590 [btrfs]
> [ 1043.995751]  create_snapshot+0x1bb/0x270 [btrfs]
> [ 1043.995894]  btrfs_mksubvol+0x112/0x1f0 [btrfs]
> [ 1043.996037]  btrfs_mksnapshot+0x80/0xb0 [btrfs]
> [ 1043.996178]  __btrfs_ioctl_snap_create+0x176/0x180 [btrfs]
> [ 1043.996320]  btrfs_ioctl_snap_create_v2+0xc0/0x150 [btrfs]
> [ 1043.996462]  btrfs_ioctl+0x93c/0x970 [btrfs]
> [ 1043.996603]  ? __cond_resched+0x1a/0x50
> [ 1043.996614]  __x64_sys_ioctl+0x91/0xc0
> [ 1043.996623]  do_syscall_64+0x38/0x90
> [ 1043.996630]  entry_SYSCALL_64_after_hwframe+0x44/0xae
> [ 1043.996640] RIP: 0033:0x7f5f1db09317
> [ 1043.996647] Code: b3 66 90 48 8b 05 71 4b 2d 00 64 c7 00 26 00 00 00
> 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f
> 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 41 4b 2d 00 f7 d8 64 89 01 48
> [ 1043.996653] RSP: 002b:00007f5f185532c8 EFLAGS: 00000246 ORIG_RAX:
> 0000000000000010
> [ 1043.996660] RAX: ffffffffffffffda RBX: 00007f5f185532d0 RCX:
> 00007f5f1db09317
> [ 1043.996664] RDX: 00007f5f185532d0 RSI: 0000000050009417 RDI:
> 0000000000000007
> [ 1043.996668] RBP: 0000000000000006 R08: 000000000000000f R09:
> 00007f5f18554f84
> [ 1043.996671] R10: 0000000000000000 R11: 0000000000000246 R12:
> 0000000000000007
> [ 1043.996675] R13: 00007f5f18555450 R14: 0000000000000000 R15:
> 0000000000000001
> [ 1043.996683] ---[ end trace 84d7b5fe58817f91 ]---
> [ 1043.996689] BTRFS: error (device sdf) in
> btrfs_start_dirty_block_groups:2721: errno=-28 No space left
> [ 1044.014701] BTRFS error (device sdf): allocation failed flags 1028,
> wanted 16384 tree-log 0
> [ 1044.014835] BTRFS: error (device sdf) in __btrfs_free_extent:3216:
> errno=-28 No space left
> [ 1044.014918] BTRFS: error (device sdf) in btrfs_run_delayed_refs:2163:
> errno=-28 No space left
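
The "flags 1028" value in the allocation failure above can be decoded with a
short Python sketch. The bit values follow the block group flag definitions in
the kernel's btrfs on-disk format headers as I understand them; the table name
and the decode_flags helper are made up for illustration.

```python
import errno

# Block group type and profile bits (assumed to match the kernel's
# BTRFS_BLOCK_GROUP_* definitions).
BLOCK_GROUP_FLAGS = {
    1 << 0: "DATA", 1 << 1: "SYSTEM", 1 << 2: "METADATA",
    1 << 3: "RAID0", 1 << 4: "RAID1", 1 << 5: "DUP", 1 << 6: "RAID10",
    1 << 7: "RAID5", 1 << 8: "RAID6", 1 << 9: "RAID1C3", 1 << 10: "RAID1C4",
}


def decode_flags(flags):
    """Return the names of all block group bits set in `flags`."""
    return [name for bit, name in BLOCK_GROUP_FLAGS.items() if flags & bit]


print(decode_flags(1028))   # the failed allocation in the dmesg output
print(errno.errorcode[28])  # symbolic name for "error -28"
```

1028 is 1024 + 4, that is, a METADATA block group with the RAID1C4 profile,
and error 28 is ENOSPC, matching the "No space left" messages.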

Can you delete some subvolumes/snapshots to free some space?

In such a critical case, I don't believe balance will help.

Regular file deletion also needs extra metadata space, so deleting
subvolumes/snapshots may be the only thing that helps.

Thanks,
Qu
>
>
> Thanks for the help!
> Asif

