From: Moritz M <mailinglist@moritzmueller.ee>
To: linux-btrfs@vger.kernel.org
Subject: Help needed, server is unresponsive after btrfs balance
Date: Mon, 04 Feb 2019 12:47:59 +0100
Message-ID: <6c9257eb3b6451b67bd8b082e06a7735@moritzmueller.ee>
Hi,
I'm running an Ubuntu server with a btrfs RAID1 consisting of three HDDs.
I do balancing daily via
> btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4 /
It usually takes between 1 and 10 minutes.
But today the server was still unresponsive after 7 hours (no SSH
connection and no direct login via keyboard were possible).
I had a similar situation two weeks ago. I could not find a cause and
eventually checked and repaired the filesystem with
> btrfs check --repair /dev/sda3
Which found some qgroup-related problems:
> enabling repair mode
> Checking filesystem on /dev/sda3
> UUID: cf8c4bb2-6a75-4e1d-983c-19583a93a546
> No device size related problem found
> cache and super generation don't match, space cache will be invalidated
> Counts for qgroup id: 0/257 are different
> our: referenced 127300112384 referenced compressed 127300112384
> disk: referenced 18446743939800129536 referenced compressed 18446743939800129536
> diff: referenced 261209534464 referenced compressed 261209534464
> our: exclusive 56360521728 exclusive compressed 56360521728
> disk: exclusive 56360521728 exclusive compressed 56360521728
…
> Repair qgroup 0/257
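One possible reading of the numbers above, offered as an assumption and not something confirmed in this message: the on-disk "referenced" value for qgroup 0/257 looks like a negative count stored in an unsigned 64-bit field. The sketch below only redoes the arithmetic with the values copied from the check output; the helper name `as_signed_64` is made up for illustration.

```python
# Values copied verbatim from the btrfs check output for qgroup 0/257.
OUR_REFERENCED = 127300112384
DISK_REFERENCED = 18446743939800129536
REPORTED_DIFF = 261209534464


def as_signed_64(value: int) -> int:
    """Interpret an unsigned 64-bit value as a two's-complement signed value."""
    if value >= 1 << 63:
        return value - (1 << 64)
    return value


# Assumption: the on-disk counter wrapped below zero at some point.
disk_signed = as_signed_64(DISK_REFERENCED)
computed_diff = OUR_REFERENCED - disk_signed

print("disk referenced as signed 64-bit:", disk_signed)
print("our - disk:", computed_diff)
print("matches the diff line in the output:", computed_diff == REPORTED_DIFF)
```

Under that assumption the reported "diff" is simply "our" minus the signed on-disk value, which would be consistent with a qgroup accounting underflow and not with a real 16 EiB of referenced data.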
Today I had to boot a live system, mount the btrfs filesystem with
-o skip_balance and cancel the balance there.
Mounting took ~30 minutes, and in the journalctl output of the live system
I found this:
> Feb 04 09:42:28 ubuntu kernel: INFO: task btrfs-transacti:7527 blocked for more than 120 seconds.
> Feb 04 09:42:28 ubuntu kernel: Not tainted 4.15.0-29-generic #31-Ubuntu
> Feb 04 09:42:28 ubuntu kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Feb 04 09:42:28 ubuntu kernel: btrfs-transacti D 0 7527 2 0x80000000
> Feb 04 09:42:28 ubuntu kernel: Call Trace:
> Feb 04 09:42:28 ubuntu kernel: __schedule+0x291/0x8a0
> Feb 04 09:42:28 ubuntu kernel: schedule+0x2c/0x80
> Feb 04 09:42:28 ubuntu kernel: btrfs_commit_transaction+0x81d/0x8f0 [btrfs]
> Feb 04 09:42:28 ubuntu kernel: ? wait_woken+0x80/0x80
> Feb 04 09:42:28 ubuntu kernel: transaction_kthread+0x18d/0x1b0 [btrfs]
> Feb 04 09:42:28 ubuntu kernel: kthread+0x121/0x140
> Feb 04 09:42:28 ubuntu kernel: ? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
> Feb 04 09:42:28 ubuntu kernel: ? kthread_create_worker_on_cpu+0x70/0x70
> Feb 04 09:42:28 ubuntu kernel: ? do_syscall_64+0x73/0x130
> Feb 04 09:42:28 ubuntu kernel: ? SyS_exit_group+0x14/0x20
After rebooting, the server behaved normally. The only thing I could find
in the journalctl output was:
> Feb 04 02:00:02 server kernel: BTRFS info (device sda3): relocating block group 7246746484736 flags data|raid1
> Feb 04 02:05:23 server kernel: BTRFS info (device sda3): found 3 extents
> Feb 04 02:06:12 server kernel: BTRFS info (device sda3): found 3 extents
> Feb 04 02:07:01 server kernel: BTRFS info (device sda3): relocating block group 7059915407360 flags metadata|raid1
Btrfs balancing starts at 02:00.
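To put the journal excerpt above into numbers, here is a small sketch that measures the gaps between the four logged events. It assumes the excerpt is complete for that night, and the `parse` helper is made up for illustration; the log lines are copied from the excerpt with the mail line wrapping undone.

```python
from datetime import datetime

# The four journal lines from the night of the hang, one event per line.
LOG = """\
Feb 04 02:00:02 server kernel: BTRFS info (device sda3): relocating block group 7246746484736 flags data|raid1
Feb 04 02:05:23 server kernel: BTRFS info (device sda3): found 3 extents
Feb 04 02:06:12 server kernel: BTRFS info (device sda3): found 3 extents
Feb 04 02:07:01 server kernel: BTRFS info (device sda3): relocating block group 7059915407360 flags metadata|raid1
"""


def parse(line):
    """Split a journal line into (timestamp, btrfs message)."""
    # The syslog-style stamp has no year; that is fine for same-day gaps.
    stamp = datetime.strptime(line[:15], "%b %d %H:%M:%S")
    message = line.split("): ", 1)[1]
    return stamp, message


events = [parse(line) for line in LOG.splitlines()]
gaps = [
    (later[0] - earlier[0]).total_seconds()
    for earlier, later in zip(events, events[1:])
]

for (stamp, message), gap in zip(events[1:], gaps):
    print(f"{stamp:%H:%M:%S}  +{gap:>5.0f}s  {message}")

data_group_seconds = (events[-1][0] - events[0][0]).total_seconds()
print("seconds spent on the data block group:", data_group_seconds)
print("last logged event:", events[-1][1])
```

Read this way, the single data block group alone took about 7 minutes, and the last entry before the journal goes quiet is the start of the metadata block group relocation.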
Can anybody give me a hint as to what causes this?
I suspect some kind of hardware failure but can't find anything. Any
idea where to look?
My setup:
> Linux server 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>
> btrfs-progs v4.15.1
>
> Label: 'rootfs' uuid: cf8c4bb2-6a75-4e1d-983c-19583a93a546
>
> Total devices 3 FS bytes used 620.55GiB
> devid 1 size 923.13GiB used 446.03GiB path /dev/sdc3
> devid 2 size 923.13GiB used 449.00GiB path /dev/sda3
> devid 3 size 923.13GiB used 447.03GiB path /dev/sdb3
>
> Data, RAID1: total=667.00GiB, used=617.65GiB
> System, RAID1: total=32.00MiB, used=176.00KiB
> Metadata, RAID1: total=4.00GiB, used=2.90GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
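As a quick consistency check on the figures above, the sketch below compares the allocated chunk totals with the per-device "used" values. The only fact it relies on beyond the output itself is that btrfs RAID1 keeps two copies of every chunk; the variable names are made up for illustration.

```python
# Figures copied from the setup output above, all in GiB.
DEVICE_SIZE = 923.13
device_used = {"sdc3": 446.03, "sda3": 449.00, "sdb3": 447.03}
chunk_total = {"Data": 667.00, "System": 32.00 / 1024, "Metadata": 4.00}
chunk_used = {"Data": 617.65, "Metadata": 2.90}

RAID1_COPIES = 2  # btrfs RAID1 stores each chunk on two devices

# Raw space the chunks should occupy versus what the devices report.
raw_expected = sum(chunk_total.values()) * RAID1_COPIES
raw_reported = sum(device_used.values())
print(f"raw space expected from chunks: {raw_expected:.2f} GiB")
print(f"raw space reported by devices:  {raw_reported:.2f} GiB")

# Space on each device that is not yet allocated to any chunk.
for name, used in device_used.items():
    print(f"{name}: {DEVICE_SIZE - used:.2f} GiB unallocated")

# How full the already allocated chunks are.
for kind, used in chunk_used.items():
    print(f"{kind}: {used / chunk_total[kind]:.1%} of allocated chunks in use")
```

The two raw totals agree to within rounding, and every device still has a large unallocated area, so by these numbers the filesystem does not look short on space.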
Dmesg output is not provided because there was nothing in it after the reboot.
Thanks
Moritz