linux-btrfs.vger.kernel.org archive mirror
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Moritz M <mailinglist@moritzmueller.ee>, linux-btrfs@vger.kernel.org
Subject: Re: Help needed, server is unresponsive after btrfs balance
Date: Mon, 4 Feb 2019 19:59:22 +0800
Message-ID: <4ecaf7ef-49cb-d7f6-3535-941e44e2f469@gmx.com>
In-Reply-To: <6c9257eb3b6451b67bd8b082e06a7735@moritzmueller.ee>


On 2019/2/4 7:47 PM, Moritz M wrote:
> Hi,
> 
> I'm running an Ubuntu server with a btrfs RAID1 consisting of three HDDs.
> 
> I run a balance daily via
> 
>> btrfs balance start -dusage=50 -dlimit=2 -musage=50 -mlimit=4 /
> 
> It usually takes between 1 and 10 minutes.
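The daily command above relies on balance filters. As a rough sketch of how `usage` and `limit` combine (following the btrfs-balance(8) descriptions: `usage=N` selects block groups that are less than N percent full, and `limit=N` stops after N block groups have been processed), the selection can be modelled like this. The `BlockGroup` type and `select_for_balance` function are hypothetical illustrations, not btrfs code, and the kernel's exact rounding differs slightly.

```python
from dataclasses import dataclass


@dataclass
class BlockGroup:
    """Hypothetical stand-in for one btrfs block group (chunk)."""
    start: int   # logical start address
    length: int  # size in bytes
    used: int    # bytes currently in use


def select_for_balance(groups, usage_pct, limit):
    """Return the block groups a `-dusage=<usage_pct> -dlimit=<limit>` style
    filter would pick, in the order the groups are visited."""
    picked = []
    for bg in groups:
        if len(picked) >= limit:
            break  # `limit` caps how many groups are relocated per run
        # `usage` keeps only groups that are less than usage_pct percent full
        if bg.used * 100 < bg.length * usage_pct:
            picked.append(bg)
    return picked


GiB = 1024 ** 3
groups = [
    BlockGroup(start=0 * GiB, length=GiB, used=GiB // 10),      # 10% full
    BlockGroup(start=1 * GiB, length=GiB, used=GiB * 9 // 10),  # 90% full
    BlockGroup(start=2 * GiB, length=GiB, used=GiB // 4),       # 25% full
    BlockGroup(start=3 * GiB, length=GiB, used=GiB // 5),       # 20% full
]
print([bg.start // GiB for bg in select_for_balance(groups, 50, 2)])  # [0, 2]
```

With at most 2 data and 4 metadata block groups per run, the amount of work per night is bounded, which fits the usual 1 to 10 minute runtime reported here.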
> 
> But today the server was unresponsive (no SSH connection possible, no
> direct login via keyboard possible), even after 7 hours.
> 
> I had a similar situation two weeks ago. I could not find anything, so I
> finally checked and repaired the filesystem with
> 
>> btrfs check --repair /dev/sda3
> 
> which found some qgroup-related problems:
> 
>> enabling repair mode
>> Checking filesystem on /dev/sda3
>> UUID: cf8c4bb2-6a75-4e1d-983c-19583a93a546
>> No device size related problem found
>> cache and super generation don't match, space cache will be invalidated
>> Counts for qgroup id: 0/257 are different
>> our:        referenced 127300112384 referenced compressed 127300112384
>> disk:        referenced 18446743939800129536 referenced compressed 18446743939800129536
>> diff:        referenced 261209534464 referenced compressed 261209534464
>> our:        exclusive 56360521728 exclusive compressed 56360521728
>> disk:        exclusive 56360521728 exclusive compressed 56360521728
> …
>> Repair qgroup 0/257
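The `disk` value for `referenced` in the check output above is just below 2^64, which is what a negative number looks like when it is stored in an unsigned 64-bit counter. Reinterpreting it as signed (a plain two's-complement conversion, nothing btrfs-specific) reproduces the `diff` line exactly, which suggests the on-disk qgroup counter had underflowed rather than the subvolume really referencing that much data:

```python
def as_signed_64(value: int) -> int:
    """Reinterpret an unsigned 64-bit integer as a two's-complement signed one."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value


# Numbers copied from the `btrfs check` qgroup report for 0/257.
ours = 127300112384
on_disk = 18446743939800129536
reported_diff = 261209534464

disk_signed = as_signed_64(on_disk)
print(disk_signed)                          # -133909422080
print(ours - disk_signed == reported_diff)  # True
```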

You're using qgroups, which are known to cause huge performance overhead
for balance.

We have upcoming patches to solve this, but they won't reach mainline
before the v5.1 kernel.

So please disable qgroups if you're not actively using them.
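Qgroups are turned off with `btrfs quota disable <path>`, which applies to the whole filesystem. A minimal Python sketch that first checks whether quotas are on and only then disables them is below. It assumes that `btrfs qgroup show` exits non-zero when quotas are not enabled (worth confirming on your btrfs-progs version); the helper names are hypothetical, and the `run` parameter exists only so the logic can be exercised without a real btrfs filesystem.

```python
import subprocess
from typing import Callable


def quotas_enabled(mountpoint: str, run: Callable = subprocess.run) -> bool:
    # Assumption: `btrfs qgroup show` fails (non-zero exit) when quotas are
    # not enabled. It can also fail for other reasons, e.g. without root.
    result = run(["btrfs", "qgroup", "show", mountpoint],
                 capture_output=True, text=True)
    return result.returncode == 0


def disable_qgroups(mountpoint: str, run: Callable = subprocess.run) -> bool:
    """Hypothetical helper: disable quota/qgroup accounting on the filesystem
    mounted at `mountpoint` if it is on. Returns True if a disable was issued."""
    if not quotas_enabled(mountpoint, run):
        return False
    # check=True makes subprocess.run raise CalledProcessError on failure.
    run(["btrfs", "quota", "disable", mountpoint], check=True)
    return True
```

For the filesystem in this thread that would be `disable_qgroups("/")` run as root, or simply `btrfs quota disable /` from a shell.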

Thanks,
Qu

> 
> Today I had to boot a live system, mount the btrfs filesystem with
> -o skip_balance, and cancel the balance there.
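The recovery described above (mount with `skip_balance` so the interrupted balance is not resumed, then cancel it) comes down to two commands. The sketch below only prints them; the device `/dev/sda3` comes from this report, while the `/mnt` mount point and the `recovery_commands` helper are assumptions for illustration.

```python
import shlex


def recovery_commands(device: str, mountpoint: str) -> list[list[str]]:
    """Hypothetical helper: commands to stop a paused balance from a live system."""
    return [
        # skip_balance: mount without resuming the interrupted balance
        ["mount", "-o", "skip_balance", device, mountpoint],
        # cancel the paused balance so it does not resume on the next normal mount
        ["btrfs", "balance", "cancel", mountpoint],
    ]


for cmd in recovery_commands("/dev/sda3", "/mnt"):
    print(shlex.join(cmd))
```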
> 
> Mounting took ~30 minutes, and in the journal (journalctl) of the live system I found this:
> 
>> Feb 04 09:42:28 ubuntu kernel: INFO: task btrfs-transacti:7527 blocked for more than 120 seconds.
>> Feb 04 09:42:28 ubuntu kernel:       Not tainted 4.15.0-29-generic #31-Ubuntu
>> Feb 04 09:42:28 ubuntu kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> Feb 04 09:42:28 ubuntu kernel: btrfs-transacti D    0  7527      2 0x80000000
>> Feb 04 09:42:28 ubuntu kernel: Call Trace:
>> Feb 04 09:42:28 ubuntu kernel:  __schedule+0x291/0x8a0
>> Feb 04 09:42:28 ubuntu kernel:  schedule+0x2c/0x80
>> Feb 04 09:42:28 ubuntu kernel:  btrfs_commit_transaction+0x81d/0x8f0 [btrfs]
>> Feb 04 09:42:28 ubuntu kernel:  ? wait_woken+0x80/0x80
>> Feb 04 09:42:28 ubuntu kernel:  transaction_kthread+0x18d/0x1b0 [btrfs]
>> Feb 04 09:42:28 ubuntu kernel:  kthread+0x121/0x140
>> Feb 04 09:42:28 ubuntu kernel:  ? btrfs_cleanup_transaction+0x560/0x560 [btrfs]
>> Feb 04 09:42:28 ubuntu kernel:  ? kthread_create_worker_on_cpu+0x70/0x70
>> Feb 04 09:42:28 ubuntu kernel:  ? do_syscall_64+0x73/0x130
>> Feb 04 09:42:28 ubuntu kernel:  ? SyS_exit_group+0x14/0x20
> 
> After rebooting, the server behaved normally. The only thing I could find in
> the journal was:
> 
>> Feb 04 02:00:02 server kernel: BTRFS info (device sda3): relocating block group 7246746484736 flags data|raid1
>>
>> Feb 04 02:05:23 server kernel: BTRFS info (device sda3): found 3 extents
>> Feb 04 02:06:12 server kernel: BTRFS info (device sda3): found 3 extents
>> Feb 04 02:07:01 server kernel: BTRFS info (device sda3): relocating block group 7059915407360 flags metadata|raid1
> 
> Btrfs balancing starts at 02:00.
> 
> Can anybody give me a hint what causes this?
> 
> I suspect some kind of hardware failure but can't find anything. Any
> idea where to look?
> 
> My setup:
>> Linux server 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
>>
>> btrfs-progs v4.15.1
>>
>> Label: 'rootfs'  uuid: cf8c4bb2-6a75-4e1d-983c-19583a93a546
>>
>>         Total devices 3 FS bytes used 620.55GiB
>>         devid    1 size 923.13GiB used 446.03GiB path /dev/sdc3
>>         devid    2 size 923.13GiB used 449.00GiB path /dev/sda3
>>         devid    3 size 923.13GiB used 447.03GiB path /dev/sdb3
>>
>> Data, RAID1: total=667.00GiB, used=617.65GiB
>> System, RAID1: total=32.00MiB, used=176.00KiB
>> Metadata, RAID1: total=4.00GiB, used=2.90GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B
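As a sanity check on the figures above: with RAID1 every chunk is stored twice, so the per-device `used` (allocated) values should add up to roughly twice the chunk totals. GlobalReserve is left out on the assumption that it is a reservation carved out of metadata rather than separately allocated space.

```python
# Figures copied from the `btrfs filesystem show` / `btrfs filesystem df` output, in GiB.
device_allocated = {"sdc3": 446.03, "sda3": 449.00, "sdb3": 447.03}
chunk_totals = {
    "Data": 667.00,
    "System": 32.00 / 1024,  # 32 MiB
    "Metadata": 4.00,
}
RAID1_COPIES = 2  # RAID1 keeps two copies of every chunk, on different devices

raw_from_chunks = RAID1_COPIES * sum(chunk_totals.values())
raw_from_devices = sum(device_allocated.values())
print(f"{raw_from_chunks:.2f} GiB vs {raw_from_devices:.2f} GiB")
```

Both come to about 1342 GiB, so the allocation is consistent, and each 923.13 GiB partition still has roughly 475 GiB unallocated.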
> 
> Dmesg output is not provided, as there was nothing in it after the reboot.
> 
> Thanks
> 
> Moritz


Thread overview: 5+ messages
2019-02-04 11:47 Help needed, server is unresponsive after btrfs balance Moritz M
2019-02-04 11:59 ` Qu Wenruo [this message]
2019-02-04 12:52   ` Moritz M
     [not found]   ` <1a6d00fce82926ce9ec7db7bbab37c12@moritzmueller.ee>
     [not found]     ` <30c7e926-98c7-fd82-f587-1478d31cbf58@gmx.com>
2019-02-05 11:45       ` Moritz M
2019-02-05 12:17         ` Qu Wenruo
