* btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
@ 2017-05-27 18:53 Ivan P
  2017-05-27 19:33 ` Hans van Kranenburg
  2017-05-27 19:36 ` Jean-Denis Girard
  0 siblings, 2 replies; 9+ messages in thread
From: Ivan P @ 2017-05-27 18:53 UTC (permalink / raw)
  To: linux-btrfs

Hello,

For a while now, btrfs-cleaner has been hammering my system's btrfs partition,
as well as my CPU. The behavior is as follows:

After booting, nothing relevant is happening. After about 5-30 minutes,
a btrfs-cleaner process is spawned, which is constantly using one CPU core.
The btrfs-cleaner process never seems to finish (I've let it waste CPU cycles
for 9 hours) and also cannot be stopped or killed.

Rebooting again usually resolves the issue for some time.
But on next boot, the issue usually reappears.

I'm running Linux 4.11.2, but the issue is also present on the current LTS 4.9.29.
I am using the newest btrfs-tools, as far as I can tell (4.11). The system is an
Arch Linux x86_64 install on a Transcend 120GB mSATA drive.

No other disks are present, but the root volume contains several subvolumes
(@arch<date> snapshots, @home, @data).

The logs don't contain anything related to btrfs, besides the usual diag output
on mounting the root partition.

I am mounting the btrfs partition with the following options:

subvol=@arch_current,compress=lzo,ssd,noatime,autodefrag

What information should I provide so we can debug this?
Please add me to CC when replying, as I am not subscribed to the mailing list.

Thank you in advance,
Ivan.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
  2017-05-27 18:53 btrfs-tools/linux 4.11: btrfs-cleaner misbehaving Ivan P
@ 2017-05-27 19:33 ` Hans van Kranenburg
  2017-05-27 20:29   ` Ivan P
  2017-05-27 19:36 ` Jean-Denis Girard
  1 sibling, 1 reply; 9+ messages in thread
From: Hans van Kranenburg @ 2017-05-27 19:33 UTC (permalink / raw)
  To: Ivan P, linux-btrfs

Hi,

On 05/27/2017 08:53 PM, Ivan P wrote:
> 
> For a while now, btrfs-cleaner has been hammering my system's btrfs partition,
> as well as my CPU. The behavior is as follows:
> 
> After booting, nothing relevant is happening. After about 5-30 minutes,
> a btrfs-cleaner process is spawned, which is constantly using one CPU core.
> The btrfs-cleaner process never seems to finish (I've let it waste CPU cycles
> for 9 hours) and also cannot be stopped or killed.
> 
> Rebooting again usually resolves the issue for some time.
> But on next boot, the issue usually reappears.
> 
> I'm running Linux 4.11.2, but the issue is also present on the current LTS 4.9.29.
> I am using the newest btrfs-tools, as far as I can tell (4.11). The system is an
> Arch Linux x86_64 install on a Transcend 120GB mSATA drive.
> 
> No other disks are present, but the root volume contains several subvolumes
> (@arch<date> snapshots, @home, @data).
> 
> The logs don't contain anything related to btrfs, besides the usual diag output
> on mounting the root partition.
> 
> I am mounting the btrfs partition with the following options:
> 
> subvol=@arch_current,compress=lzo,ssd,noatime,autodefrag
> 
> What information should I provide so we can debug this?

What I usually do first in a similar situation is look at the output of

  watch cat /proc/<pid>/stack

where <pid> is the pid of the btrfs-cleaner thread.

This might already give an idea what kind of things it's doing, by
looking at the stack trace. When it's cleaning up a removed subvolume
for example, there will be a similar function name in the stack somewhere.
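
For example, something like this (a rough sketch; matching the kernel
thread by name with pgrep is an assumption, and reading the stack file
needs root):

  # find the pid of the btrfs-cleaner kernel thread
  pgrep btrfs-cleaner
  # then poll its kernel stack once per second
  watch -n1 cat /proc/$(pgrep -o btrfs-cleaner)/stack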

-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
  2017-05-27 18:53 btrfs-tools/linux 4.11: btrfs-cleaner misbehaving Ivan P
  2017-05-27 19:33 ` Hans van Kranenburg
@ 2017-05-27 19:36 ` Jean-Denis Girard
  1 sibling, 0 replies; 9+ messages in thread
From: Jean-Denis Girard @ 2017-05-27 19:36 UTC (permalink / raw)
  To: linux-btrfs

On 27/05/2017 at 08:53, Ivan P wrote:
> Hello,
> 
> For a while now, btrfs-cleaner has been hammering my system's btrfs partition,
> as well as my CPU. The behavior is as follows:
> 
> After booting, nothing relevant is happening. After about 5-30 minutes,
> a btrfs-cleaner process is spawned, which is constantly using one CPU core.
> The btrfs-cleaner process never seems to finish (I've let it waste CPU cycles
> for 9 hours) and also cannot be stopped or killed.

I have seen the exact same behaviour with older kernels (4.4, 4.7, 4.9),
see http://www.spinics.net/lists/linux-btrfs/msg58111.html

It seems to be related to autodefrag; rebooting without autodefrag,
manually defragmenting, and then rebooting with autodefrag again worked for me.
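
Roughly, something like this (an untested sketch; it uses a remount instead
of a full reboot, and the mount point and lzo flag are assumptions based on
the mount options above):

  # turn autodefrag off on the live mount
  mount -o remount,noautodefrag /
  # defragment the data once, recompressing with lzo
  btrfs filesystem defragment -r -clzo /
  # turn autodefrag back on
  mount -o remount,autodefrag /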


Thanks,
-- 
Jean-Denis Girard

SysNux                  Systèmes    Linux    en  Polynésie  française
http://www.sysnux.pf/   Tél: +689 40.50.10.40 / GSM: +689 87.79.75.27


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
  2017-05-27 19:33 ` Hans van Kranenburg
@ 2017-05-27 20:29   ` Ivan P
  2017-05-27 20:42     ` Hans van Kranenburg
  0 siblings, 1 reply; 9+ messages in thread
From: Ivan P @ 2017-05-27 20:29 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: linux-btrfs

On Sat, May 27, 2017 at 9:33 PM, Hans van Kranenburg
<hans.van.kranenburg@mendix.com> wrote:
> Hi,
>
> On 05/27/2017 08:53 PM, Ivan P wrote:
>>
>> For a while now, btrfs-cleaner has been hammering my system's btrfs partition,
>> as well as my CPU. The behavior is as follows:
>>
>> After booting, nothing relevant is happening. After about 5-30 minutes,
>> a btrfs-cleaner process is spawned, which is constantly using one CPU core.
>> The btrfs-cleaner process never seems to finish (I've let it waste CPU cycles
>> for 9 hours) and also cannot be stopped or killed.
>>
>> Rebooting again usually resolves the issue for some time.
>> But on next boot, the issue usually reappears.
>>
>> I'm running Linux 4.11.2, but the issue is also present on the current LTS 4.9.29.
>> I am using the newest btrfs-tools, as far as I can tell (4.11). The system is an
>> Arch Linux x86_64 install on a Transcend 120GB mSATA drive.
>>
>> No other disks are present, but the root volume contains several subvolumes
>> (@arch<date> snapshots, @home, @data).
>>
>> The logs don't contain anything related to btrfs, besides the usual diag output
>> on mounting the root partition.
>>
>> I am mounting the btrfs partition with the following options:
>>
>> subvol=@arch_current,compress=lzo,ssd,noatime,autodefrag
>>
>> What information should I provide so we can debug this?
>
> What I usually do first in a similar situation is look at the output of
>
>   watch cat /proc/<pid>/stack
>
> where <pid> is the pid of the btrfs-cleaner thread.
>
> This might already give an idea what kind of things it's doing, by
> looking at the stack trace. When it's cleaning up a removed subvolume
> for example, there will be a similar function name in the stack somewhere.
>
> --
> Hans van Kranenburg

Thank you for the fast reply.

Most of the time, the stack is just 0xffffffffffffffff, even though
CPU load is generated.
These repeat all the time, but addresses stay the same:

[<ffffffffa0444f19>] get_alloc_profile+0xa9/0x1a0 [btrfs]
[<ffffffffa04450d2>] can_overcommit+0xc2/0x110 [btrfs]
[<ffffffffa044a21e>] btrfs_free_reserved_data_space_noquota+0x6e/0x100 [btrfs]
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffffa04451ae>] block_rsv_release_bytes+0x8e/0x2b0 [btrfs]
[<ffffffffa044a21e>] btrfs_free_reserved_data_space_noquota+0x6e/0x100 [btrfs]
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffffa04451ae>] block_rsv_release_bytes+0x8e/0x2b0 [btrfs]
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
[<ffffffffffffffff>] 0xffffffffffffffff

So far, these appeared only once or twice:

[<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
[<ffffffff81316d66>] __radix_tree_lookup+0x76/0xf0
[<ffffffff81316e3d>] radix_tree_lookup+0xd/0x10
[<ffffffff8118121f>] __do_page_cache_readahead+0x10f/0x2f0
[<ffffffff81181593>] ondemand_readahead+0x193/0x2c0
[<ffffffff8118185e>] page_cache_sync_readahead+0x2e/0x50
[<ffffffffa04a23ab>] btrfs_defrag_file+0x9fb/0xf90 [btrfs]
[<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
[<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
[<ffffffff810a04d8>] kthread+0x108/0x140
[<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffff81003016>] ___preempt_schedule+0x16/0x18
[<ffffffffa0487556>] __clear_extent_bit+0x2a6/0x3e0 [btrfs]
[<ffffffffa0487c57>] clear_extent_bit+0x17/0x20 [btrfs]
[<ffffffffa04a26fa>] btrfs_defrag_file+0xd4a/0xf90 [btrfs]
[<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
[<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
[<ffffffff810a04d8>] kthread+0x108/0x140
[<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

[<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
[<ffffffff810ce28a>] __rcu_read_unlock+0x4a/0x60
[<ffffffff8118000b>] __set_page_dirty_nobuffers+0xdb/0x170
[<ffffffffa0468c1e>] btrfs_set_page_dirty+0xe/0x10 [btrfs]
[<ffffffff8117dd7b>] set_page_dirty+0x5b/0xb0
[<ffffffffa04a274e>] btrfs_defrag_file+0xd9e/0xf90 [btrfs]
[<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
[<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
[<ffffffff810a04d8>] kthread+0x108/0x140
[<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
[<ffffffffffffffff>] 0xffffffffffffffff

Forgot to mention that I have tried running a scrub, but it neither
reported any errors nor solved the issue.

Regards,
Ivan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
  2017-05-27 20:29   ` Ivan P
@ 2017-05-27 20:42     ` Hans van Kranenburg
  2017-05-27 20:54       ` Ivan P
  0 siblings, 1 reply; 9+ messages in thread
From: Hans van Kranenburg @ 2017-05-27 20:42 UTC (permalink / raw)
  To: Ivan P; +Cc: linux-btrfs

On 05/27/2017 10:29 PM, Ivan P wrote:
> On Sat, May 27, 2017 at 9:33 PM, Hans van Kranenburg
> <hans.van.kranenburg@mendix.com> wrote:
>> Hi,
>>
>> On 05/27/2017 08:53 PM, Ivan P wrote:
>>>
>>> For a while now, btrfs-cleaner has been hammering my system's btrfs partition,
>>> as well as my CPU. The behavior is as follows:
>>>
>>> After booting, nothing relevant is happening. After about 5-30 minutes,
>>> a btrfs-cleaner process is spawned, which is constantly using one CPU core.
>>> The btrfs-cleaner process never seems to finish (I've let it waste CPU cycles
>>> for 9 hours) and also cannot be stopped or killed.
>>>
>>> Rebooting again usually resolves the issue for some time.
>>> But on next boot, the issue usually reappears.
>>>
>>> I'm running Linux 4.11.2, but the issue is also present on the current LTS 4.9.29.
>>> I am using the newest btrfs-tools, as far as I can tell (4.11). The system is an
>>> Arch Linux x86_64 install on a Transcend 120GB mSATA drive.
>>>
>>> No other disks are present, but the root volume contains several subvolumes
>>> (@arch<date> snapshots, @home, @data).
>>>
>>> The logs don't contain anything related to btrfs, besides the usual diag output
>>> on mounting the root partition.
>>>
>>> I am mounting the btrfs partition with the following options:
>>>
>>> subvol=@arch_current,compress=lzo,ssd,noatime,autodefrag
>>>
>>> What information should I provide so we can debug this?
>>
>> What I usually do first in a similar situation is look at the output of
>>
>>   watch cat /proc/<pid>/stack
>>
>> where <pid> is the pid of the btrfs-cleaner thread.
>>
>> This might already give an idea what kind of things it's doing, by
>> looking at the stack trace. When it's cleaning up a removed subvolume
>> for example, there will be a similar function name in the stack somewhere.
>>
>> --
>> Hans van Kranenburg
> 
> Thank you for the fast reply.
> 
> Most of the time, the stack is just 0xffffffffffffffff, even though
> CPU load is generated.
> These repeat all the time, but addresses stay the same:
> 
> [<ffffffffa0444f19>] get_alloc_profile+0xa9/0x1a0 [btrfs]
> [<ffffffffa04450d2>] can_overcommit+0xc2/0x110 [btrfs]
> [<ffffffffa044a21e>] btrfs_free_reserved_data_space_noquota+0x6e/0x100 [btrfs]
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [<ffffffffa04451ae>] block_rsv_release_bytes+0x8e/0x2b0 [btrfs]
> [<ffffffffa044a21e>] btrfs_free_reserved_data_space_noquota+0x6e/0x100 [btrfs]
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [<ffffffffa04451ae>] block_rsv_release_bytes+0x8e/0x2b0 [btrfs]
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> So far, these appeared only once or twice:
> 
> [<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
> [<ffffffff81316d66>] __radix_tree_lookup+0x76/0xf0
> [<ffffffff81316e3d>] radix_tree_lookup+0xd/0x10
> [<ffffffff8118121f>] __do_page_cache_readahead+0x10f/0x2f0
> [<ffffffff81181593>] ondemand_readahead+0x193/0x2c0
> [<ffffffff8118185e>] page_cache_sync_readahead+0x2e/0x50
> [<ffffffffa04a23ab>] btrfs_defrag_file+0x9fb/0xf90 [btrfs]
> [<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
> [<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
> [<ffffffff810a04d8>] kthread+0x108/0x140
> [<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [<ffffffff81003016>] ___preempt_schedule+0x16/0x18
> [<ffffffffa0487556>] __clear_extent_bit+0x2a6/0x3e0 [btrfs]
> [<ffffffffa0487c57>] clear_extent_bit+0x17/0x20 [btrfs]
> [<ffffffffa04a26fa>] btrfs_defrag_file+0xd4a/0xf90 [btrfs]
> [<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
> [<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
> [<ffffffff810a04d8>] kthread+0x108/0x140
> [<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
> [<ffffffff810ce28a>] __rcu_read_unlock+0x4a/0x60
> [<ffffffff8118000b>] __set_page_dirty_nobuffers+0xdb/0x170
> [<ffffffffa0468c1e>] btrfs_set_page_dirty+0xe/0x10 [btrfs]
> [<ffffffff8117dd7b>] set_page_dirty+0x5b/0xb0
> [<ffffffffa04a274e>] btrfs_defrag_file+0xd9e/0xf90 [btrfs]
> [<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
> [<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
> [<ffffffff810a04d8>] kthread+0x108/0x140
> [<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> Forgot to mention that I have tried running a scrub, but it neither
> reported any errors nor solved the issue.

defrag actions called from cleaner_kthread. Looks like what Jean-Denis
suggested already.

Does the behaviour change when you disable autodefrag? You can also do
this live with mount -o remount,noautodefrag
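
For example (a sketch, assuming the filesystem is mounted at /):

  mount -o remount,noautodefrag /
  grep btrfs /proc/mounts          # autodefrag should be gone from the options
  mount -o remount,autodefrag /    # turn it back on later if you want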

Apparently your write pattern is some kind of worst case combined with
autodefrag? I'm not an expert in this area, but probably someone else
knows more.

-- 
Hans van Kranenburg

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
  2017-05-27 20:42     ` Hans van Kranenburg
@ 2017-05-27 20:54       ` Ivan P
  2017-05-28  4:55         ` Duncan
  0 siblings, 1 reply; 9+ messages in thread
From: Ivan P @ 2017-05-27 20:54 UTC (permalink / raw)
  To: Hans van Kranenburg; +Cc: linux-btrfs

On Sat, May 27, 2017 at 10:42 PM, Hans van Kranenburg
<hans.van.kranenburg@mendix.com> wrote:
> On 05/27/2017 10:29 PM, Ivan P wrote:
>> On Sat, May 27, 2017 at 9:33 PM, Hans van Kranenburg
>> <hans.van.kranenburg@mendix.com> wrote:
>>> Hi,
>>>
>>> On 05/27/2017 08:53 PM, Ivan P wrote:
>>>>
>>>> For a while now, btrfs-cleaner has been hammering my system's btrfs partition,
>>>> as well as my CPU. The behavior is as follows:
>>>>
>>>> After booting, nothing relevant is happening. After about 5-30 minutes,
>>>> a btrfs-cleaner process is spawned, which is constantly using one CPU core.
>>>> The btrfs-cleaner process never seems to finish (I've let it waste CPU cycles
>>>> for 9 hours) and also cannot be stopped or killed.
>>>>
>>>> Rebooting again usually resolves the issue for some time.
>>>> But on next boot, the issue usually reappears.
>>>>
>>>> I'm running Linux 4.11.2, but the issue is also present on the current LTS 4.9.29.
>>>> I am using the newest btrfs-tools, as far as I can tell (4.11). The system is an
>>>> Arch Linux x86_64 install on a Transcend 120GB mSATA drive.
>>>>
>>>> No other disks are present, but the root volume contains several subvolumes
>>>> (@arch<date> snapshots, @home, @data).
>>>>
>>>> The logs don't contain anything related to btrfs, besides the usual diag output
>>>> on mounting the root partition.
>>>>
>>>> I am mounting the btrfs partition with the following options:
>>>>
>>>> subvol=@arch_current,compress=lzo,ssd,noatime,autodefrag
>>>>
>>>> What information should I provide so we can debug this?
>>>
>>> What I usually do first in a similar situation is look at the output of
>>>
>>>   watch cat /proc/<pid>/stack
>>>
>>> where <pid> is the pid of the btrfs-cleaner thread.
>>>
>>> This might already give an idea what kind of things it's doing, by
>>> looking at the stack trace. When it's cleaning up a removed subvolume
>>> for example, there will be a similar function name in the stack somewhere.
>>>
>>> --
>>> Hans van Kranenburg
>>
>> Thank you for the fast reply.
>>
>> Most of the time, the stack is just 0xffffffffffffffff, even though
>> CPU load is generated.
>> These repeat all the time, but addresses stay the same:
>>
>> [<ffffffffa0444f19>] get_alloc_profile+0xa9/0x1a0 [btrfs]
>> [<ffffffffa04450d2>] can_overcommit+0xc2/0x110 [btrfs]
>> [<ffffffffa044a21e>] btrfs_free_reserved_data_space_noquota+0x6e/0x100 [btrfs]
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> [<ffffffffa04451ae>] block_rsv_release_bytes+0x8e/0x2b0 [btrfs]
>> [<ffffffffa044a21e>] btrfs_free_reserved_data_space_noquota+0x6e/0x100 [btrfs]
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> [<ffffffffa04451ae>] block_rsv_release_bytes+0x8e/0x2b0 [btrfs]
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> [<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> So far, these appeared only once or twice:
>>
>> [<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
>> [<ffffffff81316d66>] __radix_tree_lookup+0x76/0xf0
>> [<ffffffff81316e3d>] radix_tree_lookup+0xd/0x10
>> [<ffffffff8118121f>] __do_page_cache_readahead+0x10f/0x2f0
>> [<ffffffff81181593>] ondemand_readahead+0x193/0x2c0
>> [<ffffffff8118185e>] page_cache_sync_readahead+0x2e/0x50
>> [<ffffffffa04a23ab>] btrfs_defrag_file+0x9fb/0xf90 [btrfs]
>> [<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
>> [<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
>> [<ffffffff810a04d8>] kthread+0x108/0x140
>> [<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> [<ffffffff81003016>] ___preempt_schedule+0x16/0x18
>> [<ffffffffa0487556>] __clear_extent_bit+0x2a6/0x3e0 [btrfs]
>> [<ffffffffa0487c57>] clear_extent_bit+0x17/0x20 [btrfs]
>> [<ffffffffa04a26fa>] btrfs_defrag_file+0xd4a/0xf90 [btrfs]
>> [<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
>> [<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
>> [<ffffffff810a04d8>] kthread+0x108/0x140
>> [<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> [<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
>> [<ffffffff810ce28a>] __rcu_read_unlock+0x4a/0x60
>> [<ffffffff8118000b>] __set_page_dirty_nobuffers+0xdb/0x170
>> [<ffffffffa0468c1e>] btrfs_set_page_dirty+0xe/0x10 [btrfs]
>> [<ffffffff8117dd7b>] set_page_dirty+0x5b/0xb0
>> [<ffffffffa04a274e>] btrfs_defrag_file+0xd9e/0xf90 [btrfs]
>> [<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
>> [<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
>> [<ffffffff810a04d8>] kthread+0x108/0x140
>> [<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Forgot to mention that I have tried running a scrub, but it neither
>> reported any errors nor solved the issue.
>
> defrag actions called from cleaner_kthread. Looks like what Jean-Denis
> suggested already.
>
> Does the behaviour change when you disable autodefrag? You can also do
> this live with mount -o remount,noautodefrag
>
> Apparently your write pattern is some kind of worst case combined with
> autodefrag? I'm not an expert in this area, but probably someone else
> knows more.
>
> --
> Hans van Kranenburg

Hmm, remounting as you suggested has shut it up immediately - hurray!

I don't really have any special write pattern from what I can tell. About
the only thing different from all the other btrfs systems I've set up is
that the data is also on the same volume as the system. Normal usage, no
VMs or heavy file generation. I'm also only taking snapshots of the system
and @home, with the latter only containing my .config, .cache and
symlinks to some folders in @data.

Is there any way I can help debug this further, or should I just defrag
my volume manually as Jean-Denis Girard suggested and move on?

Regards,
Ivan

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
  2017-05-27 20:54       ` Ivan P
@ 2017-05-28  4:55         ` Duncan
  2017-05-28  7:13           ` Marat Khalili
  0 siblings, 1 reply; 9+ messages in thread
From: Duncan @ 2017-05-28  4:55 UTC (permalink / raw)
  To: linux-btrfs

Ivan P posted on Sat, 27 May 2017 22:54:31 +0200 as excerpted:

>>>>> Please add me to CC when replying, as I am not
>>>>> subscribed to the mailing list.

> Hmm, remounting as you suggested has shut it up immediately - hurray!
> 
> I don't really have any special write pattern from what I can tell.
> About the only thing different from all the other btrfs systems I've set
> up is that the data is also on the same volume as the system. Normal
> usage, no VMs or heavy file generation. I'm also only taking snapshots
> of the system and @home, with the latter only containing my .config,
> .cache and symlinks to some folders in @data.

Systemd?  Journald with journals on btrfs?  Regularly snapshotting that 
subvolume?

If yes to all of the above, that might be the issue.  Normally systemd 
will set the journal directory NOCOW, so the journal files inherit it at 
creation, in order to avoid heavy fragmentation due to the COW-unfriendly, 
database-style internal-rewrite pattern of the journal files.

Great.  Except that snapshotting locks the existing version of the file 
in place with the snapshot, so the next write to any block must be COW 
anyway.  This is sometimes referred to as COW1, since it's a single-time 
COW, and the effect isn't too bad with a one-time snapshot.  But if 
you're regularly snapshotting the journal files, that will trigger COW1 
on every snapshot, which if you're snapshotting often enough can be 
almost as bad as regular COW in terms of fragmentation.

The fix is to make the journal dir a subvolume instead, thereby excluding 
it from snapshots taken of the parent subvolume, and then simply never 
snapshot the journal subvolume.  That way the NOCOW that systemd should 
already have set on that subdir and its contents actually stays in effect, 
without repeated snapshotting forcing COW1.
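
A rough sketch of that conversion (the paths and stopping/starting journald
are assumptions, not a tested recipe):

  systemctl stop systemd-journald
  mv /var/log/journal /var/log/journal.old
  btrfs subvolume create /var/log/journal
  chattr +C /var/log/journal       # keep NOCOW on the new subvolume
  cp -a /var/log/journal.old/. /var/log/journal/
  rm -rf /var/log/journal.old
  systemctl start systemd-journald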


Of course an alternative fix, the one I use here (and am happy with) 
instead, is to have a normal syslog (I use syslog-ng, but others have 
reported using rsyslog) handling your saved logs in traditional text form 
(most modern syslogs should cooperate with systemd's journald), and 
configure journald to only use tmpfs (see the journald.conf manpage).  
Traditional text logs are append-only and not nearly as bad in COW 
terms.  Meanwhile, journald is still active, just writing to tmpfs only, 
so you get a journal for the current boot session and thus can still take 
advantage of all the usual systemd/journald features such as systemctl 
status spitting out the last 10 log entries for that service, etc.  It's 
just limited to the current boot session, and you use the normal text 
logs for anything older than that.  For me anyway that's the best of both 
worlds, and I don't have to worry about how the journal files behave on 
btrfs at all, because they're not written to btrfs at all. =:^)


Meanwhile, since you mentioned snapshots, a word of caution there.  If 
you do have scripted snapshots being taken, be sure you have a script 
thinning down your snapshot history as well.  More than 200-300 snapshots 
per subvolume scales very poorly in btrfs maintenance terms (and qgroups 
make the problem far worse, if you have them active at all).  If, for 
instance, you're taking snapshots every hour and need something from one 
that's, say, a month old, are you really going to remember or care which 
exact hour it was?  Or will the daily snapshot either before or after that 
hour be fine, and actually much easier to find if you've trimmed to daily 
by then, as opposed to having hundreds and hundreds of hourly snapshots 
accumulating?

So snapshots are great, but they don't come without cost, and if you keep 
under 200 and if possible under 100 per subvolume, you'll find that 
maintenance operations such as balance and check (fsck) go much faster 
than they do with even 500, let alone thousands.
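
To see where you stand and thin things out, something along these lines
(a sketch; the snapshot path is just an example):

  # list all snapshots on the filesystem
  btrfs subvolume list -s /
  # delete the ones you no longer need
  btrfs subvolume delete /path/to/snapshots/@arch_20170101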

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
  2017-05-28  4:55         ` Duncan
@ 2017-05-28  7:13           ` Marat Khalili
  0 siblings, 0 replies; 9+ messages in thread
From: Marat Khalili @ 2017-05-28  7:13 UTC (permalink / raw)
  To: linux-btrfs

> If
> you do have scripted snapshots being taken, be sure you have a script
> thinning down your snapshot history as well.


I know Ivan P never mentioned qgroups, but just a warning for future 
readers: *with qgroups, don't let this script delete more than a couple 
dozen snapshots at once*; then wait for btrfs kernel activity to subside 
before trying again. Be especially careful when running this script 
in production for the very first time; it will most likely find too many 
snapshots to delete. (Temporarily removing all affected qgroups beforehand 
may also work.)
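
In shell terms, something like this per batch (a sketch; the snapshot glob
and batch size are hypothetical):

  # delete a small batch of snapshots, then wait for the cleaner to catch up
  ls -d /snapshots/@arch_* | head -n 20 | xargs btrfs subvolume delete
  btrfs subvolume sync /      # blocks until the deleted subvolumes are cleaned
  # repeat with the next batch once the filesystem has settled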


--

With Best Regards,
Marat Khalili


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving
       [not found] <20170527215608.23a40176@ws>
@ 2017-05-28 10:39 ` Ivan P
  0 siblings, 0 replies; 9+ messages in thread
From: Ivan P @ 2017-05-28 10:39 UTC (permalink / raw)
  To: linux-btrfs

On Sun, May 28, 2017 at 6:56 AM, Duncan <1i5t5.duncan@cox.net> wrote:
> [This mail was also posted to gmane.comp.file-systems.btrfs.]
>
> Ivan P posted on Sat, 27 May 2017 22:54:31 +0200 as excerpted:
>
>>>>>> Please add me to CC when replying, as I am not
>>>>>> subscribed to the mailing list.
>
>> Hmm, remounting as you suggested has shut it up immediately - hurray!
>>
>> I don't really have any special write pattern from what I can tell.
>> About the only thing different from all the other btrfs systems I've
>> set up is that the data is also on the same volume as the system.
>> Normal usage, no VMs or heavy file generation. I'm also only taking
>> snapshots of the system and @home, with the latter only containing
>> my .config, .cache and symlinks to some folders in @data.
>
> Systemd?  Journald with journals on btrfs?  Regularly snapshotting that
> subvolume?
>
> If yes to all of the above, that might be the issue.  Normally systemd
> will set the journal directory NOCOW, so the journal files inherit it
> at creation, in ordered to avoid heavy fragmentation due to the COW-
> unfriendly database-style file-internal-rewrite pattern with the
> journal files.
>
> Great.  Except that snapshotting locks the existing version of the file
> in place with the snapshot, so the next write to any block must be COW
> anyway.  This is sometimes referred to as COW1, since it's a
> single-time COW, and the effect isn't too bad with a one-time
> snapshot.  But if you're regularly snapshotting the journal files, that
> will trigger COW1 on every snapshot, which if you're snapshotting often
> enough can be almost as bad as regular COW in terms of fragmentation.
>
> The fix is to make the journal dir a subvolume instead, thereby
> excluding it from the snapshot taken on the parent subvolume, and just
> don't snapshot the journal subvolume then, so the NOCOW that systemd
> should already set on that subdir and its contents will actually be
> NOCOW, without interference from snapshotting repeatedly forcing COW1.
>
>
> Of course an alternative fix, the one I use here (and am happy with)
> instead, is to have a normal syslog (I use syslog-ng, but others have
> reported using rsyslog) handling your saved logs in traditional text
> form (most modern syslogs should cooperate with systemd's journald),
> and configure journald to only use tmpfs (see the journald.conf
> manpage). Traditional text logs are append-only and not nearly as bad
> in COW terms.  Meanwhile, journald is still active, just writing to
> tmpfs only, so you get a journal for the current boot session and thus
> can still take advantage of all the usual systemd/journald features
> such as systemctl status spitting out the last 10 log entries for that
> service, etc.  It's just limited to the current boot session, and you
> use the normal text logs for anything older than that.  For me anyway
> that's the best of both worlds, and I don't have to worry about how the
> journal files behave on btrfs at all, because they're not written to
> btrfs at all. =:^)
>
>
> Meanwhile, since you mentioned snapshots, a word of caution there.  If
> you do have scripted snapshots being taken, be sure you have a script
> thinning down your snapshot history as well.  More than 200-300
> snapshots per subvolume scales very poorly in btrfs maintenance terms
> (and qgroups make the problem far worse, if you have them active at
> all).  But if for instance you're taking snapshots every hour, if you
> need something from one say a month old, are you really going to
> remember or care which exact hour it was, or will the daily either
> before or after that hour be fine, and actually much easier to find if
> you've trimmed to daily by then, as opposed to having hundreds and
> hundreds of hourly snapshots accumulating?
>
> So snapshots are great but they don't come without cost, and if you
> keep under 200 and if possible under 100 per subvolume, you'll find
> maintenance such as balance and check (fsck) go much faster than they
> do with even 500, let alone thousands.
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman

I haven't had any issues like this before on another two boxes which ran
for years with systemd and journald, so I'm rather surprised this is a problem.

It does make sense for journald to fragment the disk, but isn't that what
autodefrag is for? The weird thing is that btrfs-cleaner never seems to be
able to finish the work it is doing, which would mean the work piles up
constantly without ever getting done...

At the moment I am at 9 system snapshots and 5 @home snapshots, which
IMHO btrfs should be able to handle. The other boxes have about the same
number of snapshots and one of them is running 24/7 as a home server.

The snapshots are not automated; I take a snapshot of @arch_current and
@home using a script before updating my system, so the snapshot interval
is very irregular. I also try to clean up old snapshots, leaving only about the
three newest system snapshots on disk, though I haven't done that recently.

Oh, and I'm not using any qgroups, not that I know of, at least.

Regards,
Ivan.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-05-28 10:39 UTC | newest]

Thread overview: 9+ messages
2017-05-27 18:53 btrfs-tools/linux 4.11: btrfs-cleaner misbehaving Ivan P
2017-05-27 19:33 ` Hans van Kranenburg
2017-05-27 20:29   ` Ivan P
2017-05-27 20:42     ` Hans van Kranenburg
2017-05-27 20:54       ` Ivan P
2017-05-28  4:55         ` Duncan
2017-05-28  7:13           ` Marat Khalili
2017-05-27 19:36 ` Jean-Denis Girard
     [not found] <20170527215608.23a40176@ws>
2017-05-28 10:39 ` Ivan P
