* btrfs-transacti hammering the system
@ 2017-12-01 14:25 Matt McKinnon
  2017-12-01 14:52 ` Hans van Kranenburg
  0 siblings, 1 reply; 20+ messages in thread
From: Matt McKinnon @ 2017-12-01 14:25 UTC (permalink / raw)
  To: linux-btrfs

Hi All,

Is there any way to figure out what exactly btrfs-transacti is chugging 
on?  I have a few file systems that seem to get wedged for days on end 
with this process pegged around 100%.  I've stopped all snapshots, made 
sure no quotas were enabled, turned on autodefrag in the mount options, 
tried manual defragging, kernel upgrades, yet still this brings my 
system to a crawl.

Network I/O to the system seems very tiny.  The only I/O I see to the 
disk is btrfs-transacti writing a couple M/s.

# time touch foo

real    2m54.303s
user    0m0.000s
sys     0m0.002s

# uname -r
4.12.8-custom

# btrfs --version
btrfs-progs v4.13.3

Yes, I know I'm a bit behind there...

-Matt




^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: btrfs-transacti hammering the system
  2017-12-01 14:25 btrfs-transacti hammering the system Matt McKinnon
@ 2017-12-01 14:52 ` Hans van Kranenburg
  2017-12-01 15:24   ` Matt McKinnon
  0 siblings, 1 reply; 20+ messages in thread
From: Hans van Kranenburg @ 2017-12-01 14:52 UTC (permalink / raw)
  To: Matt McKinnon, linux-btrfs

On 12/01/2017 03:25 PM, Matt McKinnon wrote:
> 
> Is there any way to figure out what exactly btrfs-transacti is chugging
> on?  I have a few file systems that seem to get wedged for days on end
> with this process pegged around 100%.  I've stopped all snapshots, made
> sure no quotas were enabled, turned on autodefrag in the mount options,
> tried manual defragging, kernel upgrades, yet still this brings my
> system to a crawl.
> 
> Network I/O to the system seems very tiny.  The only I/O I see to the
> disk is btrfs-transacti writing a couple M/s.
> 
> # time touch foo
> 
> real    2m54.303s
> user    0m0.000s
> sys     0m0.002s
> 
> # uname -r
> 4.12.8-custom
> 
> # btrfs --version
> btrfs-progs v4.13.3
> 
> Yes, I know I'm a bit behind there...

One of the simple things you can do is watch the stack traces of the
kernel thread.

watch 'cat /proc/<pid>/stack'

where <pid> is the pid of the btrfs-transaction process.

In there, you will see a pattern of recurring things: e.g. it's
searching for free space, it's writing out the free space cache, or
other things. Correlate this with the disk write traffic and see if we
get a step further.
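Taking that advice one step further, the samples can be tallied so the dominant stack stands out. A minimal sketch (not from the thread; the helper name and the sampling loop are illustrative): it counts the deepest [btrfs] frame per sample, and reads pre-collected samples so it also works on saved output.

```shell
# Hypothetical helper: count the topmost [btrfs] frame in each stack sample.
# Collect samples first with something like:
#   for i in $(seq 60); do cat /proc/<pid>/stack; echo ---; sleep 1; done > samples.txt
top_btrfs_frame() {
  awk '
    /^---$/ { seen = 0; next }          # "---" separates samples
    /\[btrfs\]$/ && !seen {
      sub(/^\[<[0-9a-f]+>\] /, "")     # strip the address
      sub(/\+.*/, "")                  # strip offset/size and module tag
      count[$0]++; seen = 1            # count only the deepest btrfs frame
    }
    END { for (f in count) print count[f], f }
  ' "$@" | sort -rn
}

# usage: top_btrfs_frame samples.txt
```

A high count for e.g. btrfs_write_out_cache would point at the free space cache write-out discussed later in the thread.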

-- 
Hans van Kranenburg


* Re: btrfs-transacti hammering the system
  2017-12-01 14:52 ` Hans van Kranenburg
@ 2017-12-01 15:24   ` Matt McKinnon
  2017-12-01 15:39     ` Hans van Kranenburg
  0 siblings, 1 reply; 20+ messages in thread
From: Matt McKinnon @ 2017-12-01 15:24 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs

Thanks for this.  Here's what I get:


[<ffffffffc084ba73>] transaction_kthread+0x133/0x1c0 [btrfs]
[<ffffffffaa09b839>] kthread+0x109/0x140
[<ffffffffaa8459f5>] ret_from_fork+0x25/0x30

...

[<ffffffffaa0a8406>] io_schedule+0x16/0x40
[<ffffffffaa3b3cde>] get_request+0x23e/0x720
[<ffffffffaa3b6861>] blk_queue_bio+0xc1/0x3a0
[<ffffffffaa3b4a88>] generic_make_request+0xf8/0x2a0
[<ffffffffaa3b4ca5>] submit_bio+0x75/0x150
[<ffffffffc087fac5>] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[<ffffffffc084834c>] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[<ffffffffc086f1e3>] submit_one_bio+0x63/0xa0 [btrfs]
[<ffffffffc086f39b>] flush_epd_write_bio+0x3b/0x50 [btrfs]
[<ffffffffc086f3be>] flush_write_bio+0xe/0x10 [btrfs]
[<ffffffffc08777a9>] btree_write_cache_pages+0x379/0x450 [btrfs]
[<ffffffffc08478ed>] btree_writepages+0x5d/0x70 [btrfs]
[<ffffffffaa1a326c>] do_writepages+0x1c/0x70
[<ffffffffaa196f2a>] __filemap_fdatawrite_range+0xaa/0xe0
[<ffffffffaa197023>] filemap_fdatawrite_range+0x13/0x20
[<ffffffffc084fba9>] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[<ffffffffc084fc4d>] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
[<ffffffffc0851645>] btrfs_commit_transaction+0x665/0x900 [btrfs]

...

[<ffffffffaa0a8406>] io_schedule+0x16/0x40
[<ffffffffaa1959c8>] wait_on_page_bit+0xe8/0x120
[<ffffffffc087639d>] read_extent_buffer_pages+0x1cd/0x2e0 [btrfs]
[<ffffffffc0846fcf>] btree_read_extent_buffer_pages+0x9f/0x100 [btrfs]
[<ffffffffc0848542>] read_tree_block+0x32/0x50 [btrfs]
[<ffffffffc0828980>] read_block_for_search.isra.32+0x120/0x2e0 [btrfs]
[<ffffffffc082daa5>] btrfs_next_old_leaf+0x215/0x400 [btrfs]
[<ffffffffc082dca0>] btrfs_next_leaf+0x10/0x20 [btrfs]
[<ffffffffc0843c3e>] btrfs_lookup_csums_range+0x12e/0x410 [btrfs]
[<ffffffffc08d09ea>] csum_exist_in_range.isra.49+0x2a/0x81 [btrfs]
[<ffffffffc08596a2>] run_delalloc_nocow+0x9b2/0xa10 [btrfs]
[<ffffffffc0859768>] run_delalloc_range+0x68/0x340 [btrfs]
[<ffffffffc0872070>] writepage_delalloc.isra.47+0xf0/0x140 [btrfs]
[<ffffffffc0872f97>] __extent_writepage+0xc7/0x290 [btrfs]
[<ffffffffc0873415>] extent_write_cache_pages.constprop.53+0x2b5/0x450 [btrfs]
[<ffffffffc0874fed>] extent_writepages+0x4d/0x70 [btrfs]
[<ffffffffc0852d88>] btrfs_writepages+0x28/0x30 [btrfs]
[<ffffffffaa1a326c>] do_writepages+0x1c/0x70
[<ffffffffaa196f2a>] __filemap_fdatawrite_range+0xaa/0xe0
[<ffffffffaa197023>] filemap_fdatawrite_range+0x13/0x20
[<ffffffffc08695c0>] btrfs_fdatawrite_range+0x20/0x50 [btrfs]
[<ffffffffc089abf9>] __btrfs_write_out_cache+0x3d9/0x420 [btrfs]
[<ffffffffc089b066>] btrfs_write_out_cache+0x86/0x100 [btrfs]
[<ffffffffc083bc61>] btrfs_write_dirty_block_groups+0x261/0x390 [btrfs]
[<ffffffffc084e83b>] commit_cowonly_roots+0x1fb/0x290 [btrfs]
[<ffffffffc0851414>] btrfs_commit_transaction+0x434/0x900 [btrfs]

...

[<ffffffffc08992d7>] tree_search_offset.isra.23+0x37/0x1d0 [btrfs]



* Re: btrfs-transacti hammering the system
  2017-12-01 15:24   ` Matt McKinnon
@ 2017-12-01 15:39     ` Hans van Kranenburg
  2017-12-01 15:42       ` Matt McKinnon
  2017-12-01 16:31       ` Matt McKinnon
  0 siblings, 2 replies; 20+ messages in thread
From: Hans van Kranenburg @ 2017-12-01 15:39 UTC (permalink / raw)
  To: Matt McKinnon, linux-btrfs

On 12/01/2017 04:24 PM, Matt McKinnon wrote:
> Thanks for this.  Here's what I get:

Ok, and which one is displaying most of the time?

> [...]
> 
> [<ffffffffaa0a8406>] io_schedule+0x16/0x40
> [<ffffffffaa3b3cde>] get_request+0x23e/0x720
> [<ffffffffaa3b6861>] blk_queue_bio+0xc1/0x3a0
> [<ffffffffaa3b4a88>] generic_make_request+0xf8/0x2a0
> [<ffffffffaa3b4ca5>] submit_bio+0x75/0x150
> [<ffffffffc087fac5>] btrfs_map_bio+0xe5/0x2f0 [btrfs]
> [<ffffffffc084834c>] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
> [<ffffffffc086f1e3>] submit_one_bio+0x63/0xa0 [btrfs]
> [<ffffffffc086f39b>] flush_epd_write_bio+0x3b/0x50 [btrfs]
> [<ffffffffc086f3be>] flush_write_bio+0xe/0x10 [btrfs]
> [<ffffffffc08777a9>] btree_write_cache_pages+0x379/0x450 [btrfs]
> [<ffffffffc08478ed>] btree_writepages+0x5d/0x70 [btrfs]
> [<ffffffffaa1a326c>] do_writepages+0x1c/0x70
> [<ffffffffaa196f2a>] __filemap_fdatawrite_range+0xaa/0xe0
> [<ffffffffaa197023>] filemap_fdatawrite_range+0x13/0x20
> [<ffffffffc084fba9>] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
> [<ffffffffc084fc4d>] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
> [<ffffffffc0851645>] btrfs_commit_transaction+0x665/0x900 [btrfs]
> 
> [...]
> 
> [<ffffffffaa0a8406>] io_schedule+0x16/0x40
> [<ffffffffaa1959c8>] wait_on_page_bit+0xe8/0x120
> [<ffffffffc087639d>] read_extent_buffer_pages+0x1cd/0x2e0 [btrfs]
> [<ffffffffc0846fcf>] btree_read_extent_buffer_pages+0x9f/0x100 [btrfs]
> [<ffffffffc0848542>] read_tree_block+0x32/0x50 [btrfs]
> [<ffffffffc0828980>] read_block_for_search.isra.32+0x120/0x2e0 [btrfs]
> [<ffffffffc082daa5>] btrfs_next_old_leaf+0x215/0x400 [btrfs]
> [<ffffffffc082dca0>] btrfs_next_leaf+0x10/0x20 [btrfs]
> [<ffffffffc0843c3e>] btrfs_lookup_csums_range+0x12e/0x410 [btrfs]
> [<ffffffffc08d09ea>] csum_exist_in_range.isra.49+0x2a/0x81 [btrfs]
> [<ffffffffc08596a2>] run_delalloc_nocow+0x9b2/0xa10 [btrfs]
> [<ffffffffc0859768>] run_delalloc_range+0x68/0x340 [btrfs]
> [<ffffffffc0872070>] writepage_delalloc.isra.47+0xf0/0x140 [btrfs]
> [<ffffffffc0872f97>] __extent_writepage+0xc7/0x290 [btrfs]
> [<ffffffffc0873415>] extent_write_cache_pages.constprop.53+0x2b5/0x450 [btrfs]
> [<ffffffffc0874fed>] extent_writepages+0x4d/0x70 [btrfs]
> [<ffffffffc0852d88>] btrfs_writepages+0x28/0x30 [btrfs]
> [<ffffffffaa1a326c>] do_writepages+0x1c/0x70
> [<ffffffffaa196f2a>] __filemap_fdatawrite_range+0xaa/0xe0
> [<ffffffffaa197023>] filemap_fdatawrite_range+0x13/0x20
> [<ffffffffc08695c0>] btrfs_fdatawrite_range+0x20/0x50 [btrfs]
> [<ffffffffc089abf9>] __btrfs_write_out_cache+0x3d9/0x420 [btrfs]
> [<ffffffffc089b066>] btrfs_write_out_cache+0x86/0x100 [btrfs]
> [<ffffffffc083bc61>] btrfs_write_dirty_block_groups+0x261/0x390 [btrfs]
> [<ffffffffc084e83b>] commit_cowonly_roots+0x1fb/0x290 [btrfs]
> [<ffffffffc0851414>] btrfs_commit_transaction+0x434/0x900 [btrfs]

1) The one right above, btrfs_write_out_cache, is the write-out of the
free space cache v1. Do you see this going on for multiple seconds, and
does it match the times when it's writing X MB/s to disk?

2) How big is this filesystem? What does your `btrfs fi df /mountpoint` say?

3) What kind of workload are you running? E.g. how can you describe it
within a range from "big files which just sit there" to "small writes
and deletes all over the place all the time"?

4) What kernel version is this? `uname -a` output?


-- 
Hans van Kranenburg


* Re: btrfs-transacti hammering the system
  2017-12-01 15:39     ` Hans van Kranenburg
@ 2017-12-01 15:42       ` Matt McKinnon
  2017-12-01 16:31       ` Matt McKinnon
  1 sibling, 0 replies; 20+ messages in thread
From: Matt McKinnon @ 2017-12-01 15:42 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs

These seem to come up most often:

[<ffffffffc084ba73>] transaction_kthread+0x133/0x1c0 [btrfs]
[<ffffffffaa09b839>] kthread+0x109/0x140
[<ffffffffaa8459f5>] ret_from_fork+0x25/0x30




* Re: btrfs-transacti hammering the system
  2017-12-01 15:39     ` Hans van Kranenburg
  2017-12-01 15:42       ` Matt McKinnon
@ 2017-12-01 16:31       ` Matt McKinnon
  2017-12-01 17:06         ` Hans van Kranenburg
  1 sibling, 1 reply; 20+ messages in thread
From: Matt McKinnon @ 2017-12-01 16:31 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs

Sorry, I missed your in-line reply:


> 1) The one right above, btrfs_write_out_cache, is the write-out of the
> free space cache v1. Do you see this for multiple seconds going on, and
> does it match the time when it's writing X MB/s to disk?
> 

It seems to only last until the next watch update.

[<ffffffffaa0a8406>] io_schedule+0x16/0x40
[<ffffffffaa3b3cde>] get_request+0x23e/0x720
[<ffffffffaa3b6861>] blk_queue_bio+0xc1/0x3a0
[<ffffffffaa3b4a88>] generic_make_request+0xf8/0x2a0
[<ffffffffaa3b4ca5>] submit_bio+0x75/0x150
[<ffffffffc087fac5>] btrfs_map_bio+0xe5/0x2f0 [btrfs]
[<ffffffffc084834c>] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
[<ffffffffc086f1e3>] submit_one_bio+0x63/0xa0 [btrfs]
[<ffffffffc086f39b>] flush_epd_write_bio+0x3b/0x50 [btrfs]
[<ffffffffc086f3be>] flush_write_bio+0xe/0x10 [btrfs]
[<ffffffffc08777a9>] btree_write_cache_pages+0x379/0x450 [btrfs]
[<ffffffffc08478ed>] btree_writepages+0x5d/0x70 [btrfs]
[<ffffffffaa1a326c>] do_writepages+0x1c/0x70
[<ffffffffaa196f2a>] __filemap_fdatawrite_range+0xaa/0xe0
[<ffffffffaa197023>] filemap_fdatawrite_range+0x13/0x20
[<ffffffffc084fba9>] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
[<ffffffffc084fc4d>] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
[<ffffffffc0851645>] btrfs_commit_transaction+0x665/0x900 [btrfs]
[<ffffffffc084baca>] transaction_kthread+0x18a/0x1c0 [btrfs]
[<ffffffffaa09b839>] kthread+0x109/0x140
[<ffffffffaa8459f5>] ret_from_fork+0x25/0x30

The last three lines will stick around for a while.  Is switching to 
space cache v2 something that everyone should be doing?  Something that 
would be a good test at least?


> 2) How big is this filesystem? What does your `btrfs fi df /mountpoint` say?
> 

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.08GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB


> 3) What kind of workload are you running? E.g. how can you describe it
> within a range from "big files which just sit there" to "small writes
> and deletes all over the place all the time"?
> 

It's a pretty light workload most of the time.  It's a file system that 
exports two NFS shares to a small lab group.  I believe it is more small 
reads all over a large file (MRI imaging) rather than small writes.

> 4) What kernel version is this? `uname -a` output?
> 

# uname -a
Linux machine_name 4.12.8-custom #1 SMP Tue Aug 22 10:15:01 EDT 2017 
x86_64 x86_64 x86_64 GNU/Linux



* Re: btrfs-transacti hammering the system
  2017-12-01 16:31       ` Matt McKinnon
@ 2017-12-01 17:06         ` Hans van Kranenburg
  2017-12-01 17:13           ` Andrei Borzenkov
                             ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Hans van Kranenburg @ 2017-12-01 17:06 UTC (permalink / raw)
  To: Matt McKinnon, linux-btrfs

On 12/01/2017 05:31 PM, Matt McKinnon wrote:
> Sorry, I missed your in-line reply:
> 
> 
>> 1) The one right above, btrfs_write_out_cache, is the write-out of the
>> free space cache v1. Do you see this for multiple seconds going on, and
>> does it match the time when it's writing X MB/s to disk?
>>
> 
> It seems to only last until the next watch update.
> 
> [<ffffffffaa0a8406>] io_schedule+0x16/0x40
> [<ffffffffaa3b3cde>] get_request+0x23e/0x720
> [<ffffffffaa3b6861>] blk_queue_bio+0xc1/0x3a0
> [<ffffffffaa3b4a88>] generic_make_request+0xf8/0x2a0
> [<ffffffffaa3b4ca5>] submit_bio+0x75/0x150
> [<ffffffffc087fac5>] btrfs_map_bio+0xe5/0x2f0 [btrfs]
> [<ffffffffc084834c>] btree_submit_bio_hook+0x8c/0xe0 [btrfs]
> [<ffffffffc086f1e3>] submit_one_bio+0x63/0xa0 [btrfs]
> [<ffffffffc086f39b>] flush_epd_write_bio+0x3b/0x50 [btrfs]
> [<ffffffffc086f3be>] flush_write_bio+0xe/0x10 [btrfs]
> [<ffffffffc08777a9>] btree_write_cache_pages+0x379/0x450 [btrfs]
> [<ffffffffc08478ed>] btree_writepages+0x5d/0x70 [btrfs]
> [<ffffffffaa1a326c>] do_writepages+0x1c/0x70
> [<ffffffffaa196f2a>] __filemap_fdatawrite_range+0xaa/0xe0
> [<ffffffffaa197023>] filemap_fdatawrite_range+0x13/0x20
> [<ffffffffc084fba9>] btrfs_write_marked_extents+0xe9/0x110 [btrfs]
> [<ffffffffc084fc4d>] btrfs_write_and_wait_transaction.isra.22+0x3d/0x80 [btrfs]
> [<ffffffffc0851645>] btrfs_commit_transaction+0x665/0x900 [btrfs]
> [<ffffffffc084baca>] transaction_kthread+0x18a/0x1c0 [btrfs]
> [<ffffffffaa09b839>] kthread+0x109/0x140
> [<ffffffffaa8459f5>] ret_from_fork+0x25/0x30
> 
> The last three lines will stick around for a while.  Is switching to
> space cache v2 something that everyone should be doing?  Something that
> would be a good test at least?

Yes. Read on.

>> 2) How big is this filesystem? What does your `btrfs fi df
>> /mountpoint` say?
>>
> 
> # btrfs fi df /export/
> Data, single: total=30.45TiB, used=30.25TiB
> System, DUP: total=32.00MiB, used=3.62MiB
> Metadata, DUP: total=66.50GiB, used=65.08GiB
> GlobalReserve, single: total=512.00MiB, used=53.69MiB

Multi-TiB filesystem, check. total/used ratio looks healthy.

>> 3) What kind of workload are you running? E.g. how can you describe it
>> within a range from "big files which just sit there" to "small writes
>> and deletes all over the place all the time"?
> 
> It's a pretty light workload most of the time.  It's a file system that
> exports two NFS shares to a small lab group.  I believe it is more small
> reads all over a large file (MRI imaging) rather than small writes.

Ok.

>> 4) What kernel version is this? `uname -a` output?
> 
> # uname -a
> Linux machine_name 4.12.8-custom #1 SMP Tue Aug 22 10:15:01 EDT 2017
> x86_64 x86_64 x86_64 GNU/Linux
> 

Yes, I'd recommend switching to space_cache v2, which stores the free
space information in a tree instead of separate blobs, and does not
block the transaction while writing out all info of all touched parts of
the filesystem again.

Here's of course the famous presentation with all kinds of info why:

http://events.linuxfoundation.org/sites/events/files/slides/vault2016_0.pdf

How:

* umount the filesystem
* btrfsck --clear-space-cache v1 /block/device
* do a rw mount with the space_cache=v2 option added (only needed
explicitly once)

During that mount, it will generate the free space tree by reading the
extent tree and writing the inverse of it. This will take some time,
depending on how fast your storage can do random reads with a cold disk
cache.
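Scripted, those steps might look like this (a sketch, not from the thread: the device and mountpoint are placeholders, and by default it only prints the commands rather than running them, since clearing the cache and remounting is disruptive):

```shell
# Sketch of the v1 -> v2 free space cache migration described above.
# Dry run by default; set DRY_RUN=0 to actually execute the commands.
migrate_space_cache() {
  dev=$1; mnt=$2
  run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "would run: $*"; else "$@"; fi; }
  run umount "$mnt"
  run btrfsck --clear-space-cache v1 "$dev"
  # The first rw mount with space_cache=v2 builds the free space tree;
  # later mounts pick it up automatically.
  run mount -o space_cache=v2 "$dev" "$mnt"
}

migrate_space_cache /dev/sdx1 /export
# would run: umount /export
# would run: btrfsck --clear-space-cache v1 /dev/sdx1
# would run: mount -o space_cache=v2 /dev/sdx1 /export
```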

For x86_64, using the free space cache v2 is fine since linux 4.5. Up to
4.9, there was a bug for big-endian systems. So, with your kernel it's
absolutely fine.

Why isn't this the default yet? Because btrfs-progs doesn't have
support for updating the free space tree when doing offline
modifications (like check --repair or btrfstune, which you hopefully
don't need often anyway). So, until that's fully added, you need to do
a `btrfsck --clear-space-cache v2`, then do the offline r/w action, and
then generate the tree again on the next mount.

Additional tips (forgot to ask for your /proc/mounts before):
* Use the noatime mount option, so that merely accessing files does not
lead to changes in metadata, which lead to writes, which lead to cowing
and writes in a new place, which lead to updates of the free space
administration, etc...
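In fstab form, that tip might look like the following (a sketch; the UUID, mountpoint and subvolume are placeholders for your own setup):

```
# noatime: reads no longer cause atime-only metadata updates
UUID=xxxx  /export  btrfs  noatime,subvol=/  0  0
```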

-- 
Hans van Kranenburg


* Re: btrfs-transacti hammering the system
  2017-12-01 17:06         ` Hans van Kranenburg
@ 2017-12-01 17:13           ` Andrei Borzenkov
  2017-12-01 18:04             ` Austin S. Hemmelgarn
  2017-12-01 17:34           ` Matt McKinnon
  2017-12-01 21:47           ` Duncan
  2 siblings, 1 reply; 20+ messages in thread
From: Andrei Borzenkov @ 2017-12-01 17:13 UTC (permalink / raw)
  To: Hans van Kranenburg, Matt McKinnon, linux-btrfs

01.12.2017 20:06, Hans van Kranenburg wrote:
> 
> Additional tips (forgot to ask for your /proc/mounts before):
> * Use the noatime mount option, so that only accessing files does not
> lead to changes in metadata,

Isn't 'lazytime' the default today? It gives you correct atime plus no
extra metadata updates caused by atime-only changes.


* Re: btrfs-transacti hammering the system
  2017-12-01 17:06         ` Hans van Kranenburg
  2017-12-01 17:13           ` Andrei Borzenkov
@ 2017-12-01 17:34           ` Matt McKinnon
  2017-12-01 17:57             ` Holger Hoffstätte
  2017-12-01 21:47           ` Duncan
  2 siblings, 1 reply; 20+ messages in thread
From: Matt McKinnon @ 2017-12-01 17:34 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs

Thanks, I'll give space_cache=v2 a shot.

My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/


* Re: btrfs-transacti hammering the system
  2017-12-01 17:34           ` Matt McKinnon
@ 2017-12-01 17:57             ` Holger Hoffstätte
  2017-12-01 18:24               ` Hans van Kranenburg
  0 siblings, 1 reply; 20+ messages in thread
From: Holger Hoffstätte @ 2017-12-01 17:57 UTC (permalink / raw)
  To: Matt McKinnon, linux-btrfs

On 12/01/17 18:34, Matt McKinnon wrote:
> Thanks, I'll give space_cache=v2 a shot.

Yes, very much recommended.

> My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/

Turn autodefrag off and use noatime instead of relatime.

Your filesystem also seems very full; that's bad with every filesystem,
but *especially* with btrfs, because the allocator has to work really hard
to find free space for COWing. Really consider deleting stuff or adding more space.

-h


* Re: btrfs-transacti hammering the system
  2017-12-01 17:13           ` Andrei Borzenkov
@ 2017-12-01 18:04             ` Austin S. Hemmelgarn
  2017-12-02 19:42               ` Andrei Borzenkov
  0 siblings, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-12-01 18:04 UTC (permalink / raw)
  To: Andrei Borzenkov, Hans van Kranenburg, Matt McKinnon, linux-btrfs

On 2017-12-01 12:13, Andrei Borzenkov wrote:
> 01.12.2017 20:06, Hans van Kranenburg wrote:
>>
>> Additional tips (forgot to ask for your /proc/mounts before):
>> * Use the noatime mount option, so that only accessing files does not
>> lead to changes in metadata,
> 
> Is not 'lazytime" default today? It gives you correct atime + no extra
> metadata update cause by update of atime only.
Unless things have changed since the last time this came up, BTRFS does 
not support the 'lazytime' mount option (but it doesn't complain about 
it either).

Also, lazytime is independent from noatime, and using both can have 
benefits (lazytime will still have to write out the inode for every file 
read on the system every 24 hours, but with noatime it only has to write 
out the inode for files that have changed).

On top of all that though, you generally shouldn't be trusting atime 
because:
1. Many people run with noatime (or patch their kernels to default to 
noatime instead of relatime), so you can't be certain if the atime is 
accurate at all.
2. It has somewhat non-intuitive semantics when dealing with directories.
3. Even without noatime thrown in, you only get 1 day resolution by 
default (as per the operation of 'relatime').
4. Essentially nothing uses it other than find (which only has one day 
resolution as it's typically used) and older versions of mutt (which use 
it because of lazy programming), which is why issues 1 and 3 are the case.


* Re: btrfs-transacti hammering the system
  2017-12-01 17:57             ` Holger Hoffstätte
@ 2017-12-01 18:24               ` Hans van Kranenburg
  2017-12-01 19:07                 ` Matt McKinnon
  0 siblings, 1 reply; 20+ messages in thread
From: Hans van Kranenburg @ 2017-12-01 18:24 UTC (permalink / raw)
  To: Holger Hoffstätte, Matt McKinnon, linux-btrfs

On 12/01/2017 06:57 PM, Holger Hoffstätte wrote:
> On 12/01/17 18:34, Matt McKinnon wrote:
>> Thanks, I'll give space_cache=v2 a shot.
> 
> Yes, very much recommended.
> 
>> My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/
> 
> Turn autodefrag off and use noatime instead of relatime.
> 
> Your filesystem also seems very full,

We don't know; btrfs fi df only displays allocated space. And that being
full is good: it means there isn't too much free space fragmented everywhere.

> that's bad with every filesystem but
> *especially* with btrfs because the allocator has to work really hard to find
> free space for COWing. Really consider deleting stuff or adding more space.

-- 
Hans van Kranenburg


* Re: btrfs-transacti hammering the system
  2017-12-01 18:24               ` Hans van Kranenburg
@ 2017-12-01 19:07                 ` Matt McKinnon
  2017-12-01 21:03                   ` Chris Murphy
  0 siblings, 1 reply; 20+ messages in thread
From: Matt McKinnon @ 2017-12-01 19:07 UTC (permalink / raw)
  To: Hans van Kranenburg, Holger Hoffstätte, linux-btrfs

Right.  The file system is 48T, with 17T available, so we're not quite 
pushing it yet.

So far so good on the space_cache=v2 mount.  I'm surprised this isn't on
the gotchas page in the wiki; it may end up making a world of difference
to the users here.

Thanks again,
Matt

On 01/12/17 13:24, Hans van Kranenburg wrote:
> On 12/01/2017 06:57 PM, Holger Hoffstätte wrote:
>> On 12/01/17 18:34, Matt McKinnon wrote:
>>> Thanks, I'll give space_cache=v2 a shot.
>>
>> Yes, very much recommended.
>>
>>> My mount options are: rw,relatime,space_cache,autodefrag,subvolid=5,subvol=/
>>
>> Turn autodefrag off and use noatime instead of relatime.
>>
>> Your filesystem also seems very full,
> 
> We don't know. btrfs fi df only displays allocated space. And that being
> full is good, it means not too much free space fragments everywhere.
> 
>> that's bad with every filesystem but
>> *especially* with btrfs because the allocator has to work really hard to find
>> free space for COWing. Really consider deleting stuff or adding more space.
> 


* Re: btrfs-transacti hammering the system
  2017-12-01 19:07                 ` Matt McKinnon
@ 2017-12-01 21:03                   ` Chris Murphy
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Murphy @ 2017-12-01 21:03 UTC (permalink / raw)
  To: Matt McKinnon; +Cc: Hans van Kranenburg, Holger Hoffstätte, Btrfs BTRFS

On Fri, Dec 1, 2017 at 12:07 PM, Matt McKinnon <matt@techsquare.com> wrote:
> Right.  The file system is 48T, with 17T available, so we're not quite
> pushing it yet.
>
> So far so good on the space_cache=v2 mount.  I'm surprised this isn't on the
> gotcha page in the wiki; it may end up making a world of difference to the
> users here
>

I'd change one thing at a time so you learn which change does/doesn't
resolve the problem. For storage of mostly large files, autodefrag
doesn't seem applicable, but I'd leave it on for now since you've
already made the space cache v2 change.


-- 
Chris Murphy


* Re: btrfs-transacti hammering the system
  2017-12-01 17:06         ` Hans van Kranenburg
  2017-12-01 17:13           ` Andrei Borzenkov
  2017-12-01 17:34           ` Matt McKinnon
@ 2017-12-01 21:47           ` Duncan
  2017-12-01 21:50             ` Matt McKinnon
  2 siblings, 1 reply; 20+ messages in thread
From: Duncan @ 2017-12-01 21:47 UTC (permalink / raw)
  To: linux-btrfs

Hans van Kranenburg posted on Fri, 01 Dec 2017 18:06:23 +0100 as
excerpted:

> On 12/01/2017 05:31 PM, Matt McKinnon wrote:
>> Sorry, I missed your in-line reply:
>> 
>> 
>>> 2) How big is this filesystem? What does your `btrfs fi df
>>> /mountpoint` say?
>>>
>> 
>> # btrfs fi df /export/
>> Data, single: total=30.45TiB, used=30.25TiB
>> System, DUP: total=32.00MiB, used=3.62MiB
>> Metadata, DUP: total=66.50GiB, used=65.08GiB
>> GlobalReserve, single: total=512.00MiB, used=53.69MiB
> 
> Multi-TiB filesystem, check. total/used ratio looks healthy.

Not so healthy, from here.  Data/metadata are healthy, yes,
but...

Any usage at all of global reserve is a red flag indicating that
something in the filesystem thinks, or thought when it resorted
to global reserve, that space is running out.

Global reserve usage doesn't really hint what the problem is,
but it's definitely a red flag that there /is/ a problem, and
it's easily overlooked, as it apparently was here.

It's likely an indication of a bug, possibly one of the ones fixed
right around 4.12/4.13.  I'll let the devs and better experts take
it from there, but I'd certainly be worried until global reserve
usage drops to zero.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs-transacti hammering the system
  2017-12-01 21:47           ` Duncan
@ 2017-12-01 21:50             ` Matt McKinnon
  2017-12-04 12:18               ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: Matt McKinnon @ 2017-12-01 21:50 UTC (permalink / raw)
  To: Duncan, linux-btrfs

Well, it's at zero now...

# btrfs fi df /export/
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.16GiB
GlobalReserve, single: total=512.00MiB, used=0.00B


On 01/12/17 16:47, Duncan wrote:
> Hans van Kranenburg posted on Fri, 01 Dec 2017 18:06:23 +0100 as
> excerpted:
> 
>> On 12/01/2017 05:31 PM, Matt McKinnon wrote:
>>> Sorry, I missed your in-line reply:
>>>
>>>
>>>> 2) How big is this filesystem? What does your `btrfs fi df
>>>> /mountpoint` say?
>>>>
>>>
>>> # btrfs fi df /export/
>>> Data, single: total=30.45TiB, used=30.25TiB
>>> System, DUP: total=32.00MiB, used=3.62MiB
>>> Metadata, DUP: total=66.50GiB, used=65.08GiB
>>> GlobalReserve, single: total=512.00MiB, used=53.69MiB
>>
>> Multi-TiB filesystem, check. total/used ratio looks healthy.
> 
> Not so healthy, from here.  Data/metadata are healthy, yes,
> but...
> 
> Any usage at all of global reserve is a red flag indicating that
> something in the filesystem thinks, or thought when it resorted
> to global reserve, that space is running out.
> 
> Global reserve usage doesn't really hint what the problem is,
> but it's definitely a red flag that there /is/ a problem, and
> it's easily overlooked, as it apparently was here.
> 
> It's likely indication of a bug, possibly one of the ones fixed
> right around 4.12/4.13.  I'll let the devs and better experts take
> it from there, but I'd certainly be worried until global reserve
> drops to zero usage.
> 


* Re: btrfs-transacti hammering the system
  2017-12-01 18:04             ` Austin S. Hemmelgarn
@ 2017-12-02 19:42               ` Andrei Borzenkov
  0 siblings, 0 replies; 20+ messages in thread
From: Andrei Borzenkov @ 2017-12-02 19:42 UTC (permalink / raw)
  To: Austin S. Hemmelgarn, Hans van Kranenburg, Matt McKinnon, linux-btrfs

01.12.2017 21:04, Austin S. Hemmelgarn wrote:
> On 2017-12-01 12:13, Andrei Borzenkov wrote:
>> 01.12.2017 20:06, Hans van Kranenburg wrote:
>>>
>>> Additional tips (forgot to ask for your /proc/mounts before):
>>> * Use the noatime mount option, so that only accessing files does not
>>> lead to changes in metadata,
>>
>> Is not 'lazytime" default today?

Sorry, it was relatime that is today's default, I mixed them up.

>> It gives you correct atime + no extra
>> metadata update cause by update of atime only.
> Unless things have changed since the last time this came up, BTRFS does
> not support the 'lazytime' mount option (but it doesn't complain about
> it either).
> 

Actually, since v2.27 of util-linux, "lazytime" is interpreted by the mount
command itself and converted into the MS_LAZYTIME flag, so it should be
available for every filesystem.

bor@10:~> sudo mkfs -t ext4 /dev/sdb1
mke2fs 1.43.7 (16-Oct-2017)
...

bor@10:~> sudo mount -t ext4 -o lazytime /dev/sdb1 /mnt
bor@10:~> tail /proc/self/mountinfo
...
224 66 8:17 / /mnt rw,relatime shared:152 - ext4 /dev/sdb1
rw,lazytime,data=ordered
bor@10:~> sudo umount /dev/sdb1
bor@10:~> sudo mkfs -t btrfs -f /dev/sdb1
btrfs-progs v4.13.3
...

bor@10:~> sudo mount -t btrfs -o lazytime /dev/sdb1 /mnt
bor@10:~> tail /proc/self/mountinfo
...
224 66 0:88 / /mnt rw,relatime shared:152 - btrfs /dev/sdb1
rw,lazytime,space_cache,subvolid=5,subvol=/
bor@10:~>


> Also, lazytime is independent from noatime, and using both can have
> benefits (lazytime will still have to write out the inode for every file
> read on the system every 24 hours, but with noatime it only has to write
> out the inode for files that have changed).
> 

OK, that's true.


* Re: btrfs-transacti hammering the system
  2017-12-01 21:50             ` Matt McKinnon
@ 2017-12-04 12:18               ` Austin S. Hemmelgarn
  2017-12-04 14:10                 ` Duncan
  0 siblings, 1 reply; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-12-04 12:18 UTC (permalink / raw)
  To: Matt McKinnon, linux-btrfs

On 2017-12-01 16:50, Matt McKinnon wrote:
> Well, it's at zero now...
> 
> # btrfs fi df /export/
> Data, single: total=30.45TiB, used=30.25TiB
> System, DUP: total=32.00MiB, used=3.62MiB
> Metadata, DUP: total=66.50GiB, used=65.16GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
GlobalReserve seems to be used temporarily for certain cases of metadata 
COW regardless of how full the FS actually is; I'm betting that such a 
case just happened to be in progress when you got the info previously.  
If you aren't seeing it regularly used, it's (probably) not an issue.

Duncan is correct, though, about long-term usage.  If you see 
GlobalReserve usage that persists for an extended period of time, 
something is almost certainly wrong, especially if the FS isn't close to 
being full.
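If you want to keep an eye on it, something like this (a hypothetical monitoring sketch, not an existing tool) could parse `btrfs fi df` output and flag nonzero GlobalReserve usage so persistent usage doesn't get overlooked again:

```python
import re

def global_reserve_used(fi_df_output):
    """Return the used= value string from the GlobalReserve line of
    `btrfs fi df` output, or None if the line is absent."""
    for line in fi_df_output.splitlines():
        m = re.match(r"\s*GlobalReserve,\s*\w+:\s*total=\S+,\s*used=(\S+)",
                     line)
        if m:
            return m.group(1)
    return None

# The output posted earlier in this thread:
sample = """\
Data, single: total=30.45TiB, used=30.25TiB
System, DUP: total=32.00MiB, used=3.62MiB
Metadata, DUP: total=66.50GiB, used=65.16GiB
GlobalReserve, single: total=512.00MiB, used=53.69MiB
"""
used = global_reserve_used(sample)
if used != "0.00B":
    print("GlobalReserve in use:", used)  # fires on the sample above
```

Run periodically (cron or similar), a single nonzero reading is noise per the above, but readings that stay nonzero across runs would be the red flag Duncan describes.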
> 
> 
> On 01/12/17 16:47, Duncan wrote:
>> Hans van Kranenburg posted on Fri, 01 Dec 2017 18:06:23 +0100 as
>> excerpted:
>>
>>> On 12/01/2017 05:31 PM, Matt McKinnon wrote:
>>>> Sorry, I missed your in-line reply:
>>>>
>>>>
>>>>> 2) How big is this filesystem? What does your `btrfs fi df
>>>>> /mountpoint` say?
>>>>>
>>>>
>>>> # btrfs fi df /export/
>>>> Data, single: total=30.45TiB, used=30.25TiB
>>>> System, DUP: total=32.00MiB, used=3.62MiB
>>>> Metadata, DUP: total=66.50GiB, used=65.08GiB
>>>> GlobalReserve, single: total=512.00MiB, used=53.69MiB
>>>
>>> Multi-TiB filesystem, check. total/used ratio looks healthy.
>>
>> Not so healthy, from here.  Data/metadata are healthy, yes,
>> but...
>>
>> Any usage at all of global reserve is a red flag indicating that
>> something in the filesystem thinks, or thought when it resorted
>> to global reserve, that space is running out.
>>
>> Global reserve usage doesn't really hint what the problem is,
>> but it's definitely a red flag that there /is/ a problem, and
>> it's easily overlooked, as it apparently was here.
>>
>> It's likely indication of a bug, possibly one of the ones fixed
>> right around 4.12/4.13.  I'll let the devs and better experts take
>> it from there, but I'd certainly be worried until global reserve
>> drops to zero usage.
>>



* Re: btrfs-transacti hammering the system
  2017-12-04 12:18               ` Austin S. Hemmelgarn
@ 2017-12-04 14:10                 ` Duncan
  2017-12-04 14:30                   ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 20+ messages in thread
From: Duncan @ 2017-12-04 14:10 UTC (permalink / raw)
  To: linux-btrfs

Austin S. Hemmelgarn posted on Mon, 04 Dec 2017 07:18:11 -0500 as
excerpted:

> On 2017-12-01 16:50, Matt McKinnon wrote:
>> Well, it's at zero now...
>> 
>> # btrfs fi df /export/
>> Data, single: total=30.45TiB, used=30.25TiB
>> System, DUP: total=32.00MiB, used=3.62MiB
>> Metadata, DUP: total=66.50GiB, used=65.16GiB
>> GlobalReserve, single: total=512.00MiB, used=0.00B

> GlobalReserve seems to be used temporarily for certain cases of metadata
> COW regardless of how full the FS actually is, I'm betting that it just so
> happened that such a case was in progress when you got the info
> previously.  If you aren't seeing it regularly used, it's (probably) not
> an issue.
> 
> Duncan is correct though when dealing with long-term usage.  If you see
> GlobalReserve usage that persists for an extended period of time,
> something is almost certainly wrong, especially if the FS isn't close to
> being full.

Thanks.  I wasn't aware global reserve was routinely temporarily used.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



* Re: btrfs-transacti hammering the system
  2017-12-04 14:10                 ` Duncan
@ 2017-12-04 14:30                   ` Austin S. Hemmelgarn
  0 siblings, 0 replies; 20+ messages in thread
From: Austin S. Hemmelgarn @ 2017-12-04 14:30 UTC (permalink / raw)
  To: linux-btrfs

On 2017-12-04 09:10, Duncan wrote:
> Austin S. Hemmelgarn posted on Mon, 04 Dec 2017 07:18:11 -0500 as
> excerpted:
> 
>> On 2017-12-01 16:50, Matt McKinnon wrote:
>>> Well, it's at zero now...
>>>
>>> # btrfs fi df /export/
>>> Data, single: total=30.45TiB, used=30.25TiB
>>> System, DUP: total=32.00MiB, used=3.62MiB
>>> Metadata, DUP: total=66.50GiB, used=65.16GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
>> GlobalReserve seems to be used temporarily for certain cases of metadata
>> COW regardless of how full the FS actually is, I'm betting that it just so
>> happened that such a case was in progress when you got the info
>> previously.  If you aren't seeing it regularly used, it's (probably) not
>> an issue.
>>
>> Duncan is correct though when dealing with long-term usage.  If you see
>> GlobalReserve usage that persists for an extended period of time,
>> something is almost certainly wrong, especially if the FS isn't close to
>> being full.
> 
> Thanks.  I wasn't aware global reserve was routinely temporarily used.
> 
I don't know that it's 'routinely' used, but I've seen it used 
temporarily during balance and defrag runs, and on rare occasion when 
snapshotting very large subvolumes.

I'm pretty certain that all those cases are not 'supposed' to happen; 
they just do as a consequence of how the code is written.


end of thread, other threads:[~2017-12-04 14:30 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-12-01 14:25 btrfs-transacti hammering the system Matt McKinnon
2017-12-01 14:52 ` Hans van Kranenburg
2017-12-01 15:24   ` Matt McKinnon
2017-12-01 15:39     ` Hans van Kranenburg
2017-12-01 15:42       ` Matt McKinnon
2017-12-01 16:31       ` Matt McKinnon
2017-12-01 17:06         ` Hans van Kranenburg
2017-12-01 17:13           ` Andrei Borzenkov
2017-12-01 18:04             ` Austin S. Hemmelgarn
2017-12-02 19:42               ` Andrei Borzenkov
2017-12-01 17:34           ` Matt McKinnon
2017-12-01 17:57             ` Holger Hoffstätte
2017-12-01 18:24               ` Hans van Kranenburg
2017-12-01 19:07                 ` Matt McKinnon
2017-12-01 21:03                   ` Chris Murphy
2017-12-01 21:47           ` Duncan
2017-12-01 21:50             ` Matt McKinnon
2017-12-04 12:18               ` Austin S. Hemmelgarn
2017-12-04 14:10                 ` Duncan
2017-12-04 14:30                   ` Austin S. Hemmelgarn
