* metadata_ratio mount option?
@ 2018-05-07 11:40 Martin Svec
  2018-05-07 14:49 ` Chris Mason
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Svec @ 2018-05-07 11:40 UTC (permalink / raw)
  To: Btrfs BTRFS

Hi,

According to the btrfs manual page [1], I assume that the metadata_ratio=1 mount option
should force allocation of one metadata chunk after every allocated data chunk. However,
when I set this option and start filling btrfs with "dd if=/dev/zero of=dummyfile.dat",
only data chunks are allocated and no metadata ones. So, how does the metadata_ratio
option really work?
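This is how I checked the result: a small Python sketch of the ratio calculation from
"btrfs filesystem df -b" output (the sample text below is made up, and the parser
assumes the usual "Type, profile: total=..., used=..." line format):

```python
import re

def chunk_totals(df_output: str) -> dict:
    """Parse `btrfs filesystem df -b` style output into {type: total_bytes}."""
    totals = {}
    for line in df_output.splitlines():
        m = re.match(r"(\w+), \w+: total=(\d+), used=(\d+)", line.strip())
        if m:
            totals[m.group(1)] = int(m.group(2))
    return totals

# Hypothetical numbers for illustration only:
sample = """\
Data, single: total=10737418240, used=9663676416
System, DUP: total=8388608, used=16384
Metadata, DUP: total=1073741824, used=536870912
GlobalReserve, single: total=536870912, used=0
"""

totals = chunk_totals(sample)
# With metadata_ratio=1 I would expect roughly as much metadata chunk
# space as data chunk space; in this made-up sample the ratio is 10:1.
ratio = totals["Data"] / totals["Metadata"]
print(ratio)  # 10.0
```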

Note that I'm trying to use this option as a workaround for the bug reported here:

https://www.spinics.net/lists/linux-btrfs/msg75104.html

i.e. I want to manually preallocate metadata chunks to avoid nightly ENOSPC errors.

Best regards.

Martin


[1] https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs(5)#MOUNT_OPTIONS





* Re: metadata_ratio mount option?
  2018-05-07 11:40 metadata_ratio mount option? Martin Svec
@ 2018-05-07 14:49 ` Chris Mason
  2018-05-07 16:16   ` Martin Svec
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Mason @ 2018-05-07 14:49 UTC (permalink / raw)
  To: Martin Svec; +Cc: Btrfs BTRFS

On 7 May 2018, at 7:40, Martin Svec wrote:

> Hi,
>
> According to the btrfs manual page [1], I assume that the
> metadata_ratio=1 mount option should
> force allocation of one metadata chunk after every allocated data 
> chunk. However,
> when I set this option and start filling btrfs with "dd if=/dev/zero 
> of=dummyfile.dat",
> only data chunks are allocated but no metadata ones. So, how does the 
> metadata_ratio
> option really work?
>
> Note that I'm trying to use this option as a workaround for the bug 
> reported here:
>

[ urls that FB email server eats, sorry ]

>
> i.e. I want to manually preallocate metadata chunks to avoid nightly 
> ENOSPC errors.


metadata_ratio is almost but not quite what you want.  It sets a flag on 
the space_info to force a chunk allocation the next time we decide to 
call should_alloc_chunk().  Thanks to the overcommit code, we usually 
don't call that until the metadata we think we're going to need is 
bigger than the metadata space available.  In other words, by the time 
we're into the code that honors the force flag, reservations are already 
high enough to make us allocate the chunk anyway.
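As a rough illustration of that ordering, here's a toy model (made-up names and
thresholds, not the actual kernel code): the force flag set by metadata_ratio is only
consulted once overcommit stops covering reservations, and by then allocation would
happen anyway:

```python
def can_overcommit(total, used, reserve_request, slop=0.5):
    """Toy model: overcommit lets a reservation succeed as long as it
    fits within existing metadata space plus a slop factor."""
    return used + reserve_request <= total * (1 + slop)

def should_alloc_chunk(total, used, reserve_request, force=False):
    """Toy model of the allocator decision: honor the force flag, or
    allocate when reservations approach the space we actually have."""
    return force or used + reserve_request > total * 0.8

def maybe_alloc(total, used, reserve_request, force=False):
    """We only reach should_alloc_chunk() once overcommit fails, so by
    then reservations are high enough to allocate regardless of force."""
    if can_overcommit(total, used, reserve_request):
        return False  # overcommit succeeded; the force flag is never seen
    return should_alloc_chunk(total, used, reserve_request, force)

# Early on, the force flag has no effect:
print(maybe_alloc(total=100, used=10, reserve_request=5, force=True))     # False
# Once overcommit fails, we would have allocated even without force:
print(maybe_alloc(total=100, used=140, reserve_request=20, force=False))  # True
```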

I tried to use metadata_ratio to experiment with forcing more metadata 
slop space, but really I have to tweak the overcommit code first.
Omar beat me to a better solution, tracking down our transient ENOSPC 
problems here at FB to reservations done for orphans.  Do you have a lot 
of deleted files still being held open?  lsof /mntpoint | grep deleted 
will list them.
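If lsof isn't handy, the same check can be sketched by walking /proc (Linux-only;
reading other processes' fd links needs privileges, so the demo below only inspects
its own process):

```python
import os
import tempfile

def deleted_open_files(pid="self"):
    """Return targets of open fds that point at unlinked files (Linux:
    such symlinks in /proc/<pid>/fd end with ' (deleted)')."""
    fd_dir = f"/proc/{pid}/fd"
    deleted = []
    try:
        for fd in os.listdir(fd_dir):
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # fd closed while we were scanning
            if target.endswith(" (deleted)"):
                deleted.append(target)
    except OSError:
        pass  # no such pid, or no permission
    return deleted

# Demo on our own process: open a temp file, unlink it, and find it.
f = tempfile.NamedTemporaryFile(delete=False)
os.unlink(f.name)
found = any(d.startswith(f.name) for d in deleted_open_files())
f.close()
print(found)  # True
```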

We're working through a patch for the orphans here.  You've got a ton of 
bytes pinned, which isn't a great match for the symptoms we see:

[285169.096630] BTRFS info (device sdb): space_info 4 has 
18446744072120172544 free, is not full
[285169.096633] BTRFS info (device sdb): space_info total=273804165120, 
used=269218267136, pinned=3459629056, reserved=52396032, 
may_use=2663120896, readonly=131072
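That huge "free" value is a negative number printed as an unsigned 64-bit integer;
the arithmetic can be checked against the totals on the second line:

```python
# Values from the space_info dump above:
total    = 273804165120
used     = 269218267136
pinned   = 3459629056
reserved = 52396032
may_use  = 2663120896
readonly = 131072

free = total - (used + pinned + reserved + may_use + readonly)
print(free)          # -1589379072, i.e. about 1.5 GiB over-reserved
print(free % 2**64)  # 18446744072120172544, the value in the log
```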

But, your may_use count is high enough that you might be hitting this 
problem.  Otherwise I'll work out a patch to make some more metadata 
chunks while Josef is perfecting his great delayed ref update.

-chris




* Re: metadata_ratio mount option?
  2018-05-07 14:49 ` Chris Mason
@ 2018-05-07 16:16   ` Martin Svec
  2018-05-07 16:37     ` Chris Mason
  0 siblings, 1 reply; 5+ messages in thread
From: Martin Svec @ 2018-05-07 16:16 UTC (permalink / raw)
  To: Chris Mason; +Cc: Btrfs BTRFS

Hello Chris,

Dne 7.5.2018 v 16:49 Chris Mason napsal(a):
> On 7 May 2018, at 7:40, Martin Svec wrote:
>
>> Hi,
>>
>> According to the btrfs manual page [1], I assume that the metadata_ratio=1 mount option should
>> force allocation of one metadata chunk after every allocated data chunk. However,
>> when I set this option and start filling btrfs with "dd if=/dev/zero of=dummyfile.dat",
>> only data chunks are allocated but no metadata ones. So, how does the metadata_ratio
>> option really work?
>>
>> Note that I'm trying to use this option as a workaround for the bug reported here:
>>
>
> [ urls that FB email server eats, sorry ]

It's a link to the "Btrfs remounted read-only due to ENOSPC in btrfs_run_delayed_refs" thread :)

>
>>
>> i.e. I want to manually preallocate metadata chunks to avoid nightly ENOSPC errors.
>
>
> metadata_ratio is almost but not quite what you want.  It sets a flag on the space_info to force a
> chunk allocation the next time we decide to call should_alloc_chunk().  Thanks to the overcommit
> code, we usually don't call that until the metadata we think we're going to need is bigger than
> the metadata space available.  In other words, by the time we're into the code that honors the
> force flag, reservations are already high enough to make us allocate the chunk anyway.

Yeah, that's how I understood the code. So I think the metadata_ratio man page section is quite
confusing, because it implies that btrfs guarantees a given metadata-to-data chunk space ratio,
which isn't true.

>
> I tried to use metadata_ratio to experiment with forcing more metadata slop space, but really I
> have to tweak the overcommit code first.
> Omar beat me to a better solution, tracking down our transient ENOSPC problems here at FB to
> reservations done for orphans.  Do you have a lot of deleted files still being held open?  lsof
> /mntpoint | grep deleted will list them.

I'll take a look during the backup window. The initial bug report describes our rsync workload in
detail, for your reference.

>
> We're working through a patch for the orphans here.  You've got a ton of bytes pinned, which isn't
> a great match for the symptoms we see:
>
> [285169.096630] BTRFS info (device sdb): space_info 4 has 18446744072120172544 free, is not full
> [285169.096633] BTRFS info (device sdb): space_info total=273804165120, used=269218267136,
> pinned=3459629056, reserved=52396032, may_use=2663120896, readonly=131072
>
> But, your may_use count is high enough that you might be hitting this problem.  Otherwise I'll
> work out a patch to make some more metadata chunks while Josef is perfecting his great delayed ref
> update.

As mentioned in the bug report, we have a custom patch that dedicates SSDs to metadata chunks and
HDDs to data chunks. So, all we need is to preallocate metadata chunks to occupy all of the SSD
space, and our issues will be gone.
Note that btrfs with SSD-backed metadata works absolutely great for rsync backups, even if there are
billions of files and thousands of snapshots. The global reservation ENOSPC is the last issue we're
struggling with.

Thank you

Martin




* Re: metadata_ratio mount option?
  2018-05-07 16:16   ` Martin Svec
@ 2018-05-07 16:37     ` Chris Mason
  2018-05-08  8:47       ` Btrfs remounted read-only due to ENOSPC in btrfs_run_delayed_refs cont. [Was: Re: metadata_ratio mount option?] Martin Svec
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Mason @ 2018-05-07 16:37 UTC (permalink / raw)
  To: Martin Svec; +Cc: Btrfs BTRFS



On 7 May 2018, at 12:16, Martin Svec wrote:

> Hello Chris,
>
> Dne 7.5.2018 v 16:49 Chris Mason napsal(a):
>> On 7 May 2018, at 7:40, Martin Svec wrote:
>>
>>> Hi,
>>>
>>> According to the btrfs manual page [1], I assume that the
>>> metadata_ratio=1 mount option should
>>> force allocation of one metadata chunk after every allocated data 
>>> chunk. However,
>>> when I set this option and start filling btrfs with "dd if=/dev/zero 
>>> of=dummyfile.dat",
>>> only data chunks are allocated but no metadata ones. So, how does 
>>> the metadata_ratio
>>> option really work?
>>>
>>> Note that I'm trying to use this option as a workaround for the bug 
>>> reported here:
>>>
>>
>> [ urls that FB email server eats, sorry ]
>
> It's a link to the "Btrfs remounted read-only due to ENOSPC in 
> btrfs_run_delayed_refs" thread :)

Oh yeah, the link worked fine, it just goes through this url defense 
monster that munges it in replies.

>
>>
>>>
>>> i.e. I want to manually preallocate metadata chunks to avoid nightly 
>>> ENOSPC errors.
>>
>>
>> metadata_ratio is almost but not quite what you want.  It sets a 
>> flag on the space_info to force a
>> chunk allocation the next time we decide to call 
>> should_alloc_chunk().  Thanks to the overcommit
>> code, we usually don't call that until the metadata we think we're 
>> going to need is bigger than
>> the metadata space available.  In other words, by the time we're 
>> into the code that honors the
>> force flag, reservations are already high enough to make us allocate 
>> the chunk anyway.
>
> Yeah, that's how I understood the code. So I think the metadata_ratio
> man page section is quite confusing, because it implies that btrfs
> guarantees a given metadata-to-data chunk space ratio, which isn't true.
>
>>
>> I tried to use metadata_ratio to experiment with forcing more 
>> metadata slop space, but really I
>> have to tweak the overcommit code first.
>> Omar beat me to a better solution, tracking down our transient ENOSPC 
>> problems here at FB to
>> reservations done for orphans.  Do you have a lot of deleted files 
>> still being held open?  lsof
>> /mntpoint | grep deleted will list them.
>
> I'll take a look during the backup window. The initial bug report 
> describes our rsync workload in
> detail, for your reference.
>
>>
>> We're working through a patch for the orphans here.  You've got a 
>> ton of bytes pinned, which isn't
>> a great match for the symptoms we see:
>>
>> [285169.096630] BTRFS info (device sdb): space_info 4 has 
>> 18446744072120172544 free, is not full
>> [285169.096633] BTRFS info (device sdb): space_info 
>> total=273804165120, used=269218267136,
>> pinned=3459629056, reserved=52396032, may_use=2663120896, 
>> readonly=131072
>>
>> But, your may_use count is high enough that you might be hitting this 
>> problem.  Otherwise I'll
>> work out a patch to make some more metadata chunks while Josef is 
>> perfecting his great delayed ref
>> update.
>
> As mentioned in the bug report, we have a custom patch that dedicates 
> SSDs to metadata chunks and
> HDDs to data chunks. So, all we need is to preallocate metadata 
> chunks to occupy all of the SSD
> space and our issues will be gone.
> Note that btrfs with SSD-backed metadata works absolutely great for 
> rsync backups, even if there are
> billions of files and thousands of snapshots. The global reservation 
> ENOSPC is the last issue we're
> struggling with.

Great, we'll get this nailed down, thanks!

-chris


* Btrfs remounted read-only due to ENOSPC in btrfs_run_delayed_refs cont. [Was: Re: metadata_ratio mount option?]
  2018-05-07 16:37     ` Chris Mason
@ 2018-05-08  8:47       ` Martin Svec
  0 siblings, 0 replies; 5+ messages in thread
From: Martin Svec @ 2018-05-08  8:47 UTC (permalink / raw)
  To: Chris Mason; +Cc: Btrfs BTRFS, fdmanana

Hello Chris,

Dne 7.5.2018 v 18:37 Chris Mason napsal(a):
>
>
> On 7 May 2018, at 12:16, Martin Svec wrote:
>
>> Hello Chris,
>>
>> Dne 7.5.2018 v 16:49 Chris Mason napsal(a):
>>> On 7 May 2018, at 7:40, Martin Svec wrote:
>>>
>>>> Hi,
>>>>
>>>> According to the btrfs manual page [1], I assume that the metadata_ratio=1 mount option should
>>>> force allocation of one metadata chunk after every allocated data chunk. However,
>>>> when I set this option and start filling btrfs with "dd if=/dev/zero of=dummyfile.dat",
>>>> only data chunks are allocated but no metadata ones. So, how does the metadata_ratio
>>>> option really work?
>>>>
>>>> Note that I'm trying to use this option as a workaround for the bug reported here:
>>>>
>>>
>>> [ urls that FB email server eats, sorry ]
>>
>> It's a link to the "Btrfs remounted read-only due to ENOSPC in btrfs_run_delayed_refs" thread :)
>
> Oh yeah, the link worked fine, it just goes through this url defense monster that munges it in
> replies.
>
>>
>>>
>>>>
>>>> i.e. I want to manually preallocate metadata chunks to avoid nightly ENOSPC errors.
>>>
>>>
>>> metadata_ratio is almost but not quite what you want.  It sets a flag on the space_info to force a
>>> chunk allocation the next time we decide to call should_alloc_chunk().  Thanks to the overcommit
>>> code, we usually don't call that until the metadata we think we're going to need is bigger than
>>> the metadata space available.  In other words, by the time we're into the code that honors the
>>> force flag, reservations are already high enough to make us allocate the chunk anyway.
>>
>> Yeah, that's how I understood the code. So I think the metadata_ratio man page section is quite
>> confusing, because it implies that btrfs guarantees a given metadata-to-data chunk space ratio,
>> which isn't true.
>>
>>>
>>> I tried to use metadata_ratio to experiment with forcing more metadata slop space, but really I
>>> have to tweak the overcommit code first.
>>> Omar beat me to a better solution, tracking down our transient ENOSPC problems here at FB to
>>> reservations done for orphans.  Do you have a lot of deleted files still being held open?  lsof
>>> /mntpoint | grep deleted will list them.
>>
>> I'll take a look during the backup window. The initial bug report describes our rsync workload in
>> detail, for your reference. 

No, there are no lingering deleted files during the backup window. However, I noticed something
interesting in the strace output: rsync does an ftruncate() of every transferred file before closing
it. In 99.9% of cases the file is truncated to its own size, so it should be a no-op. But these
ftruncate calls are by far the slowest syscalls according to strace timing, and btrfs_truncate()
describes itself in a comment as "indeed ugly". Could this be the root cause of the global
reservation pressure?
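A minimal sketch of the pattern strace shows, i.e. truncating a file to its current
size just before closing it, which should be a logical no-op:

```python
import os
import tempfile

# Reproduce rsync's pattern: write a file, then ftruncate() it to its
# own size before closing -- the contents and size are unchanged.
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"x" * 4096)
    size = os.fstat(fd).st_size
    os.ftruncate(fd, size)      # truncate to the same size: a no-op
    print(os.fstat(fd).st_size == size)  # True
finally:
    os.close(fd)
    os.unlink(path)
```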

I've found this patch from Filipe (Cc'd): https://patchwork.kernel.org/patch/10205013/. Should I
apply it to our 4.14.y kernel and see how it affects intensive rsync workloads?

Thank you
Martin





Thread overview: 5+ messages
2018-05-07 11:40 metadata_ratio mount option? Martin Svec
2018-05-07 14:49 ` Chris Mason
2018-05-07 16:16   ` Martin Svec
2018-05-07 16:37     ` Chris Mason
2018-05-08  8:47       ` Btrfs remounted read-only due to ENOSPC in btrfs_run_delayed_refs cont. [Was: Re: metadata_ratio mount option?] Martin Svec
