From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Martin Raiber <martin@urbackup.org>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Subject: Re: df shows no available space in 5.4.1
Date: Mon, 16 Dec 2019 13:04:46 +0800
Message-ID: <af8532dc-6b0c-d084-b752-56889ae6e928@gmx.com>
In-Reply-To: <0102016ef51617a2-339bd846-c076-4a86-a263-f1bdb14de622-000000@eu-west-1.amazonses.com>





On 2019/12/11 9:11 PM, Martin Raiber wrote:
> On 10.12.2019 02:19 Qu Wenruo wrote:
>>
>> On 2019/12/10 8:52 AM, Qu Wenruo wrote:
>>>
>>> On 2019/12/10 2:56 AM, Martin Raiber wrote:
>>>> On 07.12.2019 08:28 Qu Wenruo wrote:
>>>>> On 2019/12/7 5:26 AM, Martin Raiber wrote:
>>>>>> Hi,
>>>>>>
>>>>>> with kernel 5.4.1 I have the problem that df shows 100% space used. I
>>>>>> can still write to the btrfs volume, but my software looks at the
>>>>>> available space and starts deleting stuff if statfs() says there is a
>>>>>> low amount of available space.
>>>>> If the bug still happens, would you mind trying the snippet below to see why?
>>>>>
>>>>> You will need to:
>>>>> - Apply the patch to your kernel code
>>>>> - Recompile the kernel or btrfs module
>>>>>   (this needs some experience with compiling kernels)
>>>>> - Reboot into the newly compiled kernel or load the debug btrfs module
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>>>>> index 23aa630f04c9..cf34c05b16d7 100644
>>>>> --- a/fs/btrfs/relocation.c
>>>>> +++ b/fs/btrfs/relocation.c
>>>>> @@ -523,7 +523,8 @@ static int should_ignore_root(struct btrfs_root *root)
>>>>>  {
>>>>>         struct btrfs_root *reloc_root;
>>>>>
>>>>> -       if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>>>>> +       if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state) ||
>>>>> +           test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>>>>>                 return 0;
>>>>>
>>>>>         reloc_root = root->reloc_root;
>>>>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>>>>> index f452a94abdc3..c2b70d97a63b 100644
>>>>> --- a/fs/btrfs/super.c
>>>>> +++ b/fs/btrfs/super.c
>>>>> @@ -2064,6 +2064,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>>>>>                                         found->disk_used;
>>>>>                 }
>>>>>
>>>>> +               pr_info("%s: found type=0x%llx disk_used=%llu factor=%d\n",
>>>>> +                       __func__, found->flags, found->disk_used, factor);
>>>>>                 total_used += found->disk_used;
>>>>>         }
>>>>>
>>>>> @@ -2071,6 +2073,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>>>>>
>>>>>         buf->f_blocks = div_u64(btrfs_super_total_bytes(disk_super), factor);
>>>>>         buf->f_blocks >>= bits;
>>>>> +       pr_info("%s: super_total_bytes=%llu total_used=%llu factor=%d\n", __func__,
>>>>> +               btrfs_super_total_bytes(disk_super), total_used, factor);
>>>>>         buf->f_bfree = buf->f_blocks - (div_u64(total_used, factor) >> bits);
>>>>>
>>>>>         /* Account global block reserve as used, it's in logical size already */
>>>>>
>>>> Applied. It's currently 100% used directly after reboot, and I am
>>>> getting this log output:
>>> Thank you a lot for the debug output!
>>>
>>>> [...]
>>>> [  241.245150] btrfs_statfs: super_total_bytes=128835387392 total_used=93778841600 factor=1
>>>> [  241.904824] btrfs_statfs: found type=0x1 disk_used=93464006656 factor=1
>>>> [  241.904824] btrfs_statfs: found type=0x4 disk_used=314818560 factor=1
>>>> [  241.904824] btrfs_statfs: found type=0x2 disk_used=16384 factor=1
>>>> [  241.904824] btrfs_statfs: super_total_bytes=128835387392 total_used=93778841600 factor=1
>>> This proves the on-disk numbers are all correct, so far so good.
>>>
>>> The remaining problem is the block_rsv part, which matches the new
>>> ticket system introduced in v5.4.
>>>
>>> Would you mind testing the new debug snippet?
>>>
>>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>>> index f452a94abdc3..516969534095 100644
>>> --- a/fs/btrfs/super.c
>>> +++ b/fs/btrfs/super.c
>>> @@ -2076,6 +2076,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>>>         /* Account global block reserve as used, it's in logical size already */
>>>         spin_lock(&block_rsv->lock);
>>>         /* Mixed block groups accounting is not byte-accurate, avoid overflow */
>>> +       pr_info("%s: block_rsv->size=%llu block_rsv->reserved=%llu\n", __func__,
>>> +               block_rsv->size, block_rsv->reserved);
>>>         if (buf->f_bfree >= block_rsv->size >> bits)
>>>                 buf->f_bfree -= block_rsv->size >> bits;
>>>         else
>>>
>> And this extra snippet for available space.
>>
>> Thanks,
>> Qu
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index f452a94abdc3..f1a3e01a0ef5 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1911,6 +1911,7 @@ static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
>>          * We aren't under the device list lock, so this is racy-ish, but good
>>          * enough for our purposes.
>>          */
>> +       pr_info("%s: original_free_bytes=%llu\n", __func__, *free_bytes);
>>         nr_devices = fs_info->fs_devices->open_devices;
>>         if (!nr_devices) {
>>                 smp_mb();
>> @@ -2005,6 +2006,7 @@ static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
>>
>>         kfree(devices_info);
>>         *free_bytes = avail_space;
>> +       pr_info("%s: calculated_bytes=%llu\n", __func__, avail_space);
>>         return 0;
>>  }
>>

Sorry for the late reply, I was busy firefighting some bugs.

> Now logs this at 100% used:
> 
> [90273.353449] btrfs_calc_avail_data_space: original_free_bytes=23583420416
> [90273.353449] btrfs_calc_avail_data_space: calculated_bytes=13662945280

This marks the beginning of one statfs() call.

> [90273.369508] btrfs_statfs: found type=0x1 disk_used=90233212928 factor=1
> [90273.369536] btrfs_statfs: found type=0x1 disk_used=90233212928 factor=1
> [90273.369536] btrfs_statfs: found type=0x4 disk_used=339361792 factor=1
> [90273.369508] btrfs_statfs: found type=0x4 disk_used=339361792 factor=1
> [90273.369508] btrfs_statfs: found type=0x2 disk_used=16384 factor=1
> [90273.369536] btrfs_statfs: found type=0x2 disk_used=16384 factor=1
> [90273.369508] btrfs_statfs: super_total_bytes=128835387392 total_used=90572591104 factor=1

So far so good. All chunks are SINGLE, and total disk bytes are ~120GiB,
while total used bytes are ~84GiB.

In theory, we should report ~36GiB free.

> [90273.369508] btrfs_statfs: block_rsv->size=147554304 block_rsv->reserved=147554304

block_rsv is tiny, just ~140MiB, so it shouldn't make much difference.

> [90273.369537] btrfs_statfs: super_total_bytes=128835387392 total_used=90572591104 factor=1

So at this stage, f_bfree should be 74732024 - 288192 blocks.
                                    ^^^^^^^^   ^^^^^^- block_rsv / 512
                                    |- (total_bytes - total_used) / 512

At least, f_bfree looks OK.
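
As a sanity check on the numbers above, here is a minimal userspace
sketch of the f_bfree arithmetic. It is not the kernel code:
sketch_bfree() is a made-up name, the 512-byte unit (bits = 9) simply
follows the division used above, and clamping to zero in the else branch
is an assumption, since the quoted hunk is cut off at that point.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the f_bfree calculation discussed above.
 * "bits" is the block-size shift; 9 (512 bytes) follows the mail's
 * arithmetic and is an assumption about the unit, not a kernel fact.
 */
static uint64_t sketch_bfree(uint64_t total_bytes, uint64_t total_used,
                             uint64_t rsv_size, unsigned int factor,
                             unsigned int bits)
{
    uint64_t blocks, bfree, rsv_blocks;

    assert(factor != 0 && bits < 64);

    /* Total and used space, scaled by the profile factor, in blocks. */
    blocks = (total_bytes / factor) >> bits;
    bfree = blocks - ((total_used / factor) >> bits);

    /* Account the global block reserve as used. */
    rsv_blocks = rsv_size >> bits;
    if (bfree >= rsv_blocks)
        bfree -= rsv_blocks;
    else
        bfree = 0; /* assumed clamp, see the note above */

    return bfree;
}
```

With the logged values (super_total_bytes=128835387392,
total_used=90572591104, block_rsv->size=147554304, factor=1, bits=9)
this returns 74443832, i.e. 74732024 - 288192.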

> [90273.369509] btrfs_calc_avail_data_space: original_free_bytes=23583420416
> [90273.369537] btrfs_statfs: block_rsv->size=147554304 block_rsv->reserved=147554304
> [90273.369537] btrfs_calc_avail_data_space: original_free_bytes=23583420416

Still good, we have ~21.9GiB of unused data space across all
allocated data chunks.
All of this ~21.9GiB should contribute to f_bavail.

Although it means you have some fragmentation, it's not a big deal at all.

> [90273.369509] btrfs_calc_avail_data_space: calculated_bytes=13662945280
> [90273.369537] btrfs_calc_avail_data_space: calculated_bytes=13662945280

And btrfs_calc_avail_data_space() finds that we can allocate around
12.7GiB of new data chunks.

This 12.7GiB is also going to be part of f_bavail.

This means you should have ~34GiB of free space before we do the
offending check:

	if (!mixed && total_free_meta - thresh < block_rsv->size)
		buf->f_bavail = 0;

This check is pretty old, from 2015. Recent kernels allow aggressive
metadata over-committing, so we can have a lot of reserved metadata
space without actually allocating new metadata chunks.

I'll try to find a better calculation that cooperates with metadata
over-committing.
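
To make the effect of that check concrete, here is a minimal userspace
sketch. The function name sketch_bavail() is made up for illustration,
and total_free_meta does not appear in the debug output, so any value
passed for it is hypothetical. The result is in bytes; the shift to
blocks is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_THRESH (4ULL * 1024 * 1024) /* the 4M threshold of the check */

/*
 * Sketch of how f_bavail is formed and then zeroed by the check.
 * free_in_data_chunks: unused bytes inside allocated data chunks
 * allocatable_bytes:   bytes that could still become new data chunks
 * total_free_meta:     hypothetical input, not present in the logs
 */
static uint64_t sketch_bavail(uint64_t free_in_data_chunks,
                              uint64_t allocatable_bytes,
                              uint64_t total_free_meta,
                              uint64_t rsv_size, bool mixed)
{
    uint64_t avail = free_in_data_chunks + allocatable_bytes;

    /* The offending check: little free metadata means "no space". */
    if (!mixed && total_free_meta - SKETCH_THRESH < rsv_size)
        return 0;

    return avail;
}
```

With the logged 23583420416 + 13662945280 bytes and
block_rsv->size=147554304, a hypothetical total_free_meta of 100MiB
makes this return 0, while a hypothetical 1GiB returns 37246365696
bytes (the ~34GiB mentioned above).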

Feel free to remove all the debug snippets, and if you want a quick and
dirty fix, please try the attached diff.

Thanks,
Qu

> [90273.400227] btrfs_statfs: found type=0x1 disk_used=726834307072 factor=1
> [90273.400227] btrfs_statfs: found type=0x4 disk_used=4908548096 factor=1
> [90273.400227] btrfs_statfs: found type=0x2 disk_used=98304 factor=1
> [90273.400227] btrfs_statfs: super_total_bytes=8133881348096 total_used=731742953472 factor=1
> [90273.400227] btrfs_statfs: block_rsv->size=536870912 block_rsv->reserved=536821760
> [90273.400227] btrfs_calc_avail_data_space: original_free_bytes=1171038208
> [90273.400227] btrfs_calc_avail_data_space: calculated_bytes=7400493613056
> 

[-- Attachment #1.1.2: diff --]
[-- Type: text/plain, Size: 1516 bytes --]

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1b151af25772..b8b67ab05f72 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2032,7 +2032,6 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	unsigned factor = 1;
 	struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
 	int ret;
-	u64 thresh = 0;
 	int mixed = 0;
 
 	rcu_read_lock();
@@ -2085,26 +2084,9 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	if (ret)
 		return ret;
 	buf->f_bavail += div_u64(total_free_data, factor);
+	buf->f_bavail -= block_rsv->size;
 	buf->f_bavail = buf->f_bavail >> bits;
 
-	/*
-	 * We calculate the remaining metadata space minus global reserve. If
-	 * this is (supposedly) smaller than zero, there's no space. But this
-	 * does not hold in practice, the exhausted state happens where's still
-	 * some positive delta. So we apply some guesswork and compare the
-	 * delta to a 4M threshold.  (Practically observed delta was ~2M.)
-	 *
-	 * We probably cannot calculate the exact threshold value because this
-	 * depends on the internal reservations requested by various
-	 * operations, so some operations that consume a few metadata will
-	 * succeed even if the Avail is zero. But this is better than the other
-	 * way around.
-	 */
-	thresh = SZ_4M;
-
-	if (!mixed && total_free_meta - thresh < block_rsv->size)
-		buf->f_bavail = 0;
-
 	buf->f_type = BTRFS_SUPER_MAGIC;
 	buf->f_bsize = dentry->d_sb->s_blocksize;
 	buf->f_namelen = BTRFS_NAME_LEN;
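
For reference, a rough userspace sketch of the value the diff above
would produce for f_bavail. The name sketch_bavail_after_diff() is made
up, the clamp to zero is an addition of this sketch (the diff subtracts
block_rsv->size directly), and the block-size shift is left to the
caller.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch of f_bavail after the diff: free data bytes plus allocatable
 * bytes, minus the global block reserve, then converted to blocks.
 */
static uint64_t sketch_bavail_after_diff(uint64_t free_in_data_chunks,
                                         uint64_t allocatable_bytes,
                                         uint64_t rsv_size,
                                         unsigned int bits)
{
    uint64_t avail = free_in_data_chunks + allocatable_bytes;

    assert(bits < 64);

    /* Sketch-only guard against unsigned wrap-around. */
    avail = (avail >= rsv_size) ? avail - rsv_size : 0;

    return avail >> bits;
}
```

With the logged 23583420416 and 13662945280 bytes and
block_rsv->size=147554304, this gives 72458616 blocks at bits=9 or
9057327 blocks at bits=12, roughly 34.5GiB either way.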


Thread overview: 13+ messages
2019-12-06 21:26 df shows no available space in 5.4.1 Martin Raiber
2019-12-06 22:35 ` Chris Murphy
2019-12-06 22:51   ` Martin Raiber
2019-12-08 18:12   ` Zygo Blaxell
2019-12-07  7:28 ` Qu Wenruo
2019-12-09 18:56   ` Martin Raiber
2019-12-09 19:26     ` Martin Raiber
2019-12-10  0:52     ` Qu Wenruo
2019-12-10  1:19       ` Qu Wenruo
2019-12-11 13:11         ` Martin Raiber
2019-12-16  5:04           ` Qu Wenruo [this message]
2019-12-13 16:02 ` David Sterba
2019-12-13 20:03 ` Chris Murphy