* [PATCH] btrfs: do not zero f_bavail if we have available space
@ 2020-01-31 14:31 Josef Bacik
  2020-01-31 20:06 ` Martin Steigerwald
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Josef Bacik @ 2020-01-31 14:31 UTC (permalink / raw)
  To: linux-btrfs, kernel-team; +Cc: Martin Steigerwald

There was some logic added a while ago to clear out f_bavail in statfs()
if we did not have enough free metadata space to satisfy our global
reserve.  This was incorrect at the time; however, it didn't really pose a
problem for normal file systems because we would often allocate chunks
if we got this low on free metadata space, and thus wouldn't really hit
this case unless we were actually full.

Fast forward to today and now we are much better about not allocating
metadata chunks all of the time.  Couple this with d792b0f19711, which
means we'll easily have a larger global reserve than our free space, and
we are now more likely to trip over this while still having plenty of
space.

Fix this by skipping this logic if the global rsv's space_info is not
full.  space_info->full is 0 unless we've attempted to allocate a chunk
for that space_info and that has failed.  If this happens then the space
for the global reserve is definitely sacred and we need to report
f_bavail == 0, but before then we can just use our calculated f_bavail.

There are other cases where df isn't quite right, and Qu is addressing
them in a more holistic way.  This simply fixes the users that are
currently experiencing pain because of this problem.

Fixes: ca8a51b3a979 ("btrfs: statfs: report zero available if metadata are exhausted")
Reported-by: Martin Steigerwald <martin@lichtvoll.de>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/super.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index d421884f0c23..42433ca822aa 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2143,7 +2143,15 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	 */
 	thresh = SZ_4M;
 
-	if (!mixed && total_free_meta - thresh < block_rsv->size)
+	/*
+	 * We only want to claim there's no available space if we can no longer
+	 * allocate chunks for our metadata profile and our global reserve will
+	 * not fit in the free metadata space.  If we aren't ->full then we
+	 * still can allocate chunks and thus are fine using the currently
+	 * calculated f_bavail.
+	 */
+	if (!mixed && block_rsv->space_info->full &&
+	    total_free_meta - thresh < block_rsv->size)
 		buf->f_bavail = 0;
 
 	buf->f_type = BTRFS_SUPER_MAGIC;
-- 
2.24.1



* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-01-31 14:31 [PATCH] btrfs: do not zero f_bavail if we have available space Josef Bacik
@ 2020-01-31 20:06 ` Martin Steigerwald
  2020-02-01  1:00 ` Qu Wenruo
  2020-02-02 17:52 ` David Sterba
  2 siblings, 0 replies; 26+ messages in thread
From: Martin Steigerwald @ 2020-01-31 20:06 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

Josef Bacik - 31.01.20, 15:31:05 CET:
> There was some logic added a while ago to clear out f_bavail in
> statfs() if we did not have enough free metadata space to satisfy our
> global reserve.  This was incorrect at the time; however, it didn't
> really pose a problem for normal file systems because we would often
> allocate chunks if we got this low on free metadata space, and thus
> wouldn't really hit this case unless we were actually full.
> 
> Fast forward to today and now we are much better about not allocating
> metadata chunks all of the time.  Couple this with d792b0f19711, which
> means we'll easily have a larger global reserve than our free space,
> and we are now more likely to trip over this while still having plenty
> of space.
> 
> Fix this by skipping this logic if the global rsv's space_info is not
> full.  space_info->full is 0 unless we've attempted to allocate a
> chunk for that space_info and that has failed.  If this happens then
> the space for the global reserve is definitely sacred and we need to
> report f_bavail == 0, but before then we can just use our calculated
> f_bavail.

Thank you!

The fix works:

merkaba:~> LANG=en df -hT /daten
Filesystem             Type   Size  Used Avail Use% Mounted on
/dev/mapper/sata-daten btrfs  400G  311G   91G  78% /daten

Tested-by: Martin Steigerwald <martin@lichtvoll.de>

Thanks,
Martin

> 
> There are other cases where df isn't quite right, and Qu is addressing
> them in a more holistic way.  This simply fixes the users that are
> currently experiencing pain because of this problem.
> 
> Fixes: ca8a51b3a979 ("btrfs: statfs: report zero available if metadata are exhausted")
> Reported-by: Martin Steigerwald <martin@lichtvoll.de>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/super.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index d421884f0c23..42433ca822aa 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2143,7 +2143,15 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	 */
>  	thresh = SZ_4M;
> 
> -	if (!mixed && total_free_meta - thresh < block_rsv->size)
> +	/*
> +	 * We only want to claim there's no available space if we can no longer
> +	 * allocate chunks for our metadata profile and our global reserve will
> +	 * not fit in the free metadata space.  If we aren't ->full then we
> +	 * still can allocate chunks and thus are fine using the currently
> +	 * calculated f_bavail.
> +	 */
> +	if (!mixed && block_rsv->space_info->full &&
> +	    total_free_meta - thresh < block_rsv->size)
>  		buf->f_bavail = 0;
> 
>  	buf->f_type = BTRFS_SUPER_MAGIC;


-- 
Martin




* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-01-31 14:31 [PATCH] btrfs: do not zero f_bavail if we have available space Josef Bacik
  2020-01-31 20:06 ` Martin Steigerwald
@ 2020-02-01  1:00 ` Qu Wenruo
  2020-02-02 17:52 ` David Sterba
  2 siblings, 0 replies; 26+ messages in thread
From: Qu Wenruo @ 2020-02-01  1:00 UTC (permalink / raw)
  To: Josef Bacik, linux-btrfs, kernel-team; +Cc: Martin Steigerwald


On 2020/1/31 10:31 PM, Josef Bacik wrote:
> There was some logic added a while ago to clear out f_bavail in statfs()
> if we did not have enough free metadata space to satisfy our global
> reserve.  This was incorrect at the time; however, it didn't really pose a
> problem for normal file systems because we would often allocate chunks
> if we got this low on free metadata space, and thus wouldn't really hit
> this case unless we were actually full.
> 
> Fast forward to today and now we are much better about not allocating
> metadata chunks all of the time.  Couple this with d792b0f19711, which
> means we'll easily have a larger global reserve than our free space, and
> we are now more likely to trip over this while still having plenty of
> space.
> 
> Fix this by skipping this logic if the global rsv's space_info is not
> full.  space_info->full is 0 unless we've attempted to allocate a chunk
> for that space_info and that has failed.  If this happens then the space
> for the global reserve is definitely sacred and we need to report
> f_bavail == 0, but before then we can just use our calculated f_bavail.
> 
> There are other cases where df isn't quite right, and Qu is addressing
> them in a more holistic way.  This simply fixes the users that are
> currently experiencing pain because of this problem.
> 
> Fixes: ca8a51b3a979 ("btrfs: statfs: report zero available if metadata are exhausted")
> Reported-by: Martin Steigerwald <martin@lichtvoll.de>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

> ---
>  fs/btrfs/super.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> index d421884f0c23..42433ca822aa 100644
> --- a/fs/btrfs/super.c
> +++ b/fs/btrfs/super.c
> @@ -2143,7 +2143,15 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>  	 */
>  	thresh = SZ_4M;
>  
> -	if (!mixed && total_free_meta - thresh < block_rsv->size)
> +	/*
> +	 * We only want to claim there's no available space if we can no longer
> +	 * allocate chunks for our metadata profile and our global reserve will
> +	 * not fit in the free metadata space.  If we aren't ->full then we
> +	 * still can allocate chunks and thus are fine using the currently
> +	 * calculated f_bavail.
> +	 */
> +	if (!mixed && block_rsv->space_info->full &&
> +	    total_free_meta - thresh < block_rsv->size)
>  		buf->f_bavail = 0;
>  
>  	buf->f_type = BTRFS_SUPER_MAGIC;
> 




* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-01-31 14:31 [PATCH] btrfs: do not zero f_bavail if we have available space Josef Bacik
  2020-01-31 20:06 ` Martin Steigerwald
  2020-02-01  1:00 ` Qu Wenruo
@ 2020-02-02 17:52 ` David Sterba
       [not found]   ` <CAKhhfD7S=kcKLRURdNFZ8H4beS8=XjFvnOQXche7+SVOGFGC_w@mail.gmail.com>
  2 siblings, 1 reply; 26+ messages in thread
From: David Sterba @ 2020-02-02 17:52 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team, Martin Steigerwald

On Fri, Jan 31, 2020 at 09:31:05AM -0500, Josef Bacik wrote:
> There was some logic added a while ago to clear out f_bavail in statfs()
> if we did not have enough free metadata space to satisfy our global
> reserve.  This was incorrect at the time; however, it didn't really pose a
> problem for normal file systems because we would often allocate chunks
> if we got this low on free metadata space, and thus wouldn't really hit
> this case unless we were actually full.
> 
> Fast forward to today and now we are much better about not allocating
> metadata chunks all of the time.  Couple this with d792b0f19711, which
> means we'll easily have a larger global reserve than our free space, and
> we are now more likely to trip over this while still having plenty of
> space.
> 
> Fix this by skipping this logic if the global rsv's space_info is not
> full.  space_info->full is 0 unless we've attempted to allocate a chunk
> for that space_info and that has failed.  If this happens then the space
> for the global reserve is definitely sacred and we need to report
> f_bavail == 0, but before then we can just use our calculated f_bavail.
> 
> There are other cases where df isn't quite right, and Qu is addressing
> them in a more holistic way.  This simply fixes the users that are
> currently experiencing pain because of this problem.
> 
> Fixes: ca8a51b3a979 ("btrfs: statfs: report zero available if metadata are exhausted")
> Reported-by: Martin Steigerwald <martin@lichtvoll.de>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Added to 5.6 queue, thanks.


* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
       [not found]   ` <CAKhhfD7S=kcKLRURdNFZ8H4beS8=XjFvnOQXche7+SVOGFGC_w@mail.gmail.com>
@ 2020-02-19  9:17     ` Martin Steigerwald
  2020-02-19 13:43       ` Marc MERLIN
  0 siblings, 1 reply; 26+ messages in thread
From: Martin Steigerwald @ 2020-02-19  9:17 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: dsterba, Josef Bacik, linux-btrfs, kernel-team

Marc MERLIN - 19.02.20, 01:42:57 CET:
> Has the patch gotten to any 5.5 release too?

Yes, as git log easily reveals.
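
For example (a minimal sketch, assuming a clone of the linux-stable tree;
adjust the tag range for the release you care about):

  # list the backport in the 5.5 stable series
  git log --oneline --grep='do not zero f_bavail' v5.5..v5.5.2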

> On Sun, Feb 2, 2020, 12:53 David Sterba <dsterba@suse.cz> wrote:
> > On Fri, Jan 31, 2020 at 09:31:05AM -0500, Josef Bacik wrote:
> > > There was some logic added a while ago to clear out f_bavail in
> > > statfs() if we did not have enough free metadata space to satisfy
> > > our global reserve.  This was incorrect at the time; however, it
> > > didn't really pose a problem for normal file systems because we
> > > would often allocate chunks if we got this low on free metadata
> > > space, and thus wouldn't really hit this case unless we were
> > > actually full.
> > > 
> > > Fast forward to today and now we are much better about not
> > > allocating metadata chunks all of the time.  Couple this with
> > > d792b0f19711, which means we'll easily have a larger global
> > > reserve than our free space, and we are now more likely to trip
> > > over this while still having plenty of space.
> > > 
> > > Fix this by skipping this logic if the global rsv's space_info is
> > > not full.  space_info->full is 0 unless we've attempted to
> > > allocate a chunk for that space_info and that has failed.  If this
> > > happens then the space for the global reserve is definitely sacred
> > > and we need to report f_bavail == 0, but before then we can just
> > > use our calculated f_bavail.
> > > 
> > > There are other cases where df isn't quite right, and Qu is
> > > addressing them in a more holistic way.  This simply fixes the
> > > users that are currently experiencing pain because of this
> > > problem.
> > > 
> > > Fixes: ca8a51b3a979 ("btrfs: statfs: report zero available if
> > > metadata are exhausted")
> > > Reported-by: Martin Steigerwald <martin@lichtvoll.de>
> > > Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> > 
> > Added to 5.6 queue, thanks.


-- 
Martin




* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-02-19  9:17     ` Martin Steigerwald
@ 2020-02-19 13:43       ` Marc MERLIN
  2020-02-19 14:31         ` David Sterba
  0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2020-02-19 13:43 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: dsterba, Josef Bacik, linux-btrfs, kernel-team

On Wed, Feb 19, 2020 at 10:17:24AM +0100, Martin Steigerwald wrote:
> Marc MERLIN - 19.02.20, 01:42:57 CET:
> > Has the patch gotten to any 5.5 release too?
> 
> Yes, as git log easily reveals.

Sorry if I suck, but right now I only have pre-made kernel releases from
kernel.org.
This bug in 5.4 messed up some of my dm-thin volumes, which now take 28% of a dm-thin
14TB pool when the actual data is only using 4GB :( (at the same time it
also shows my FS is full when of course it's not).

I'll likely have to destroy the dm-thin to recover that space (or maybe
not, we'll see), but I'm travelling and don't really have countless time
to allocate to this.
If 5.5.4 is supposed to fix this too, I'll build it, install it and hope
it reclaims my lost dm-thin space, and if not suck up the deletion,
re-creation and backup/restore.

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  


* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-02-19 13:43       ` Marc MERLIN
@ 2020-02-19 14:31         ` David Sterba
  2020-02-19 15:36           ` Marc MERLIN
  0 siblings, 1 reply; 26+ messages in thread
From: David Sterba @ 2020-02-19 14:31 UTC (permalink / raw)
  To: Marc MERLIN; +Cc: Martin Steigerwald, Josef Bacik, linux-btrfs, kernel-team

On Wed, Feb 19, 2020 at 05:43:27AM -0800, Marc MERLIN wrote:
> On Wed, Feb 19, 2020 at 10:17:24AM +0100, Martin Steigerwald wrote:
> > Marc MERLIN - 19.02.20, 01:42:57 CET:
> > > Has the patch gotten to any 5.5 release too?
> > 
> > Yes, as git log easily reveals.
> 
> Sorry if I suck, but right now I only have pre-made kernel releases from
> kernel.org.
> This bug in 5.4 messed up some of my dm-thin volumes, which now take 28% of a dm-thin
> 14TB pool when the actual data is only using 4GB :( (at the same time it
> also shows my FS is full when of course it's not).
> 
> I'll likely have to destroy the dm-thin to recover that space (or maybe
> not, we'll see), but I'm travelling and don't really have countless time
> to allocate to this.
> If 5.5.4 is supposed to fix this too, I'll build it, install it and hope
> it reclaims my lost dm-thin space, and if not suck up the deletion,
> re-creation and backup/restore.

The fix got to stable 5.5.2 and 5.4.18. I don't know if dm-thin actually
allows that, but is there a non-destructive way to reclaim the space?
Like using fstrim (the filesystem can tell the underlying storage which
blocks are free). According to
http://man7.org/linux/man-pages/man7/lvmthin.7.html ("Manually manage
free data space of thin pool LV") this should work but I have no
practical experience with that.
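
Roughly, that sequence would be (a sketch, untested here, with the mount
point and VG name assumed from later in this thread):

  fstrim -v /mnt/btrfs_pool2/backup/ubuntu
  lvs -o lv_name,data_percent,metadata_percent vgds2

and the pool's data_percent should drop as the discarded extents are
returned to it.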


* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-02-19 14:31         ` David Sterba
@ 2020-02-19 15:36           ` Marc MERLIN
  2020-02-19 17:50             ` Roman Mamedov
  2020-02-20 21:46             ` Marc MERLIN
  0 siblings, 2 replies; 26+ messages in thread
From: Marc MERLIN @ 2020-02-19 15:36 UTC (permalink / raw)
  To: dsterba, Martin Steigerwald, Josef Bacik, linux-btrfs, kernel-team

On Wed, Feb 19, 2020 at 03:31:14PM +0100, David Sterba wrote:
> On Wed, Feb 19, 2020 at 05:43:27AM -0800, Marc MERLIN wrote:
> > On Wed, Feb 19, 2020 at 10:17:24AM +0100, Martin Steigerwald wrote:
> > > Marc MERLIN - 19.02.20, 01:42:57 CET:
> > > > Has the patch gotten to any 5.5 release too?
> > > 
> > > Yes, as git log easily reveals.
> > 
> > Sorry if I suck, but right now I only have pre-made kernel releases from
> > kernel.org.
> > This bug in 5.4 messed up some of my dm-thin volumes, which now take 28% of a dm-thin
> > 14TB pool when the actual data is only using 4GB :( (at the same time it
> > also shows my FS is full when of course it's not).
> > 
> > I'll likely have to destroy the dm-thin to recover that space (or maybe
> > not, we'll see), but I'm travelling and don't really have countless time
> > to allocate to this.
> > If 5.5.4 is supposed to fix this too, I'll build it, install it and hope
> > it reclaims my lost dm-thin space, and if not suck up the deletion,
> > re-creation and backup/restore.
> 
> The fix got to stable 5.5.2 and 5.4.18. I don't know if dm-thin actually
> allows that, but is there a non-destructive way to reclaim the space?
> Like using fstrim (the filesystem can tell the underlying storage which
> blocks are free). According to
> http://man7.org/linux/man-pages/man7/lvmthin.7.html ("Manually manage
> free data space of thin pool LV") this should work but I have no
> practical experience with that.

Thanks. For some reason, Debian's latest make-kpkg hangs forever on 5.5
kernels (not sure why), so I can't build it right now, but I just got
5.4.20 and I'm compiling that now, thanks.
As for dm-thin, I'm not sure yet, I'll find out when the new kernel is
installed. I was also hoping fstrim would work, I guess I'll find out.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-02-19 15:36           ` Marc MERLIN
@ 2020-02-19 17:50             ` Roman Mamedov
  2020-02-19 22:21               ` Martin Steigerwald
  2020-02-20 21:46             ` Marc MERLIN
  1 sibling, 1 reply; 26+ messages in thread
From: Roman Mamedov @ 2020-02-19 17:50 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: dsterba, Martin Steigerwald, Josef Bacik, linux-btrfs, kernel-team

On Wed, 19 Feb 2020 07:36:52 -0800
Marc MERLIN <marc@merlins.org> wrote:

> Thanks. For some reason, Debian's latest make-kpkg hangs forever on 5.5
> kernels (not sure why), so I can't build it right now, but I just got
> 5.4.20 and I'm compiling that now, thanks.

Debian has deprecated their own tooling for regular users (as opposed to
package maintainers) to easily make custom kernel deb packages[1]; they
now suggest using the kernel-provided "make bindeb-pkg" instead.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925411#17

> As for dm-thin, I'm not sure yet, I'll find out when the new kernel is
> installed. I was also hoping fstrim would work, I guess I'll find out.

Indeed fstrim does deprovision backing storage on thin LVM just fine.

However, I am not sure what the relation is with the bug being discussed.
The "zero f_bavail" bug was just returning "0" to df while that was not
actually true. It seems puzzling how that would lead to increased usage of
dm-thin for you.

-- 
With respect,
Roman


* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-02-19 17:50             ` Roman Mamedov
@ 2020-02-19 22:21               ` Martin Steigerwald
  0 siblings, 0 replies; 26+ messages in thread
From: Martin Steigerwald @ 2020-02-19 22:21 UTC (permalink / raw)
  To: Roman Mamedov; +Cc: Marc MERLIN, dsterba, Josef Bacik, linux-btrfs, kernel-team

Roman Mamedov - 19.02.20, 18:50:51 CET:
> On Wed, 19 Feb 2020 07:36:52 -0800
> 
> Marc MERLIN <marc@merlins.org> wrote:
> > Thanks. For some reason, Debian's latest make-kpkg hangs forever on
> > 5.5 kernels (not sure why), so I can't build it right now, but I
> > just got 5.4.20 and I'm compiling that now, thanks.
> 
> Debian has deprecated their own tooling for regular users (as opposed
> to package maintainers) to easily make custom kernel deb packages[1];
> they now suggest using the kernel-provided "make bindeb-pkg"
> instead.

Yes. I use

eatmydata make -j4 bindeb-pkg LOCALVERSION=-tp520

to build my kernels these days (still that good old ThinkPad T520, with
the keyboard from before Lenovo made it worse).

You also get a linux-headers package, which I believe you need to install
in order to have dkms build 3rd-party modules if you have any.
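
Installing the resulting packages is then just dpkg (a sketch; the exact
file names depend on your version string and architecture):

  sudo dpkg -i ../linux-image-*.deb ../linux-headers-*.deb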

> [1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=925411#17

Best,
-- 
Martin




* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-02-19 15:36           ` Marc MERLIN
  2020-02-19 17:50             ` Roman Mamedov
@ 2020-02-20 21:46             ` Marc MERLIN
  2020-02-21  5:38               ` Marc MERLIN
  1 sibling, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2020-02-20 21:46 UTC (permalink / raw)
  To: dsterba, Martin Steigerwald, Josef Bacik, linux-btrfs,
	kernel-team, Roman Mamedov

Well, turns out this was a more serious bug than we thought.
With dm-thin overcommit, it causes this:
[1324107.675334] BTRFS info (device dm-13): forced readonly
[1324107.692909] BTRFS warning (device dm-13): Skipping commit of aborted transaction.
[1324107.717141] BTRFS: error (device dm-13) in cleanup_transaction:1828: errno=-5 IO failure
[1324107.743298] BTRFS info (device dm-13): delayed_refs has NO entry
[1324107.817671] device-mapper: thin: 252:9: switching pool to write mode
[1324108.662095] BTRFS error (device dm-13): bad tree block start, want 9050645626880 have 0
[1324108.694286] BTRFS error (device dm-13): bad tree block start, want 9050645626880 have 0

In other words, this broke my filesystem. I didn't try to see if it's damaged or just read-only,
but obviously, this isn't good.

Hopefully the new kernel will stop this from happening.

VG/dm details if you care:
  VG Name               vgds2
  System ID             
  Format                lvm2
  Metadata Areas        1
  Metadata Sequence No  88
  VG Access             read/write
  VG Status             resizable
  MAX LV                0
  Cur LV                8
  Open LV               7
  Max PV                0
  Cur PV                1
  Act PV                1
  VG Size               14.55 TiB
  PE Size               4.00 MiB
  Total PE              3815316
  Alloc PE / Size       3801146 / 14.50 TiB
  Free  PE / Size       14170 / 55.35 GiB
  VG UUID               pc1cTH-kFo7-g0Kz-dELp-j51s-1yOO-v20WIV
   
  --- Logical volume ---
  LV Name                thinpool2
  VG Name                vgds2
  LV UUID                rxJCsT-ImNv-ibvM-zOS0-Xzqv-O8AU-1STUH9
  LV Write Access        read/write
  LV Creation host, time gargamel.svh.merlins.org, 2018-07-26 08:42:51 -0700
  LV Pool metadata       thinpool2_tmeta
  LV Pool data           thinpool2_tdata
  LV Status              available
  # open                 8
  LV Size                14.50 TiB
  Allocated pool data    99.99%
  Allocated metadata     59.88%
  Current LE             3801088
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           252:9

  LV Path                /dev/vgds2/ubuntu
  LV Name                ubuntu
  VG Name                vgds2
  LV UUID                y42AA8-5zfq-Vbmr-TNo9-g7rn-UbGf-KOnFrf
  LV Write Access        read/write
  LV Creation host, time gargamel.svh.merlins.org, 2018-07-26 23:22:18 -0700
  LV Pool name           thinpool2
  LV Status              available
  # open                 1
  LV Size                14.00 TiB
  Mapped size            60.26%
  Current LE             3670016
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           252:13


-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-02-20 21:46             ` Marc MERLIN
@ 2020-02-21  5:38               ` Marc MERLIN
  2020-02-21  5:45                 ` Roman Mamedov
  0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2020-02-21  5:38 UTC (permalink / raw)
  To: dsterba, Martin Steigerwald, Josef Bacik, linux-btrfs,
	kernel-team, Roman Mamedov

On Thu, Feb 20, 2020 at 01:46:49PM -0800, Marc MERLIN wrote:
> Well, turns out this was a more serious bug than we thought.
> With dm-thin overcommit, it causes this:
> [1324107.675334] BTRFS info (device dm-13): forced readonly
> [1324107.692909] BTRFS warning (device dm-13): Skipping commit of aborted transaction.
> [1324107.717141] BTRFS: error (device dm-13) in cleanup_transaction:1828: errno=-5 IO failure
> [1324107.743298] BTRFS info (device dm-13): delayed_refs has NO entry
> [1324107.817671] device-mapper: thin: 252:9: switching pool to write mode
> [1324108.662095] BTRFS error (device dm-13): bad tree block start, want 9050645626880 have 0
> [1324108.694286] BTRFS error (device dm-13): bad tree block start, want 9050645626880 have 0

I had a closer look, and even with 5.4.20, my whole lv is full now:
  LV Name                thinpool2
  Allocated pool data    99.99%
  Allocated metadata     59.88%

Sure enough, that broken ubuntu one (that really only needs 4GB or so)
is now taking 60% of the mapped size (i.e. everything that was left)
  LV Name                ubuntu
  Mapped size            60.26%

I'm now running this overnight, but any command on that filesystem just
hangs for now:
gargamel:/mnt/btrfs_pool2/backup/ubuntu# fstrim -v .

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


* Re: [PATCH] btrfs: do not zero f_bavail if we have available space
  2020-02-21  5:38               ` Marc MERLIN
@ 2020-02-21  5:45                 ` Roman Mamedov
  2020-02-21 23:07                   ` btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that Marc MERLIN
  0 siblings, 1 reply; 26+ messages in thread
From: Roman Mamedov @ 2020-02-21  5:45 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: dsterba, Martin Steigerwald, Josef Bacik, linux-btrfs, kernel-team

On Thu, 20 Feb 2020 21:38:04 -0800
Marc MERLIN <marc@merlins.org> wrote:

> I had a closer look, and even with 5.4.20, my whole lv is full now:
>   LV Name                thinpool2
>   Allocated pool data    99.99%
>   Allocated metadata     59.88%

Oversubscribing thin storage should be done carefully and only with a very
good reason, and when you run out of something you didn't have in the first
place, it seems hard to blame Btrfs or anyone else for it.

> Sure enough, that broken ubuntu one (that really only needs 4GB or so)
> is now taking 60% of the mapped size (i.e. everything that was left)
>   LV Name                ubuntu
>   Mapped size            60.26%

Provide the full output of lvdisplay -m, not snippets of it. As is, "omg
60%" tells nothing to anyone; who knows, maybe this LV is 6 GB in size,
and at 60% used it comes out to 4 GB exactly.

> I'm now running this overnight, but any command on that filesystem just
> hangs for now:
> gargamel:/mnt/btrfs_pool2/backup/ubuntu# fstrim -v .

At this point "Data%" in `lvs` output should be decreasing steadily (if
not, check your dmesg for some kind of hang or deadlock).
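
For example (a sketch, with the VG name from this thread):

  # watch the pool's allocation fall while fstrim runs
  watch -n 10 'lvs -o lv_name,data_percent,metadata_percent vgds2'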

-- 
With respect,
Roman


* btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that.
  2020-02-21  5:45                 ` Roman Mamedov
@ 2020-02-21 23:07                   ` Marc MERLIN
  2020-02-21 23:17                     ` How to roll back btrfs filesystem a few revisions? Marc MERLIN
  2020-02-21 23:43                     ` btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that Josef Bacik
  0 siblings, 2 replies; 26+ messages in thread
From: Marc MERLIN @ 2020-02-21 23:07 UTC (permalink / raw)
  To: Roman Mamedov
  Cc: dsterba, Martin Steigerwald, Josef Bacik, linux-btrfs, kernel-team

Ok, first I'll update the subject line

On Fri, Feb 21, 2020 at 10:45:45AM +0500, Roman Mamedov wrote:
> On Thu, 20 Feb 2020 21:38:04 -0800
> Marc MERLIN <marc@merlins.org> wrote:
> 
> > I had a closer look, and even with 5.4.20, my whole lv is full now:
> >   LV Name                thinpool2
> >   Allocated pool data    99.99%
> >   Allocated metadata     59.88%
> 
> Oversubscribing thin storage should be done carefully and only with a very
> good reason, and when you run out of something you didn't have in the first
> place, it seems hard to blame Btrfs or anyone else for it.

let's rewind.
It's a backup server. I used to have everything in a single 14TB
filesystem, I had too many snapshots, and was told to break it up into
smaller filesystems to work around btrfs' inability to scale properly
past a hundred snapshots or so (and that many snapshots blowing up both
kinds of btrfs check --repair; one of them forced me to buy 16GB of RAM
to max out my server, yet it still ran out of RAM, and now I can't add
any more).

I'm obviously not going back to the olden days of making actual partitions
and guessing wrong every time how big each partition should be, so my
only solution left was to use dm-thin and subscribe the entire space to
all LVs.
I then have a cronjob that warns me if I start running low on space in the
global VG pool.
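
A minimal sketch of such a check (using the pool name from this thread;
not the actual cronjob, and it assumes a working mail command):

  #!/bin/sh
  # warn when the thin pool crosses 90% data usage
  PCT=$(lvs --noheadings -o data_percent vgds2/thinpool2 | tr -d ' ')
  if [ "${PCT%.*}" -ge 90 ]; then
      echo "thinpool2 at ${PCT}% data usage" | mail -s "thin pool warning" root
  fi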

Now, where it got confusing: around the time I installed the 5.4 kernel
with the df problem, df filled up to 100% and started mailing me. I
ignored it because I knew about the bug.
However, I just found out that my LV filled up due to another bug that
was actually my fault.

Now, I triggered some real bugs in btrfs, see:
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs fi show .
Label: 'ubuntu'  uuid: 905c90db-8081-4071-9c79-57328b8ac0d5
        Total devices 1 FS bytes used 445.73GiB
        devid    1 size 14.00TiB used 8.44TiB path /dev/mapper/vgds2-ubuntu

Ok, I'm using 445GB, but losing 8.4TB, sigh.
  LV Path                /dev/vgds2/ubuntu
  LV Name                ubuntu
  LV Pool name           thinpool2
  LV Size                14.00 TiB
  Mapped size            60.25%  <= this is all the space free in my VG, so it's full now

We talked about fstrim, let's try that:
gargamel:/mnt/btrfs_pool2/backup/ubuntu# fstrim -v .
.: 5.6 TiB (6116423237632 bytes) trimmed

Oh, great. Except this freed up nothing in LVM.

gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -musage=0 -v .  
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=0
  SYSTEM (flags 0x2): balancing, usage=0
ERROR: error during balancing '.': Read-only file system

Ok, right, need to unmount/remount to clear read-only:
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -musage=0 -v .  
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=0
  SYSTEM (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 8624 chunks
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -dusage=0 -v .  
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 8624 chunks
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs fi show .
Label: 'ubuntu'  uuid: 905c90db-8081-4071-9c79-57328b8ac0d5
        Total devices 1 FS bytes used 8.42TiB
        devid    1 size 14.00TiB used 8.44TiB path /dev/mapper/vgds2-ubuntu


Well, crap, see how 'used' went from 445.73GiB to 8.42TiB after balance?

I ran du to make sure my data is indeed only using 445GB.

So now I'm pretty much hosed; the filesystem seems to have been damaged in interesting ways.

I'll wait until tomorrow in case someone wants something from it, and I'll delete the entire
LV and start over.

And now for extra points, this also damaged a 2nd of my filesystems on the same VG :(
[64723.601630] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
[64723.628708] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
[64897.028176] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
[64897.080355] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


* How to roll back btrfs filesystem a few revisions?
  2020-02-21 23:07                   ` btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that Marc MERLIN
@ 2020-02-21 23:17                     ` Marc MERLIN
  2020-02-21 23:47                       ` Josef Bacik
  2020-02-21 23:43                     ` btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that Josef Bacik
  1 sibling, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2020-02-21 23:17 UTC (permalink / raw)
  To: Roman Mamedov
  Cc: dsterba, Martin Steigerwald, Josef Bacik, linux-btrfs, kernel-team

On Fri, Feb 21, 2020 at 03:07:40PM -0800, Marc MERLIN wrote:
> And now for extra points, this also damaged a 2nd of my filesystems on the same VG :(
> [64723.601630] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
> [64723.628708] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
> [64897.028176] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
> [64897.080355] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001

While I'm going to destroy and recreate one of the two filesystems, the
other one has lots of btrfs relationships I really don't want to lose
and have to re-create.

I'm sure it got into a bad state because its writes were denied.
I don't care about last data written, is there a clean way to open the
filesystem and revert it a few revisions?
Basically I want
git reset --hard HEAD^ or HEAD^^

I'm ok with data loss; I just want to get back to a previous known-good
consistent state. If I've not disabled COW (which I have not), this
should be possible, correct?

If so, how do I proceed?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  


* Re: btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that.
  2020-02-21 23:07                   ` btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that Marc MERLIN
  2020-02-21 23:17                     ` How to roll back btrfs filesystem a few revisions? Marc MERLIN
@ 2020-02-21 23:43                     ` Josef Bacik
  2020-02-22  0:01                       ` Marc MERLIN
  1 sibling, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-02-21 23:43 UTC (permalink / raw)
  To: Marc MERLIN, Roman Mamedov
  Cc: dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On 2/21/20 6:07 PM, Marc MERLIN wrote:
> Ok, first I'll update the subject line
> 
> On Fri, Feb 21, 2020 at 10:45:45AM +0500, Roman Mamedov wrote:
>> On Thu, 20 Feb 2020 21:38:04 -0800
>> Marc MERLIN <marc@merlins.org> wrote:
>>
>>> I had a closer look, and even with 5.4.20, my whole lv is full now:
>>>    LV Name                thinpool2
>>>    Allocated pool data    99.99%
>>>    Allocated metadata     59.88%
>>
>> Oversubscribing thin storage should be done carefully and only with a very
>> good reason, and when you run out of something you didn't have in the first
>> place, it seems hard to blame Btrfs or anyone else for it.
> 
> let's rewind.
> It's a backup server. I used to have everything in a single 14TB
> filesystem, I had too many snapshots, and was told to break it up into
> smaller filesystems to work around btrfs' inability to scale properly
> past a hundred snapshots or so (and that many snapshots blowing up both
> kinds of btrfs check --repair; one of them forced me to buy 16GB of RAM
> to max out my server, yet it still ran out of RAM, and now I can't add
> any more).
> 
> I'm obviously not going back to the olden days of making actual partitions
> and guessing wrong every time how big each partition should be, so my
> only solution left was to use dm-thin and subscribe the entire space to
> all LVs.
> I then have a cronjob that warns me if I start running low on space in
> the global VG pool.
> 
> Now, where it got confusing: around the time I installed the 5.4 kernel
> with the df problem, df filled up to 100% and started mailing me. I
> ignored it because I knew about the bug.
> However, I just found out that my LV filled up due to another bug that
> was actually my fault.
> 
> Now, I triggered some real bugs in btrfs, see:
> gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs fi show .
> Label: 'ubuntu'  uuid: 905c90db-8081-4071-9c79-57328b8ac0d5
>          Total devices 1 FS bytes used 445.73GiB
>          devid    1 size 14.00TiB used 8.44TiB path /dev/mapper/vgds2-ubuntu
> 
> Ok, I'm using 445GB, but losing 8.4TB, sigh.
>    LV Path                /dev/vgds2/ubuntu
>    LV Name                ubuntu
>    LV Pool name           thinpool2
>    LV Size                14.00 TiB
>    Mapped size            60.25%  <= this is all the space free in my VG, so it's full now
> 
> We talked about fstrim, let's try that:
> gargamel:/mnt/btrfs_pool2/backup/ubuntu# fstrim -v .
> .: 5.6 TiB (6116423237632 bytes) trimmed
> 
> Oh, great. Except this freed up nothing in LVM.
> 
> gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -musage=0 -v .
> Dumping filters: flags 0x6, state 0x0, force is off
>    METADATA (flags 0x2): balancing, usage=0
>    SYSTEM (flags 0x2): balancing, usage=0
> ERROR: error during balancing '.': Read-only file system
> 
> Ok, right, need to unmount/remount to clear read-only:
> gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -musage=0 -v .
> Dumping filters: flags 0x6, state 0x0, force is off
>    METADATA (flags 0x2): balancing, usage=0
>    SYSTEM (flags 0x2): balancing, usage=0
> Done, had to relocate 0 out of 8624 chunks
> gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -dusage=0 -v .
> Dumping filters: flags 0x1, state 0x0, force is off
>    DATA (flags 0x2): balancing, usage=0
> Done, had to relocate 0 out of 8624 chunks
> gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs fi show .
> Label: 'ubuntu'  uuid: 905c90db-8081-4071-9c79-57328b8ac0d5
>          Total devices 1 FS bytes used 8.42TiB
>          devid    1 size 14.00TiB used 8.44TiB path /dev/mapper/vgds2-ubuntu
> 
> 
> Well, crap, see how 'used' went from 445.73GiB to 8.42TiB after balance?

Wtf?  Can you do btrfs filesystem usage on that fs?  I'd like to see the 
breakdown.  I'm super confused about what's happening there.
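
(That is, a sketch with the mount point from this thread:

  btrfs filesystem usage /mnt/btrfs_pool2/backup/ubuntu

which breaks the allocation down per profile and per device.)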

> 
> I ran du to make sure my data is indeed only using 445GB.
> 
> So now I'm pretty much hosed; the filesystem seems to have been damaged in interesting ways.
> 
> I'll wait until tomorrow in case someone wants something from it, and I'll delete the entire
> LV and start over.
> 
> And now for extra points, this also damaged a 2nd of my filesystems on the same VG :(
> [64723.601630] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
> [64723.628708] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
> [64897.028176] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
> [64897.080355] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
> 

This will happen if the transaction aborts; does it still happen after you 
unmount and remount?  Thanks,

Josef


* Re: How to roll back btrfs filesystem a few revisions?
  2020-02-21 23:17                     ` How to roll back btrfs filesystem a few revisions? Marc MERLIN
@ 2020-02-21 23:47                       ` Josef Bacik
  2020-02-22  0:08                         ` Marc MERLIN
  0 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-02-21 23:47 UTC (permalink / raw)
  To: Marc MERLIN, Roman Mamedov
  Cc: dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On 2/21/20 6:17 PM, Marc MERLIN wrote:
> On Fri, Feb 21, 2020 at 03:07:40PM -0800, Marc MERLIN wrote:
>> And now for extra points, this also damaged a 2nd of my filesystems on the same VG :(
>> [64723.601630] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
>> [64723.628708] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
>> [64897.028176] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
>> [64897.080355] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
> 
> While I'm going to destroy and recreate one of the two filesystems, the
> other one has lots of btrfs relationships I really don't want to lose
> and have to re-create.
> 
> I'm sure it got into a bad state because its writes were denied.
> I don't care about the last data written; is there a clean way to open the
> filesystem and revert it a few revisions?
> Basically I want
> git reset --hard HEAD^ or HEAD^^
> 
> I'm ok with data loss; I just want to get back to a previous known-good
> consistent state. If I've not disabled COW (which I have not), this
> should be possible, correct?
> 
> If so, how do I proceed?

Yeah, you can try the backup roots: btrfs check -b and see if that works out? 
Thanks,

Josef


* Re: btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that.
  2020-02-21 23:43                     ` btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that Josef Bacik
@ 2020-02-22  0:01                       ` Marc MERLIN
  2020-02-22  0:43                         ` Josef Bacik
  2020-02-22  1:06                         ` Marc MERLIN
  0 siblings, 2 replies; 26+ messages in thread
From: Marc MERLIN @ 2020-02-22  0:01 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Roman Mamedov, dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On Fri, Feb 21, 2020 at 06:43:45PM -0500, Josef Bacik wrote:
> > Well, crap, see how 'used' went from 445.73GiB to 8.42TiB after balance?
> 
> Wtf?  Can you do btrfs filesystem usage on that fs?  I'd like to see the
> breakdown.  I'm super confused about what's happening there.

You and me both :)
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs fi df .
Data, single: total=8.40TiB, used=8.40TiB
System, DUP: total=8.00MiB, used=912.00KiB
Metadata, DUP: total=17.00GiB, used=16.33GiB
GlobalReserve, single: total=512.00MiB, used=0.00B

Looks like used is back to 8.4TB there too.



> > And now for extra points, this also damaged a 2nd of my filesystems on the same VG :(
> > [64723.601630] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
> > [64723.628708] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
> > [64897.028176] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
> > [64897.080355] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
> 
> This will happen if the transaction aborts, does it still happen after you
> unmount and remount?  Thanks,

The problematic filesystem mounts fine, but that doesn't mean it's
clean.
The one that I'd like very much not to be damaged, I'm not touching
until I can get my VG back to the 50% of free space it needs to have;
at 99.9x% full, it's not safe to use anything on it.
But thanks for the heads-up that my other filesystem may be ok. I'll run
a btrfs check on it later when it's safe.

Back to dm-13: it's now hung on umount, and I'm getting a string of these:
[67980.657803] BTRFS info (device dm-13): the free space cache file (4344624709632) is invalid, skip it
[67991.562812] BTRFS info (device dm-13): the free space cache file (4447703924736) is invalid, skip it
[67991.755262] BTRFS info (device dm-13): the free space cache file (4448777666560) is invalid, skip it
[68000.379059] BTRFS info (device dm-13): the free space cache file (4518570885120) is invalid, skip it
[68013.462077] BTRFS info (device dm-13): the free space cache file (4574405459968) is invalid, skip it
[68015.286730] BTRFS info (device dm-13): the free space cache file (4589437845504) is invalid, skip it
[68015.318239] BTRFS info (device dm-13): the free space cache file (4589437845504) is invalid, skip it
[68016.212246] BTRFS info (device dm-13): the free space cache file (4596954038272) is invalid, skip it
[68016.730826] BTRFS info (device dm-13): the free space cache file (4602322747392) is invalid, skip it
[68020.547135] BTRFS info (device dm-13): the free space cache file (4634535002112) is invalid, skip it
[68021.812820] BTRFS info (device dm-13): the free space cache file (4646346162176) is invalid, skip it
[68037.173441] BTRFS info (device dm-13): the free space cache file (4768752730112) is invalid, skip it
[68039.559383] BTRFS info (device dm-13): the free space cache file (4778416406528) is invalid, skip it
[68040.531083] BTRFS info (device dm-13): the free space cache file (4781637632000) is invalid, skip it
[68050.184300] BTRFS info (device dm-13): the free space cache file (4843914657792) is invalid, skip it
[68074.134080] BTRFS info (device dm-13): the free space cache file (4988869804032) is invalid, skip it
[68078.943126] BTRFS info (device dm-13): the free space cache file (5015713349632) is invalid, skip it
[68099.512978] BTRFS info (device dm-13): the free space cache file (5151004819456) is invalid, skip it
[68100.575692] BTRFS info (device dm-13): the free space cache file (5160668495872) is invalid, skip it
[68100.689222] BTRFS info (device dm-13): the free space cache file (5161742237696) is invalid, skip it

I knew that filling up a btrfs filesystem was bad, but filling it the
normal way makes it slow down enough that you usually know and fix it.
Filling it by having an underlying dm-thin deny writes is much worse (I
expected it wouldn't be pretty, which is why I had a cronjob to catch this
before it happened, but I missed it due to the df bug).

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  


* Re: How to roll back btrfs filesystem a few revisions?
  2020-02-21 23:47                       ` Josef Bacik
@ 2020-02-22  0:08                         ` Marc MERLIN
  2020-02-22  0:36                           ` Josef Bacik
  0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2020-02-22  0:08 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Roman Mamedov, dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On Fri, Feb 21, 2020 at 06:47:26PM -0500, Josef Bacik wrote:
> Yeah, you can try the backup roots: btrfs check -b and see if that works out?

So, I'm not super clear on how to do this.
The backup roots are not really a way to go back in time; they're just
the same data that maybe didn't get written, so you can maybe go to the
last revision if all the roots are not up to date, correct?

If so, is it best to get the last root since it's the one most likely to
be the oldest?

More generally, do I do a check -b to see if that looks clean, and if so,
what's the command to replicate that root onto all the other roots?

Thanks,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  


* Re: How to roll back btrfs filesystem a few revisions?
  2020-02-22  0:08                         ` Marc MERLIN
@ 2020-02-22  0:36                           ` Josef Bacik
  0 siblings, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2020-02-22  0:36 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Roman Mamedov, dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On 2/21/20 7:08 PM, Marc MERLIN wrote:
> On Fri, Feb 21, 2020 at 06:47:26PM -0500, Josef Bacik wrote:
>> Yeah, you can try the backup roots: btrfs check -b and see if that works out?
> 
> So, I'm not super clear on how to do this.
> The backup roots are not really a way to go back in time; they're just
> the same data that maybe didn't get written, so you can maybe go to the
> last revision if all the roots are not up to date, correct?
> 

No, they're the previous transaction ids, so it's like going back in time,
just in 30-second jumps.
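
(You can inspect them with a superblock dump; a sketch, device path
assumed:

  btrfs inspect-internal dump-super -f /dev/mapper/vgds2-ubuntu

which, among other fields, prints the stored backup roots and their
transids.)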

> If so, is it best to get the last root since it's the one most likely to
> be the oldest?
> 

Well, it looks for the best one out of the group.

> More generally, do I do a check -b to see if that looks clean, and if so,
> what's the command to replicate that root onto all the other roots?
>

If you do -b and check finishes fine, you can do --repair and it'll reset the 
super to point at the backup root, and theoretically you should be good to go? 
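
(A minimal sketch of that sequence, device path assumed, run on the
unmounted filesystem:

  btrfs check -b /dev/mapper/vgds2-ubuntu           # read-only check against a backup root
  btrfs check -b --repair /dev/mapper/vgds2-ubuntu  # only if the check above comes back clean

--repair is destructive, so only run it once the plain check is clean.)
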
Thanks,

Josef


* Re: btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that.
  2020-02-22  0:01                       ` Marc MERLIN
@ 2020-02-22  0:43                         ` Josef Bacik
  2020-02-22  1:06                         ` Marc MERLIN
  1 sibling, 0 replies; 26+ messages in thread
From: Josef Bacik @ 2020-02-22  0:43 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Roman Mamedov, dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On 2/21/20 7:01 PM, Marc MERLIN wrote:
> On Fri, Feb 21, 2020 at 06:43:45PM -0500, Josef Bacik wrote:
>>> Well, crap, see how 'used' went from 445.73GiB to 8.42TiB after balance?
>>
>> Wtf?  Can you do btrfs filesystem usage on that fs?  I'd like to see the
>> breakdown.  I'm super confused about what's happening there.
> 
> You and me both :)
> gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs fi df .
> Data, single: total=8.40TiB, used=8.40TiB
> System, DUP: total=8.00MiB, used=912.00KiB
> Metadata, DUP: total=17.00GiB, used=16.33GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> 
> Looks like used is back to 8.4TB there too.

Man, this is bizarre; does fsck say anything useful?  I wonder if the block 
groups are messed up and reporting the wrong value for used.  You said du 
shows only ~400GiB of space actually used, right?  I'm curious to see what 
fsck says.  If it comes back clean, I'll write something up to go and 
figure out where the space is.

> 
> 
> 
>>> And now for extra points, this also damaged a 2nd of my filesystems on the same VG :(
>>> [64723.601630] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
>>> [64723.628708] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
>>> [64897.028176] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
>>> [64897.080355] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
>>
>> This will happen if the transaction aborts, does it still happen after you
>> unmount and remount?  Thanks,
> 
> The problematic filesystem mounts fine, but that doesn't mean it's
> clean.
> The one that I'd like very much not to be damaged, I'm not touching
> until I can get my VG back to the 50% of free space it needs to have;
> at 99.9x% full, it's not safe to use anything on it.
> But thanks for the heads-up that my other filesystem may be ok. I'll run
> a btrfs check on it later when it's safe.
> 
> Back to dm-13: it's now hung on umount, and I'm getting a string of these:
> [67980.657803] BTRFS info (device dm-13): the free space cache file (4344624709632) is invalid, skip it
> [67991.562812] BTRFS info (device dm-13): the free space cache file (4447703924736) is invalid, skip it
> [67991.755262] BTRFS info (device dm-13): the free space cache file (4448777666560) is invalid, skip it
> [68000.379059] BTRFS info (device dm-13): the free space cache file (4518570885120) is invalid, skip it
> [68013.462077] BTRFS info (device dm-13): the free space cache file (4574405459968) is invalid, skip it
> [68015.286730] BTRFS info (device dm-13): the free space cache file (4589437845504) is invalid, skip it
> [68015.318239] BTRFS info (device dm-13): the free space cache file (4589437845504) is invalid, skip it
> [68016.212246] BTRFS info (device dm-13): the free space cache file (4596954038272) is invalid, skip it
> [68016.730826] BTRFS info (device dm-13): the free space cache file (4602322747392) is invalid, skip it
> [68020.547135] BTRFS info (device dm-13): the free space cache file (4634535002112) is invalid, skip it
> [68021.812820] BTRFS info (device dm-13): the free space cache file (4646346162176) is invalid, skip it
> [68037.173441] BTRFS info (device dm-13): the free space cache file (4768752730112) is invalid, skip it
> [68039.559383] BTRFS info (device dm-13): the free space cache file (4778416406528) is invalid, skip it
> [68040.531083] BTRFS info (device dm-13): the free space cache file (4781637632000) is invalid, skip it
> [68050.184300] BTRFS info (device dm-13): the free space cache file (4843914657792) is invalid, skip it
> [68074.134080] BTRFS info (device dm-13): the free space cache file (4988869804032) is invalid, skip it
> [68078.943126] BTRFS info (device dm-13): the free space cache file (5015713349632) is invalid, skip it
> [68099.512978] BTRFS info (device dm-13): the free space cache file (5151004819456) is invalid, skip it
> [68100.575692] BTRFS info (device dm-13): the free space cache file (5160668495872) is invalid, skip it
> [68100.689222] BTRFS info (device dm-13): the free space cache file (5161742237696) is invalid, skip it
> 
> I knew that filling up a btrfs filesystem was bad, but filling it the
> normal way makes it slow down enough that you usually know and fix it.
> Filling it by having an underlying dm-thin deny writes is much worse (I
> expected it wouldn't be pretty, which is why I had a cronjob to catch
> this before it happened, but I missed it due to the df bug).
> 

Yeah I'm curious about this too, it was my understanding that thinp would just 
return an error, which should trigger a transaction abort and then you should 
come back to a completely valid file system.

I sort of wonder if there's a different failure case that allowed some writes to 
complete and not others, which resulted in this bad file system state.  I'll 
put it on my list of things to investigate, because if that's the case we're 
likely missing some error condition that should trigger a transaction abort 
but doesn't.
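
If I wanted to reproduce it locally, something like this is where I'd
start (a rough sketch, untested, VG/LV names made up):

  # 1G pool backing a 10G thin volume, with the pool set to error out
  # instead of queueing IO when it fills up
  lvcreate --type thin-pool -L 1G --errorwhenfull y -n pool vg
  lvcreate --type thin -V 10G --thinpool pool -n thin vg
  mkfs.btrfs /dev/vg/thin
  mount /dev/vg/thin /mnt
  # write past the pool's physical size so allocations start failing
  dd if=/dev/zero of=/mnt/fill bs=1M count=2048 conv=fsync
  dmesg | tail   # expect a transaction abort and forced readonly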

We for sure bang the hell out of the "disk starts throwing errors" path; there 
are several xfstests for it and I've spent the last month running a bunch of 
them in a loop, so I know that for full failures we're doing the right thing.
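
For reference, this is roughly the loop I've been running (a sketch; it
assumes a working xfstests checkout, and generic/475 is the dm-flakey
test that makes the device start failing mid-workload):

  cd xfstests
  while ./check generic/475; do :; done   # loop until something trips

Thanks,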

Josef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that.
  2020-02-22  0:01                       ` Marc MERLIN
  2020-02-22  0:43                         ` Josef Bacik
@ 2020-02-22  1:06                         ` Marc MERLIN
  2020-02-22  1:23                           ` Marc MERLIN
  1 sibling, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2020-02-22  1:06 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Roman Mamedov, dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On Fri, Feb 21, 2020 at 04:01:42PM -0800, Marc MERLIN wrote:
> [68099.512978] BTRFS info (device dm-13): the free space cache file (5151004819456) is invalid, skip it
> [68100.575692] BTRFS info (device dm-13): the free space cache file (5160668495872) is invalid, skip it
> [68100.689222] BTRFS info (device dm-13): the free space cache file (5161742237696) is invalid, skip it
> 
> I knew that filling up a btrfs filesystem was bad, but filling it the
> normal way makes it slow down enough that you usually notice and fix it.
> Filling it by having an underlying dm-thin deny writes is much worse (I
> expected it wouldn't be pretty, which is why I had a cronjob to catch this
> before it happened, but I missed it due to the df bug).

It took a while, but it finished eventually. It seems that unmount tries
to fix up a lot of state, which took a long time and only stopped once LVM
returned an error, which forced the filesystem readonly and finally allowed
the unmount to succeed.
[68696.784521] BTRFS info (device dm-13): the free space cache file (9260214779904) is invalid, skip it 
[68766.967084] BTRFS: error (device dm-13) in btrfs_commit_transaction:2279: errno=-5 IO failure (Error while writing out transaction) 
[68767.005592] BTRFS info (device dm-13): forced readonly 
[68767.022448] BTRFS warning (device dm-13): Skipping commit of aborted transaction. 
[68767.045741] BTRFS: error (device dm-13) in cleanup_transaction:1830: errno=-5 IO failure 
[68767.070945] BTRFS info (device dm-13): delayed_refs has NO entry

I guess I'm probably one of the few (or the first?) people using btrfs on
dm-thin with oversubscription and running out of space due to another bug.
Would it make sense to add some filesystem tests to see how btrfs behaves
when it gets IO errors from the underlying LVM (which obviously doesn't
happen on regular drives)?
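
I imagine something like the dm error target could simulate that; a
rough sketch (untested, backing device name made up, and note that
dm-thin's failure mode is subtler, since only new allocations fail while
already-provisioned blocks keep working):

  dev=/dev/mapper/some-backing-lv
  size=$(blockdev --getsz "$dev")    # size in 512-byte sectors
  # start with all IO passing through
  echo "0 $size linear $dev 0" | dmsetup create failme
  mkfs.btrfs /dev/mapper/failme
  mount /dev/mapper/failme /mnt
  # later, swap the table so every IO fails, like a completely full pool
  dmsetup suspend failme
  echo "0 $size error" | dmsetup load failme
  dmsetup resume failme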

Mounting read-only is a better idea; I'll do that for now:
gargamel:/mnt/btrfs_pool2/backup/ubuntu# df -h .
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/vgds2-ubuntu   14T  8.5T  5.6T  61% /mnt/btrfs_pool2/backup/ubuntu

df is showing a value consistent with btrfs fi df, but of course not the
correct value, since I'm only using a tenth of that data.
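
Since df can't be trusted here, I'll make the cronjob watch the thin
pool itself instead; something like this (a sketch, pool name made up,
threshold arbitrary, mail(1) just as a placeholder alert):

  pct=$(lvs --noheadings -o data_percent vgds2/thinpool | tr -d ' ')
  if awk -v p="$pct" 'BEGIN { exit !(p >= 90) }'; then
      echo "thin pool at ${pct}%" | mail -s "thin pool filling up" root
  fi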

You asked for a check; it's running but may take a while:
gargamel:~# btrfs check /dev/mapper/vgds2-ubuntu
Checking filesystem on /dev/mapper/vgds2-ubuntu
UUID: 905c90db-8081-4071-9c79-57328b8ac0d5
checking extents
checking free space cache
checking fs roots
checking only csum items (without verifying data)

I'll paste the completion when it's done.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that.
  2020-02-22  1:06                         ` Marc MERLIN
@ 2020-02-22  1:23                           ` Marc MERLIN
  2020-02-22 14:51                             ` Marc MERLIN
  0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2020-02-22  1:23 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Roman Mamedov, dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On Fri, Feb 21, 2020 at 05:06:37PM -0800, Marc MERLIN wrote:
> You asked for a check; it's running but may take a while:
> gargamel:~# btrfs check /dev/mapper/vgds2-ubuntu
> Checking filesystem on /dev/mapper/vgds2-ubuntu
> UUID: 905c90db-8081-4071-9c79-57328b8ac0d5
> checking extents
> checking free space cache
> checking fs roots
> checking only csum items (without verifying data)
> 
> I'll paste the completion when it's done.
 
Ok, faster than I thought: btrfs check came back clean.
I added spaces to the big numbers for readability.
So this claims I'm using 9TB?

Is it possible that I'm hitting this problem:
1) I really did fill the filesystem (well, not to the filesystem's size, but
to the point where dm-thin couldn't hand out blocks anymore)
2) I deleted/freed up the space
3) btrfs needs space to free up the space, and there is no space left,
so it's unable to mark the freed blocks as free, and I'm therefore
stuck? (If so, maybe the loop device trick sketched below would get me out.)

found 9 255 703 285 760 bytes used, no error found
total csum bytes: 9 019 442 564
total tree bytes: 17 533 894 656
total fs tree bytes: 7 411 073 024
total extent tree bytes: 379 928 576
btree space waste bytes: 1 769 834 145
file data blocks allocated: 9267682025472
 referenced 9272533270528
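
If that chicken-and-egg ENOSPC is what's happening, maybe the usual
loop device trick would get me out: lend the filesystem some space from
outside the full pool so a transaction can commit, then take it back.
A sketch (untested, paths made up, and it assumes the filesystem will
stay read-write long enough):

  truncate -s 4G /otherdisk/spare.img
  loop=$(losetup -f --show /otherdisk/spare.img)
  btrfs device add "$loop" /mnt/btrfs_pool2/backup/ubuntu
  # delete snapshots / free up space, let the transaction commit, then:
  btrfs device remove "$loop" /mnt/btrfs_pool2/backup/ubuntu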


Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/  

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that.
  2020-02-22  1:23                           ` Marc MERLIN
@ 2020-02-22 14:51                             ` Marc MERLIN
  2020-02-22 14:52                               ` Josef Bacik
  0 siblings, 1 reply; 26+ messages in thread
From: Marc MERLIN @ 2020-02-22 14:51 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Roman Mamedov, dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On Fri, Feb 21, 2020 at 05:23:12PM -0800, Marc MERLIN wrote:
> On Fri, Feb 21, 2020 at 05:06:37PM -0800, Marc MERLIN wrote:
> > You asked for a check; it's running but may take a while:
> > gargamel:~# btrfs check /dev/mapper/vgds2-ubuntu
> > Checking filesystem on /dev/mapper/vgds2-ubuntu
> > UUID: 905c90db-8081-4071-9c79-57328b8ac0d5
> > checking extents
> > checking free space cache
> > checking fs roots
> > checking only csum items (without verifying data)
> > 
> > I'll paste the completion when it's done.
>  
> Ok, faster than I thought: btrfs check came back clean.
> I added spaces to the big numbers for readability.
> So this claims I'm using 9TB?
> 
> Is it possible that I'm hitting this problem:
> 1) I really did fill the filesystem (well, not to the filesystem's size, but
> to the point where dm-thin couldn't hand out blocks anymore)
> 2) I deleted/freed up the space
> 3) btrfs needs space to free up the space, and there is no space left,
> so it's unable to mark the freed blocks as free, and I'm therefore
> stuck? (If so, maybe the loop device trick sketched below would get me out.)
> 
> found 9 255 703 285 760 bytes used, no error found
> total csum bytes: 9 019 442 564
> total tree bytes: 17 533 894 656
> total fs tree bytes: 7 411 073 024
> total extent tree bytes: 379 928 576
> btree space waste bytes: 1 769 834 145
> file data blocks allocated: 9267682025472
>  referenced 9272533270528

Ok, last call before I delete this filesystem and recover my system to a
working state. I don't need the filesystem fixed, it's fairly quick for me
to restore it, but obviously if there is any useful state in it for
improving the code, that will be lost.

I understand it's the weekend; if someone thinks I should wait until
Monday to give some other folks the chance to see/reply, let me know,
and I'll keep my system down until Monday.

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that.
  2020-02-22 14:51                             ` Marc MERLIN
@ 2020-02-22 14:52                               ` Josef Bacik
  2020-02-22 15:24                                 ` Marc MERLIN
  0 siblings, 1 reply; 26+ messages in thread
From: Josef Bacik @ 2020-02-22 14:52 UTC (permalink / raw)
  To: Marc MERLIN
  Cc: Roman Mamedov, dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On 2/22/20 9:51 AM, Marc MERLIN wrote:
> On Fri, Feb 21, 2020 at 05:23:12PM -0800, Marc MERLIN wrote:
>> On Fri, Feb 21, 2020 at 05:06:37PM -0800, Marc MERLIN wrote:
>>> You asked for a check; it's running but may take a while:
>>> gargamel:~# btrfs check /dev/mapper/vgds2-ubuntu
>>> Checking filesystem on /dev/mapper/vgds2-ubuntu
>>> UUID: 905c90db-8081-4071-9c79-57328b8ac0d5
>>> checking extents
>>> checking free space cache
>>> checking fs roots
>>> checking only csum items (without verifying data)
>>>
>>> I'll paste the completion when it's done.
>>   
>> Ok, faster than I thought: btrfs check came back clean.
>> I added spaces to the big numbers for readability.
>> So this claims I'm using 9TB?
>>
>> Is it possible that I'm hitting this problem:
>> 1) I really did fill the filesystem (well, not to the filesystem's size, but
>> to the point where dm-thin couldn't hand out blocks anymore)
>> 2) I deleted/freed up the space
>> 3) btrfs needs space to free up the space, and there is no space left,
>> so it's unable to mark the freed blocks as free, and I'm therefore
>> stuck? (If so, maybe the loop device trick sketched below would get me out.)
>>
>> found 9 255 703 285 760 bytes used, no error found
>> total csum bytes: 9 019 442 564
>> total tree bytes: 17 533 894 656
>> total fs tree bytes: 7 411 073 024
>> total extent tree bytes: 379 928 576
>> btree space waste bytes: 1 769 834 145
>> file data blocks allocated: 9267682025472
>>   referenced 9272533270528
> 
> Ok, last call before I delete this filesystem and recover my system to a
> working state. I don't need the filesystem fixed, it's fairly quick for me
> to restore it, but obviously if there is any useful state in it for
> improving the code, that will be lost.
> 
> I understand it's the weekend; if someone thinks I should wait until
> Monday to give some other folks the chance to see/reply, let me know,
> and I'll keep my system down until Monday.
> 

Go ahead and blow it away, and I'll add "dm-thinp failure mode" to my list of 
things to look into.  Sorry Marc,

Josef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that.
  2020-02-22 14:52                               ` Josef Bacik
@ 2020-02-22 15:24                                 ` Marc MERLIN
  0 siblings, 0 replies; 26+ messages in thread
From: Marc MERLIN @ 2020-02-22 15:24 UTC (permalink / raw)
  To: Josef Bacik
  Cc: Roman Mamedov, dsterba, Martin Steigerwald, linux-btrfs, kernel-team

On Sat, Feb 22, 2020 at 09:52:56AM -0500, Josef Bacik wrote:
> Go ahead and blow it away, and I'll add "dm-thinp failure mode" to my list
> of things to look into.  Sorry Marc,

No worries, I got lucky that it hit a filesystem that's easy to
recreate, so it's not a huge deal.

Thanks for the replies.
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2020-02-22 15:24 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-31 14:31 [PATCH] btrfs: do not zero f_bavail if we have available space Josef Bacik
2020-01-31 20:06 ` Martin Steigerwald
2020-02-01  1:00 ` Qu Wenruo
2020-02-02 17:52 ` David Sterba
     [not found]   ` <CAKhhfD7S=kcKLRURdNFZ8H4beS8=XjFvnOQXche7+SVOGFGC_w@mail.gmail.com>
2020-02-19  9:17     ` Martin Steigerwald
2020-02-19 13:43       ` Marc MERLIN
2020-02-19 14:31         ` David Sterba
2020-02-19 15:36           ` Marc MERLIN
2020-02-19 17:50             ` Roman Mamedov
2020-02-19 22:21               ` Martin Steigerwald
2020-02-20 21:46             ` Marc MERLIN
2020-02-21  5:38               ` Marc MERLIN
2020-02-21  5:45                 ` Roman Mamedov
2020-02-21 23:07                   ` btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that Marc MERLIN
2020-02-21 23:17                     ` How to roll back btrfs filesystem a few revisions? Marc MERLIN
2020-02-21 23:47                       ` Josef Bacik
2020-02-22  0:08                         ` Marc MERLIN
2020-02-22  0:36                           ` Josef Bacik
2020-02-21 23:43                     ` btrfs filled up dm-thin and df%: shows 8.4TB of data used when I'm only using 10% of that Josef Bacik
2020-02-22  0:01                       ` Marc MERLIN
2020-02-22  0:43                         ` Josef Bacik
2020-02-22  1:06                         ` Marc MERLIN
2020-02-22  1:23                           ` Marc MERLIN
2020-02-22 14:51                             ` Marc MERLIN
2020-02-22 14:52                               ` Josef Bacik
2020-02-22 15:24                                 ` Marc MERLIN
