All of lore.kernel.org
 help / color / mirror / Atom feed
* Global reserve and ENOSPC while deleting snapshots on 5.0.9
@ 2019-04-23 23:06 Zygo Blaxell
  2019-04-24  2:57 ` Paul Jones
  2019-06-23 14:14 ` Global reserve and ENOSPC while deleting snapshots on 5.0.9 - still happens on 5.1.11 Zygo Blaxell
  0 siblings, 2 replies; 7+ messages in thread
From: Zygo Blaxell @ 2019-04-23 23:06 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 13710 bytes --]

I had a test filesystem that ran out of unallocated space, then ran
out of metadata space during a snapshot delete, and forced readonly.
The workload before the failure was a lot of rsync and bees dedupe
combined with random snapshot creates and deletes.

I tried the usual fix strategies:

	1.  Immediately after mount, try to balance to free space for
	metadata

	2.  Immediately after mount, add additional disks to provide
	unallocated space for metadata

	3.  Mount -o nossd to increase metadata density

#3 had no effect.  #1 failed consistently.

#2 was successful, but the additional space was not used because
btrfs couldn't allocate chunks for metadata because it ran out of
metadata space for new metadata chunks.

When btrfs-cleaner tried to remove the first pending deleted snapshot,
it started a transaction that failed due to lack of metadata space.
Since the transaction failed, the filesystem reverts to its earlier state,
and exactly the same thing happens on the next mount.  The 'btrfs dev
add' in #2 is successful only if it is executed immediately after mount,
before the btrfs-cleaner thread wakes up.

Here's what the kernel said during one of the attempts:

	[41263.822252] BTRFS info (device dm-3): use zstd compression, level 0
	[41263.825135] BTRFS info (device dm-3): using free space tree
	[41263.827319] BTRFS info (device dm-3): has skinny extents
	[42046.463356] ------------[ cut here ]------------
	[42046.463387] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
	[42046.463404] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
	[42046.463407] BTRFS info (device dm-3): forced readonly
	[42046.463414] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
	[42046.463429] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
	[42046.463548] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
	[42046.471363] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
	[42046.471475] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
	[42046.471506] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
	[42046.473672] BTRFS: error (device dm-3) in btrfs_drop_snapshot:9489: errno=-28 No space left
	[42046.475643] WARNING: CPU: 0 PID: 10187 at fs/btrfs/extent-tree.c:7056 __btrfs_free_extent+0x364/0xf60
	[42046.475645] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio joydev ppdev crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod snd_pcm aesni_intel aes_x86_64 snd_timer crypto_simd cryptd glue_helper sr_mod snd cdrom psmouse sg soundcore input_leds pcspkr serio_raw ide_pci_generic i2c_piix4 bochs_drm parport_pc piix rtc_cmos floppy parport pcc_cpufreq ide_core qemu_fw_cfg evbug evdev ip_tables x_tables ipv6 crc_ccitt autofs4
	[42046.475677] CPU: 0 PID: 10187 Comm: btrfs-transacti Tainted: G    B   W         5.0.8-zb64-10a85e8a1569+ #1
	[42046.475678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[42046.475681] RIP: 0010:__btrfs_free_extent+0x364/0xf60
	[42046.475684] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 00 85 be b8 e8 3c 1b 9b ff <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
	[42046.475685] RSP: 0018:ffff888103d8f8e0 EFLAGS: 00010282
	[42046.475688] RAX: 0000000000000000 RBX: ffff88802d9ce370 RCX: ffffffffb7224ca2
	[42046.475689] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88811960dfac
	[42046.475691] RBP: ffff888103d8fa98 R08: ffffed10232c24c1 R09: ffffed10232c24c0
	[42046.475693] R10: ffff888025c94aeb R11: ffffed10232c24c1 R12: 0000000000a5054e
	[42046.475694] R13: 0000000000003000 R14: 000000000000214a R15: ffff888103d8fa70
	[42046.475696] FS:  0000000000000000(0000) GS:ffff888119400000(0000) knlGS:0000000000000000
	[42046.475698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[42046.475700] CR2: 00007f1b84271718 CR3: 00000000a3636001 CR4: 00000000001606f0
	[42046.475702] Call Trace:
	[42046.475707]  ? trace_hardirqs_off+0x24/0x120
	[42046.475714]  ? update_block_group+0x6d0/0x6d0
	[42046.475718]  ? __lock_acquire+0xf8/0x26e0
	[42046.475721]  ? __kasan_slab_free+0x14d/0x230
	[42046.475724]  ? btrfs_delayed_ref_lock+0x3e/0x100
	[42046.475727]  ? __lock_acquire+0xf8/0x26e0
	[42046.475730]  ? kthread+0x1a9/0x200
	[42046.475733]  ? debug_show_all_locks+0x210/0x210
	[42046.475737]  ? __btrfs_run_delayed_refs+0xce/0x210
	[42046.475742]  ? debug_show_all_locks+0x210/0x210
	[42046.475747]  run_delayed_data_ref+0x15a/0x400
	[42046.475752]  ? alloc_reserved_file_extent+0x520/0x520
	[42046.475756]  ? rb_next+0x25/0x90
	[42046.475761]  run_one_delayed_ref+0x71/0xe0
	[42046.475765]  btrfs_run_delayed_refs_for_head+0x284/0x580
	[42046.475772]  __btrfs_run_delayed_refs+0xeb/0x210
	[42046.475777]  ? btrfs_run_delayed_refs_for_head+0x580/0x580
	[42046.475781]  ? debug_show_all_locks+0x210/0x210
	[42046.475786]  btrfs_run_delayed_refs+0xc0/0x260
	[42046.475792]  btrfs_commit_transaction+0xf0/0x10f0
	[42046.475795]  ? do_raw_spin_unlock+0xa8/0x140
	[42046.475799]  ? _raw_spin_unlock+0x27/0x40
	[42046.475803]  ? btrfs_record_root_in_trans+0x24/0xb0
	[42046.475807]  ? btrfs_apply_pending_changes+0xb0/0xb0
	[42046.475809]  ? start_transaction+0x14a/0x760
	[42046.475816]  transaction_kthread+0x218/0x250
	[42046.475822]  kthread+0x1a9/0x200
	[42046.475824]  ? btrfs_cleanup_transaction+0xb80/0xb80
	[42046.475827]  ? kthread_park+0xb0/0xb0
	[42046.475831]  ret_from_fork+0x3a/0x50
	[42046.475837] irq event stamp: 0
	[42046.475839] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
	[42046.475843] hardirqs last disabled at (0): [<ffffffffb70efe95>] copy_process.part.7+0xad5/0x3b80
	[42046.475846] softirqs last  enabled at (0): [<ffffffffb70efe95>] copy_process.part.7+0xad5/0x3b80
	[42046.475847] softirqs last disabled at (0): [<0000000000000000>]           (null)
	[42046.475849] ---[ end trace b02d556137ea688c ]---
	[42046.475852] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
	[42046.475859] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
	[42046.475878] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
	[42046.475905] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left

I noticed that during the few minutes when the filesystem was running,
the "global reserve" number in "btrfs fi usage" output kept going up to
503M or so, then the transaction failed.

I then applied this patch to the kernel (5.0.8):

	diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
	index 83791d13c204..59ba456e2314 100644
	--- a/fs/btrfs/extent-tree.c
	+++ b/fs/btrfs/extent-tree.c
	@@ -5790,7 +5790,7 @@ static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
		spin_lock(&sinfo->lock);
		spin_lock(&block_rsv->lock);

	-       block_rsv->size = min_t(u64, num_bytes, SZ_512M);
	+       block_rsv->size = min_t(u64, num_bytes, SZ_1G);

		if (block_rsv->reserved < block_rsv->size) {
			num_bytes = btrfs_space_info_used(sinfo, true);

Then mounted the filesystem.

With the larger reserved block size, btrfs-cleaner was able to get
past its earlier failures, allocating 3GB of new metadata chunks in the
first transaction after it started deleting snapshots.  After this, the
filesystem deleted some more snapshots (there are currently 2470 deleted
snapshots remaining on this filesystem).

This isn't really a solution, though, for two reasons:

1.  The commit that adds the limit on block_rsv->size (fdf30d1c1b3
"Btrfs: limit the global reserve to 512mb") points out:

    A user reported a problem where he was getting early ENOSPC with
    hundreds of gigs of free data space and 6 gigs of free metadata space.
    This is because the global block reserve was taking up the entire
    free metadata space.  This is ridiculous, we have infrastructure in
    place to throttle if we start using too much of the global reserve,
    so instead of letting it get this huge just limit it to 512mb so that
    users can still get work done.  This allowed the user to complete
    his rsync without issues. 

In other words, something should be handling the case when reserved
space runs out, and that something is failing.

2.  The 1G reserved limit still isn't sufficient.  The filesystem deletes
a few dozen snapshots, then fails again later:

	[ 3705.477036] CPU: 2 PID: 8388 Comm: btrfs-transacti Tainted: G    B             5.0.9-zb64-b6eed6abc880+ #3
	[ 3705.481510] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
	[ 3705.484038] RIP: 0010:__btrfs_free_extent+0x364/0xf60
	[ 3705.485061] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 80 85 be 8b e8 fc 1b 9b ff <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
	[ 3705.488046] RSP: 0018:ffff8881c29cf8e0 EFLAGS: 00010282
	[ 3705.488868] RAX: 0000000000000000 RBX: ffff888121dbdf20 RCX: ffffffff8a224a92
	[ 3705.490129] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff8881f5e0dfac
	[ 3705.491246] RBP: ffff8881c29cfa98 R08: ffffed103ebc24c1 R09: ffffed103ebc24c0
	[ 3705.492376] R10: ffff88819e6d430b R11: ffffed103ebc24c1 R12: 00000000009ecb21
	[ 3705.493501] R13: 0000000000040000 R14: 00000000000021b3 R15: ffff8881c29cfa70
	[ 3705.494619] FS:  0000000000000000(0000) GS:ffff8881f5c00000(0000) knlGS:0000000000000000
	[ 3705.495883] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	[ 3705.496781] CR2: 0000556bc833ec70 CR3: 00000001ef9e0001 CR4: 00000000001606e0
	[ 3705.497872] Call Trace:
	[ 3705.498279]  ? update_block_group+0x6d0/0x6d0
	[ 3705.498950]  ? btrfs_run_delayed_refs_for_head+0x352/0x580
	[ 3705.499816]  ? kthread+0x1a9/0x200
	[ 3705.500379]  ? debug_show_all_locks+0x210/0x210
	[ 3705.501078]  ? __btrfs_run_delayed_refs+0xce/0x210
	[ 3705.501813]  ? debug_show_all_locks+0x210/0x210
	[ 3705.502516]  run_delayed_data_ref+0x15a/0x400
	[ 3705.503193]  ? alloc_reserved_file_extent+0x520/0x520
	[ 3705.503968]  ? rb_next+0x3c/0x90
	[ 3705.504479]  run_one_delayed_ref+0x71/0xe0
	[ 3705.505116]  btrfs_run_delayed_refs_for_head+0x284/0x580
	[ 3705.505935]  __btrfs_run_delayed_refs+0xeb/0x210
	[ 3705.506649]  ? btrfs_run_delayed_refs_for_head+0x580/0x580
	[ 3705.507495]  ? debug_show_all_locks+0x210/0x210
	[ 3705.508201]  btrfs_run_delayed_refs+0xc0/0x260
	[ 3705.508912]  btrfs_commit_transaction+0xf0/0x10f0
	[ 3705.509654]  ? do_raw_spin_unlock+0xa8/0x140
	[ 3705.510343]  ? _raw_spin_unlock+0x27/0x40
	[ 3705.510979]  ? btrfs_record_root_in_trans+0x24/0xb0
	[ 3705.511748]  ? btrfs_apply_pending_changes+0xb0/0xb0
	[ 3705.512533]  ? start_transaction+0x14a/0x760
	[ 3705.513216]  transaction_kthread+0x218/0x250
	[ 3705.513900]  kthread+0x1a9/0x200
	[ 3705.514425]  ? btrfs_cleanup_transaction+0xb80/0xb80
	[ 3705.515233]  ? kthread_park+0xb0/0xb0
	[ 3705.515829]  ret_from_fork+0x3a/0x50
	[ 3705.516408] irq event stamp: 0
	[ 3705.516898] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
	[ 3705.518054] hardirqs last disabled at (0): [<ffffffff8a0efe65>] copy_process.part.7+0xad5/0x3b80
	[ 3705.519419] softirqs last  enabled at (0): [<ffffffff8a0efe65>] copy_process.part.7+0xad5/0x3b80
	[ 3705.520779] softirqs last disabled at (0): [<0000000000000000>]           (null)
	[ 3705.521944] ---[ end trace 3a5f720e889bc0d3 ]---
	[ 3705.522713] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
	[ 3705.524069] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left

If I increase the reserved limit again to 2G, btrfs-cleaner gets past
this shapshot (so far).  It takes more than 10 minutes to delete some
of the snapshots, so this will take a while to finish all 2470.

The current btrfs fi usage output worries me:

	Overall:
	    Device size:                   2.02TiB
	    Device allocated:              2.01TiB
	    Device unallocated:           10.03GiB
	    Device missing:                  0.00B
	    Used:                          1.70TiB
	    Free (estimated):            296.02GiB      (min: 291.01GiB)
	    Data ratio:                       1.00
	    Metadata ratio:                   2.00
	    Global reserve:                2.00GiB      (used: 864.00KiB)

	Data,single: Size:1.81TiB, Used:1.53TiB
	   /dev/mapper/vgnebtest-tvdb    928.97GiB
	   /dev/mapper/vgnebtest-tvdc    929.00GiB

	Metadata,RAID1: Size:99.99GiB, Used:82.29GiB
	   /dev/mapper/vgnebtest-lvol1     5.00GiB
	   /dev/mapper/vgnebtest-lvol2     5.00GiB
	   /dev/mapper/vgnebtest-tvdb     94.99GiB
	   /dev/mapper/vgnebtest-tvdc     94.99GiB

	System,RAID1: Size:8.00MiB, Used:240.00KiB
	   /dev/mapper/vgnebtest-tvdb      8.00MiB
	   /dev/mapper/vgnebtest-tvdc      8.00MiB

	Unallocated:
	   /dev/mapper/vgnebtest-lvol1     5.00GiB
	   /dev/mapper/vgnebtest-lvol2     5.00GiB
	   /dev/mapper/vgnebtest-tvdb     30.00MiB
	   /dev/mapper/vgnebtest-tvdc      1.00MiB

Note "lvol1" and "lvol2" were added after the initial failure.  

There is clearly a lot of metadata space available, but the filesystem still
intermittently hits ENOSPC while trying to free space.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: Global reserve and ENOSPC while deleting snapshots on 5.0.9
  2019-04-23 23:06 Global reserve and ENOSPC while deleting snapshots on 5.0.9 Zygo Blaxell
@ 2019-04-24  2:57 ` Paul Jones
  2019-04-24  3:58   ` Zygo Blaxell
  2019-06-23 14:14 ` Global reserve and ENOSPC while deleting snapshots on 5.0.9 - still happens on 5.1.11 Zygo Blaxell
  1 sibling, 1 reply; 7+ messages in thread
From: Paul Jones @ 2019-04-24  2:57 UTC (permalink / raw)
  To: Zygo Blaxell, linux-btrfs

> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org <linux-btrfs-
> owner@vger.kernel.org> On Behalf Of Zygo Blaxell
> Sent: Wednesday, 24 April 2019 9:07 AM
> To: linux-btrfs@vger.kernel.org
> Subject: Global reserve and ENOSPC while deleting snapshots on 5.0.9
> 
> I had a test filesystem that ran out of unallocated space, then ran out of
> metadata space during a snapshot delete, and forced readonly.
> The workload before the failure was a lot of rsync and bees dedupe
> combined with random snapshot creates and deletes.
> 
> I tried the usual fix strategies:
> 
> 	1.  Immediately after mount, try to balance to free space for
> 	metadata
> 
> 	2.  Immediately after mount, add additional disks to provide
> 	unallocated space for metadata
> 
> 	3.  Mount -o nossd to increase metadata density
> 
> #3 had no effect.  #1 failed consistently.
> 
> #2 was successful, but the additional space was not used because btrfs
> couldn't allocate chunks for metadata because it ran out of metadata space
> for new metadata chunks.
> 
> When btrfs-cleaner tried to remove the first pending deleted snapshot, it
> started a transaction that failed due to lack of metadata space.
> Since the transaction failed, the filesystem reverts to its earlier state, and
> exactly the same thing happens on the next mount.  The 'btrfs dev add' in #2
> is successful only if it is executed immediately after mount, before the btrfs-
> cleaner thread wakes up.

I had a similar problem on iirc 4.20, except that I couldn't get the new devices to add (raid1) before the cleaner thread ran, no matter how fast I added them after mount.
I ended up just commenting out the part that forces the fs to go read only. The cleaner thread exits gracefully (I think?) so then it was no trouble to add the devices.

Is it still necessary to have the fs go read only like that when it's out of space?

Paul.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Global reserve and ENOSPC while deleting snapshots on 5.0.9
  2019-04-24  2:57 ` Paul Jones
@ 2019-04-24  3:58   ` Zygo Blaxell
  0 siblings, 0 replies; 7+ messages in thread
From: Zygo Blaxell @ 2019-04-24  3:58 UTC (permalink / raw)
  To: Paul Jones; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2261 bytes --]

On Wed, Apr 24, 2019 at 02:57:47AM +0000, Paul Jones wrote:
> > -----Original Message-----
> > From: linux-btrfs-owner@vger.kernel.org <linux-btrfs-
> > owner@vger.kernel.org> On Behalf Of Zygo Blaxell
> > Sent: Wednesday, 24 April 2019 9:07 AM
> > To: linux-btrfs@vger.kernel.org
> > Subject: Global reserve and ENOSPC while deleting snapshots on 5.0.9
> > 
> > I had a test filesystem that ran out of unallocated space, then ran out of
> > metadata space during a snapshot delete, and forced readonly.
> > The workload before the failure was a lot of rsync and bees dedupe
> > combined with random snapshot creates and deletes.
> > 
> > I tried the usual fix strategies:
> > 
> > 	1.  Immediately after mount, try to balance to free space for
> > 	metadata
> > 
> > 	2.  Immediately after mount, add additional disks to provide
> > 	unallocated space for metadata
> > 
> > 	3.  Mount -o nossd to increase metadata density
> > 
> > #3 had no effect.  #1 failed consistently.
> > 
> > #2 was successful, but the additional space was not used because btrfs
> > couldn't allocate chunks for metadata because it ran out of metadata space
> > for new metadata chunks.
> > 
> > When btrfs-cleaner tried to remove the first pending deleted snapshot, it
> > started a transaction that failed due to lack of metadata space.
> > Since the transaction failed, the filesystem reverts to its earlier state, and
> > exactly the same thing happens on the next mount.  The 'btrfs dev add' in #2
> > is successful only if it is executed immediately after mount, before the btrfs-
> > cleaner thread wakes up.
> 
> I had a similar problem on iirc 4.20, except that I couldn't get the new devices to add (raid1) before the cleaner thread ran, no matter how fast I added them after mount.
> I ended up just commenting out the part that forces the fs to go read only. The cleaner thread exits gracefully (I think?) so then it was no trouble to add the devices.
> 
> Is it still necessary to have the fs go read only like that when it's out of space?

It's definitely a good idea to go read only on generic transaction
failures.

Maybe it's not such a good idea to lump ENOSPC in with other kinds of
transaction failure.

> Paul.
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Global reserve and ENOSPC while deleting snapshots on 5.0.9 - still happens on 5.1.11
  2019-04-23 23:06 Global reserve and ENOSPC while deleting snapshots on 5.0.9 Zygo Blaxell
  2019-04-24  2:57 ` Paul Jones
@ 2019-06-23 14:14 ` Zygo Blaxell
  2019-06-24 10:01   ` Martin Raiber
  2019-07-01  3:07   ` Global reserve and ENOSPC while deleting orphan inodes on 5.1.14 Zygo Blaxell
  1 sibling, 2 replies; 7+ messages in thread
From: Zygo Blaxell @ 2019-06-23 14:14 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 15460 bytes --]

On Tue, Apr 23, 2019 at 07:06:51PM -0400, Zygo Blaxell wrote:
> I had a test filesystem that ran out of unallocated space, then ran
> out of metadata space during a snapshot delete, and forced readonly.
> The workload before the failure was a lot of rsync and bees dedupe
> combined with random snapshot creates and deletes.

Had this happen again on a production filesystem, this time on 5.1.11,
and it happened during orphan inode cleanup instead of snapshot delete:

	[14303.076134][T20882] BTRFS: error (device dm-21) in add_to_free_space_tree:1037: errno=-28 No space left
	[14303.076144][T20882] BTRFS: error (device dm-21) in __btrfs_free_extent:7196: errno=-28 No space left
	[14303.076157][T20882] BTRFS: error (device dm-21) in btrfs_run_delayed_refs:3008: errno=-28 No space left
	[14303.076203][T20882] BTRFS error (device dm-21): Error removing orphan entry, stopping orphan cleanup
	[14303.076210][T20882] BTRFS error (device dm-21): could not do orphan cleanup -22
	[14303.076281][T20882] BTRFS error (device dm-21): commit super ret -30
	[14303.357337][T20882] BTRFS error (device dm-21): open_ctree failed

Same fix:  I bumped the reserved size limit from 512M to 2G and mounted
normally.  (OK, technically, I booted my old 5.0.21 kernel--but my 5.0.21
kernel has the 2G reserved space patch below in it.)

I've not been able to repeat this ENOSPC behavior under test conditions
in the last two months of trying, but it's now happened twice in different
places, so it has non-zero repeatability.

> I tried the usual fix strategies:
> 
> 	1.  Immediately after mount, try to balance to free space for
> 	metadata
> 
> 	2.  Immediately after mount, add additional disks to provide
> 	unallocated space for metadata
> 
> 	3.  Mount -o nossd to increase metadata density
> 
> #3 had no effect.  #1 failed consistently.
> 
> #2 was successful, but the additional space was not used because
> btrfs couldn't allocate chunks for metadata because it ran out of
> metadata space for new metadata chunks.
> 
> When btrfs-cleaner tried to remove the first pending deleted snapshot,
> it started a transaction that failed due to lack of metadata space.
> Since the transaction failed, the filesystem reverts to its earlier state,
> and exactly the same thing happens on the next mount.  The 'btrfs dev
> add' in #2 is successful only if it is executed immediately after mount,
> before the btrfs-cleaner thread wakes up.
> 
> Here's what the kernel said during one of the attempts:
> 
> 	[41263.822252] BTRFS info (device dm-3): use zstd compression, level 0
> 	[41263.825135] BTRFS info (device dm-3): using free space tree
> 	[41263.827319] BTRFS info (device dm-3): has skinny extents
> 	[42046.463356] ------------[ cut here ]------------
> 	[42046.463387] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> 	[42046.463404] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> 	[42046.463407] BTRFS info (device dm-3): forced readonly
> 	[42046.463414] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> 	[42046.463429] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
> 	[42046.463548] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
> 	[42046.471363] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> 	[42046.471475] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
> 	[42046.471506] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
> 	[42046.473672] BTRFS: error (device dm-3) in btrfs_drop_snapshot:9489: errno=-28 No space left
> 	[42046.475643] WARNING: CPU: 0 PID: 10187 at fs/btrfs/extent-tree.c:7056 __btrfs_free_extent+0x364/0xf60
> 	[42046.475645] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio joydev ppdev crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod snd_pcm aesni_intel aes_x86_64 snd_timer crypto_simd cryptd glue_helper sr_mod snd cdrom psmouse sg soundcore input_leds pcspkr serio_raw ide_pci_generic i2c_piix4 bochs_drm parport_pc piix rtc_cmos floppy parport pcc_cpufreq ide_core qemu_fw_cfg evbug evdev ip_tables x_tables ipv6 crc_ccitt autofs4
> 	[42046.475677] CPU: 0 PID: 10187 Comm: btrfs-transacti Tainted: G    B   W         5.0.8-zb64-10a85e8a1569+ #1
> 	[42046.475678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> 	[42046.475681] RIP: 0010:__btrfs_free_extent+0x364/0xf60
> 	[42046.475684] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 00 85 be b8 e8 3c 1b 9b ff <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
> 	[42046.475685] RSP: 0018:ffff888103d8f8e0 EFLAGS: 00010282
> 	[42046.475688] RAX: 0000000000000000 RBX: ffff88802d9ce370 RCX: ffffffffb7224ca2
> 	[42046.475689] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88811960dfac
> 	[42046.475691] RBP: ffff888103d8fa98 R08: ffffed10232c24c1 R09: ffffed10232c24c0
> 	[42046.475693] R10: ffff888025c94aeb R11: ffffed10232c24c1 R12: 0000000000a5054e
> 	[42046.475694] R13: 0000000000003000 R14: 000000000000214a R15: ffff888103d8fa70
> 	[42046.475696] FS:  0000000000000000(0000) GS:ffff888119400000(0000) knlGS:0000000000000000
> 	[42046.475698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 	[42046.475700] CR2: 00007f1b84271718 CR3: 00000000a3636001 CR4: 00000000001606f0
> 	[42046.475702] Call Trace:
> 	[42046.475707]  ? trace_hardirqs_off+0x24/0x120
> 	[42046.475714]  ? update_block_group+0x6d0/0x6d0
> 	[42046.475718]  ? __lock_acquire+0xf8/0x26e0
> 	[42046.475721]  ? __kasan_slab_free+0x14d/0x230
> 	[42046.475724]  ? btrfs_delayed_ref_lock+0x3e/0x100
> 	[42046.475727]  ? __lock_acquire+0xf8/0x26e0
> 	[42046.475730]  ? kthread+0x1a9/0x200
> 	[42046.475733]  ? debug_show_all_locks+0x210/0x210
> 	[42046.475737]  ? __btrfs_run_delayed_refs+0xce/0x210
> 	[42046.475742]  ? debug_show_all_locks+0x210/0x210
> 	[42046.475747]  run_delayed_data_ref+0x15a/0x400
> 	[42046.475752]  ? alloc_reserved_file_extent+0x520/0x520
> 	[42046.475756]  ? rb_next+0x25/0x90
> 	[42046.475761]  run_one_delayed_ref+0x71/0xe0
> 	[42046.475765]  btrfs_run_delayed_refs_for_head+0x284/0x580
> 	[42046.475772]  __btrfs_run_delayed_refs+0xeb/0x210
> 	[42046.475777]  ? btrfs_run_delayed_refs_for_head+0x580/0x580
> 	[42046.475781]  ? debug_show_all_locks+0x210/0x210
> 	[42046.475786]  btrfs_run_delayed_refs+0xc0/0x260
> 	[42046.475792]  btrfs_commit_transaction+0xf0/0x10f0
> 	[42046.475795]  ? do_raw_spin_unlock+0xa8/0x140
> 	[42046.475799]  ? _raw_spin_unlock+0x27/0x40
> 	[42046.475803]  ? btrfs_record_root_in_trans+0x24/0xb0
> 	[42046.475807]  ? btrfs_apply_pending_changes+0xb0/0xb0
> 	[42046.475809]  ? start_transaction+0x14a/0x760
> 	[42046.475816]  transaction_kthread+0x218/0x250
> 	[42046.475822]  kthread+0x1a9/0x200
> 	[42046.475824]  ? btrfs_cleanup_transaction+0xb80/0xb80
> 	[42046.475827]  ? kthread_park+0xb0/0xb0
> 	[42046.475831]  ret_from_fork+0x3a/0x50
> 	[42046.475837] irq event stamp: 0
> 	[42046.475839] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
> 	[42046.475843] hardirqs last disabled at (0): [<ffffffffb70efe95>] copy_process.part.7+0xad5/0x3b80
> 	[42046.475846] softirqs last  enabled at (0): [<ffffffffb70efe95>] copy_process.part.7+0xad5/0x3b80
> 	[42046.475847] softirqs last disabled at (0): [<0000000000000000>]           (null)
> 	[42046.475849] ---[ end trace b02d556137ea688c ]---
> 	[42046.475852] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> 	[42046.475859] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> 	[42046.475878] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
> 	[42046.475905] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
> 
> I noticed that during the few minutes when the filesystem was running,
> the "global reserve" number in "btrfs fi usage" output kept going up to
> 503M or so, then the transaction failed.
> 
> I then applied this patch to the kernel (5.0.8):
> 
> 	diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> 	index 83791d13c204..59ba456e2314 100644
> 	--- a/fs/btrfs/extent-tree.c
> 	+++ b/fs/btrfs/extent-tree.c
> 	@@ -5790,7 +5790,7 @@ static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
> 		spin_lock(&sinfo->lock);
> 		spin_lock(&block_rsv->lock);
> 
> 	-       block_rsv->size = min_t(u64, num_bytes, SZ_512M);
> 	+       block_rsv->size = min_t(u64, num_bytes, SZ_1G);
> 
> 		if (block_rsv->reserved < block_rsv->size) {
> 			num_bytes = btrfs_space_info_used(sinfo, true);
> 
> Then mounted the filesystem.
> 
> With the larger reserved block size, btrfs-cleaner was able to get
> past its earlier failures, allocating 3GB of new metadata chunks in the
> first transaction after it started deleting snapshots.  After this, the
> filesystem deleted some more snapshots (there are currently 2470 deleted
> snapshots remaining on this filesystem).
> 
> This isn't really a solution, though, for two reasons:
> 
> 1.  The commit that adds the limit on block_rsv->size (fdf30d1c1b3
> "Btrfs: limit the global reserve to 512mb") points out:
> 
>     A user reported a problem where he was getting early ENOSPC with
>     hundreds of gigs of free data space and 6 gigs of free metadata space.
>     This is because the global block reserve was taking up the entire
>     free metadata space.  This is ridiculous, we have infrastructure in
>     place to throttle if we start using too much of the global reserve,
>     so instead of letting it get this huge just limit it to 512mb so that
>     users can still get work done.  This allowed the user to complete
>     his rsync without issues. 
> 
> In other words, something should be handling the case when reserved
> space runs out, and that something is failing.
> 
> 2.  The 1G reserved limit still isn't sufficient.  The filesystem deletes
> a few dozen snapshots, then fails again later:
> 
> 	[ 3705.477036] CPU: 2 PID: 8388 Comm: btrfs-transacti Tainted: G    B             5.0.9-zb64-b6eed6abc880+ #3
> 	[ 3705.481510] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> 	[ 3705.484038] RIP: 0010:__btrfs_free_extent+0x364/0xf60
> 	[ 3705.485061] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 80 85 be 8b e8 fc 1b 9b ff <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
> 	[ 3705.488046] RSP: 0018:ffff8881c29cf8e0 EFLAGS: 00010282
> 	[ 3705.488868] RAX: 0000000000000000 RBX: ffff888121dbdf20 RCX: ffffffff8a224a92
> 	[ 3705.490129] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff8881f5e0dfac
> 	[ 3705.491246] RBP: ffff8881c29cfa98 R08: ffffed103ebc24c1 R09: ffffed103ebc24c0
> 	[ 3705.492376] R10: ffff88819e6d430b R11: ffffed103ebc24c1 R12: 00000000009ecb21
> 	[ 3705.493501] R13: 0000000000040000 R14: 00000000000021b3 R15: ffff8881c29cfa70
> 	[ 3705.494619] FS:  0000000000000000(0000) GS:ffff8881f5c00000(0000) knlGS:0000000000000000
> 	[ 3705.495883] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> 	[ 3705.496781] CR2: 0000556bc833ec70 CR3: 00000001ef9e0001 CR4: 00000000001606e0
> 	[ 3705.497872] Call Trace:
> 	[ 3705.498279]  ? update_block_group+0x6d0/0x6d0
> 	[ 3705.498950]  ? btrfs_run_delayed_refs_for_head+0x352/0x580
> 	[ 3705.499816]  ? kthread+0x1a9/0x200
> 	[ 3705.500379]  ? debug_show_all_locks+0x210/0x210
> 	[ 3705.501078]  ? __btrfs_run_delayed_refs+0xce/0x210
> 	[ 3705.501813]  ? debug_show_all_locks+0x210/0x210
> 	[ 3705.502516]  run_delayed_data_ref+0x15a/0x400
> 	[ 3705.503193]  ? alloc_reserved_file_extent+0x520/0x520
> 	[ 3705.503968]  ? rb_next+0x3c/0x90
> 	[ 3705.504479]  run_one_delayed_ref+0x71/0xe0
> 	[ 3705.505116]  btrfs_run_delayed_refs_for_head+0x284/0x580
> 	[ 3705.505935]  __btrfs_run_delayed_refs+0xeb/0x210
> 	[ 3705.506649]  ? btrfs_run_delayed_refs_for_head+0x580/0x580
> 	[ 3705.507495]  ? debug_show_all_locks+0x210/0x210
> 	[ 3705.508201]  btrfs_run_delayed_refs+0xc0/0x260
> 	[ 3705.508912]  btrfs_commit_transaction+0xf0/0x10f0
> 	[ 3705.509654]  ? do_raw_spin_unlock+0xa8/0x140
> 	[ 3705.510343]  ? _raw_spin_unlock+0x27/0x40
> 	[ 3705.510979]  ? btrfs_record_root_in_trans+0x24/0xb0
> 	[ 3705.511748]  ? btrfs_apply_pending_changes+0xb0/0xb0
> 	[ 3705.512533]  ? start_transaction+0x14a/0x760
> 	[ 3705.513216]  transaction_kthread+0x218/0x250
> 	[ 3705.513900]  kthread+0x1a9/0x200
> 	[ 3705.514425]  ? btrfs_cleanup_transaction+0xb80/0xb80
> 	[ 3705.515233]  ? kthread_park+0xb0/0xb0
> 	[ 3705.515829]  ret_from_fork+0x3a/0x50
> 	[ 3705.516408] irq event stamp: 0
> 	[ 3705.516898] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
> 	[ 3705.518054] hardirqs last disabled at (0): [<ffffffff8a0efe65>] copy_process.part.7+0xad5/0x3b80
> 	[ 3705.519419] softirqs last  enabled at (0): [<ffffffff8a0efe65>] copy_process.part.7+0xad5/0x3b80
> 	[ 3705.520779] softirqs last disabled at (0): [<0000000000000000>]           (null)
> 	[ 3705.521944] ---[ end trace 3a5f720e889bc0d3 ]---
> 	[ 3705.522713] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> 	[ 3705.524069] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> 
> If I increase the reserved limit again to 2G, btrfs-cleaner gets past
> this shapshot (so far).  It takes more than 10 minutes to delete some
> of the snapshots, so this will take a while to finish all 2470.
> 
> The current btrfs fi usage output worries me:
> 
> 	Overall:
> 	    Device size:                   2.02TiB
> 	    Device allocated:              2.01TiB
> 	    Device unallocated:           10.03GiB
> 	    Device missing:                  0.00B
> 	    Used:                          1.70TiB
> 	    Free (estimated):            296.02GiB      (min: 291.01GiB)
> 	    Data ratio:                       1.00
> 	    Metadata ratio:                   2.00
> 	    Global reserve:                2.00GiB      (used: 864.00KiB)
> 
> 	Data,single: Size:1.81TiB, Used:1.53TiB
> 	   /dev/mapper/vgnebtest-tvdb    928.97GiB
> 	   /dev/mapper/vgnebtest-tvdc    929.00GiB
> 
> 	Metadata,RAID1: Size:99.99GiB, Used:82.29GiB
> 	   /dev/mapper/vgnebtest-lvol1     5.00GiB
> 	   /dev/mapper/vgnebtest-lvol2     5.00GiB
> 	   /dev/mapper/vgnebtest-tvdb     94.99GiB
> 	   /dev/mapper/vgnebtest-tvdc     94.99GiB
> 
> 	System,RAID1: Size:8.00MiB, Used:240.00KiB
> 	   /dev/mapper/vgnebtest-tvdb      8.00MiB
> 	   /dev/mapper/vgnebtest-tvdc      8.00MiB
> 
> 	Unallocated:
> 	   /dev/mapper/vgnebtest-lvol1     5.00GiB
> 	   /dev/mapper/vgnebtest-lvol2     5.00GiB
> 	   /dev/mapper/vgnebtest-tvdb     30.00MiB
> 	   /dev/mapper/vgnebtest-tvdc      1.00MiB
> 
> Note "lvol1" and "lvol2" were added after the initial failure.  
> 
> There is clearly a lot of metadata space available, but the filesystem still
> intermittently hits ENOSPC while trying to free space.



[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Global reserve and ENOSPC while deleting snapshots on 5.0.9 - still happens on 5.1.11
  2019-06-23 14:14 ` Global reserve and ENOSPC while deleting snapshots on 5.0.9 - still happens on 5.1.11 Zygo Blaxell
@ 2019-06-24 10:01   ` Martin Raiber
  2019-06-24 11:36     ` Zygo Blaxell
  2019-07-01  3:07   ` Global reserve and ENOSPC while deleting orphan inodes on 5.1.14 Zygo Blaxell
  1 sibling, 1 reply; 7+ messages in thread
From: Martin Raiber @ 2019-06-24 10:01 UTC (permalink / raw)
  To: Zygo Blaxell, linux-btrfs

I've fixed the same problem(s) by increasing the global metadata size as
well. Though I haven't encountered them since Josef Bacik's block rsv
rework in 5.0.
Another problem with increasing the global metadata size is, that I
think it is the only way dirty metadata is throttled. If increased too
much (as a percentage of RAM) the system goes OOM depending on work
load. As far as I can see dirty metdata isn't included into the
dirty_ratio calculation as well, causing issues on that front as well.
Another thing that I think helps is to run with "nodatasum" -- probably
because then less metadata needs to be changed when deleting.

On 23.06.2019 16:14 Zygo Blaxell wrote:
> On Tue, Apr 23, 2019 at 07:06:51PM -0400, Zygo Blaxell wrote:
>> I had a test filesystem that ran out of unallocated space, then ran
>> out of metadata space during a snapshot delete, and forced readonly.
>> The workload before the failure was a lot of rsync and bees dedupe
>> combined with random snapshot creates and deletes.
> Had this happen again on a production filesystem, this time on 5.1.11,
> and it happened during orphan inode cleanup instead of snapshot delete:
>
> 	[14303.076134][T20882] BTRFS: error (device dm-21) in add_to_free_space_tree:1037: errno=-28 No space left
> 	[14303.076144][T20882] BTRFS: error (device dm-21) in __btrfs_free_extent:7196: errno=-28 No space left
> 	[14303.076157][T20882] BTRFS: error (device dm-21) in btrfs_run_delayed_refs:3008: errno=-28 No space left
> 	[14303.076203][T20882] BTRFS error (device dm-21): Error removing orphan entry, stopping orphan cleanup
> 	[14303.076210][T20882] BTRFS error (device dm-21): could not do orphan cleanup -22
> 	[14303.076281][T20882] BTRFS error (device dm-21): commit super ret -30
> 	[14303.357337][T20882] BTRFS error (device dm-21): open_ctree failed
>
> Same fix:  I bumped the reserved size limit from 512M to 2G and mounted
> normally.  (OK, technically, I booted my old 5.0.21 kernel--but my 5.0.21
> kernel has the 2G reserved space patch below in it.)
>
> I've not been able to repeat this ENOSPC behavior under test conditions
> in the last two months of trying, but it's now happened twice in different
> places, so it has non-zero repeatability.
>
>> I tried the usual fix strategies:
>>
>> 	1.  Immediately after mount, try to balance to free space for
>> 	metadata
>>
>> 	2.  Immediately after mount, add additional disks to provide
>> 	unallocated space for metadata
>>
>> 	3.  Mount -o nossd to increase metadata density
>>
>> #3 had no effect.  #1 failed consistently.
>>
>> #2 was successful, but the additional space was not used because
>> btrfs couldn't allocate chunks for metadata because it ran out of
>> metadata space for new metadata chunks.
>>
>> When btrfs-cleaner tried to remove the first pending deleted snapshot,
>> it started a transaction that failed due to lack of metadata space.
>> Since the transaction failed, the filesystem reverts to its earlier state,
>> and exactly the same thing happens on the next mount.  The 'btrfs dev
>> add' in #2 is successful only if it is executed immediately after mount,
>> before the btrfs-cleaner thread wakes up.
>>
>> Here's what the kernel said during one of the attempts:
>>
>> 	[41263.822252] BTRFS info (device dm-3): use zstd compression, level 0
>> 	[41263.825135] BTRFS info (device dm-3): using free space tree
>> 	[41263.827319] BTRFS info (device dm-3): has skinny extents
>> 	[42046.463356] ------------[ cut here ]------------
>> 	[42046.463387] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
>> 	[42046.463404] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
>> 	[42046.463407] BTRFS info (device dm-3): forced readonly
>> 	[42046.463414] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
>> 	[42046.463429] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
>> 	[42046.463548] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
>> 	[42046.471363] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
>> 	[42046.471475] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
>> 	[42046.471506] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
>> 	[42046.473672] BTRFS: error (device dm-3) in btrfs_drop_snapshot:9489: errno=-28 No space left
>> 	[42046.475643] WARNING: CPU: 0 PID: 10187 at fs/btrfs/extent-tree.c:7056 __btrfs_free_extent+0x364/0xf60
>> 	[42046.475645] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio joydev ppdev crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod snd_pcm aesni_intel aes_x86_64 snd_timer crypto_simd cryptd glue_helper sr_mod snd cdrom psmouse sg soundcore input_leds pcspkr serio_raw ide_pci_generic i2c_piix4 bochs_drm parport_pc piix rtc_cmos floppy parport pcc_cpufreq ide_core qemu_fw_cfg evbug evdev ip_tables x_tables ipv6 crc_ccitt autofs4
>> 	[42046.475677] CPU: 0 PID: 10187 Comm: btrfs-transacti Tainted: G    B   W         5.0.8-zb64-10a85e8a1569+ #1
>> 	[42046.475678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>> 	[42046.475681] RIP: 0010:__btrfs_free_extent+0x364/0xf60
>> 	[42046.475684] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 00 85 be b8 e8 3c 1b 9b ff <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
>> 	[42046.475685] RSP: 0018:ffff888103d8f8e0 EFLAGS: 00010282
>> 	[42046.475688] RAX: 0000000000000000 RBX: ffff88802d9ce370 RCX: ffffffffb7224ca2
>> 	[42046.475689] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88811960dfac
>> 	[42046.475691] RBP: ffff888103d8fa98 R08: ffffed10232c24c1 R09: ffffed10232c24c0
>> 	[42046.475693] R10: ffff888025c94aeb R11: ffffed10232c24c1 R12: 0000000000a5054e
>> 	[42046.475694] R13: 0000000000003000 R14: 000000000000214a R15: ffff888103d8fa70
>> 	[42046.475696] FS:  0000000000000000(0000) GS:ffff888119400000(0000) knlGS:0000000000000000
>> 	[42046.475698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> 	[42046.475700] CR2: 00007f1b84271718 CR3: 00000000a3636001 CR4: 00000000001606f0
>> 	[42046.475702] Call Trace:
>> 	[42046.475707]  ? trace_hardirqs_off+0x24/0x120
>> 	[42046.475714]  ? update_block_group+0x6d0/0x6d0
>> 	[42046.475718]  ? __lock_acquire+0xf8/0x26e0
>> 	[42046.475721]  ? __kasan_slab_free+0x14d/0x230
>> 	[42046.475724]  ? btrfs_delayed_ref_lock+0x3e/0x100
>> 	[42046.475727]  ? __lock_acquire+0xf8/0x26e0
>> 	[42046.475730]  ? kthread+0x1a9/0x200
>> 	[42046.475733]  ? debug_show_all_locks+0x210/0x210
>> 	[42046.475737]  ? __btrfs_run_delayed_refs+0xce/0x210
>> 	[42046.475742]  ? debug_show_all_locks+0x210/0x210
>> 	[42046.475747]  run_delayed_data_ref+0x15a/0x400
>> 	[42046.475752]  ? alloc_reserved_file_extent+0x520/0x520
>> 	[42046.475756]  ? rb_next+0x25/0x90
>> 	[42046.475761]  run_one_delayed_ref+0x71/0xe0
>> 	[42046.475765]  btrfs_run_delayed_refs_for_head+0x284/0x580
>> 	[42046.475772]  __btrfs_run_delayed_refs+0xeb/0x210
>> 	[42046.475777]  ? btrfs_run_delayed_refs_for_head+0x580/0x580
>> 	[42046.475781]  ? debug_show_all_locks+0x210/0x210
>> 	[42046.475786]  btrfs_run_delayed_refs+0xc0/0x260
>> 	[42046.475792]  btrfs_commit_transaction+0xf0/0x10f0
>> 	[42046.475795]  ? do_raw_spin_unlock+0xa8/0x140
>> 	[42046.475799]  ? _raw_spin_unlock+0x27/0x40
>> 	[42046.475803]  ? btrfs_record_root_in_trans+0x24/0xb0
>> 	[42046.475807]  ? btrfs_apply_pending_changes+0xb0/0xb0
>> 	[42046.475809]  ? start_transaction+0x14a/0x760
>> 	[42046.475816]  transaction_kthread+0x218/0x250
>> 	[42046.475822]  kthread+0x1a9/0x200
>> 	[42046.475824]  ? btrfs_cleanup_transaction+0xb80/0xb80
>> 	[42046.475827]  ? kthread_park+0xb0/0xb0
>> 	[42046.475831]  ret_from_fork+0x3a/0x50
>> 	[42046.475837] irq event stamp: 0
>> 	[42046.475839] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
>> 	[42046.475843] hardirqs last disabled at (0): [<ffffffffb70efe95>] copy_process.part.7+0xad5/0x3b80
>> 	[42046.475846] softirqs last  enabled at (0): [<ffffffffb70efe95>] copy_process.part.7+0xad5/0x3b80
>> 	[42046.475847] softirqs last disabled at (0): [<0000000000000000>]           (null)
>> 	[42046.475849] ---[ end trace b02d556137ea688c ]---
>> 	[42046.475852] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
>> 	[42046.475859] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
>> 	[42046.475878] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
>> 	[42046.475905] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
>>
>> I noticed that during the few minutes when the filesystem was running,
>> the "global reserve" number in "btrfs fi usage" output kept going up to
>> 503M or so, then the transaction failed.
>>
>> I then applied this patch to the kernel (5.0.8):
>>
>> 	diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> 	index 83791d13c204..59ba456e2314 100644
>> 	--- a/fs/btrfs/extent-tree.c
>> 	+++ b/fs/btrfs/extent-tree.c
>> 	@@ -5790,7 +5790,7 @@ static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
>> 		spin_lock(&sinfo->lock);
>> 		spin_lock(&block_rsv->lock);
>>
>> 	-       block_rsv->size = min_t(u64, num_bytes, SZ_512M);
>> 	+       block_rsv->size = min_t(u64, num_bytes, SZ_1G);
>>
>> 		if (block_rsv->reserved < block_rsv->size) {
>> 			num_bytes = btrfs_space_info_used(sinfo, true);
>>
>> Then mounted the filesystem.
>>
>> With the larger reserved block size, btrfs-cleaner was able to get
>> past its earlier failures, allocating 3GB of new metadata chunks in the
>> first transaction after it started deleting snapshots.  After this, the
>> filesystem deleted some more snapshots (there are currently 2470 deleted
>> snapshots remaining on this filesystem).
>>
>> This isn't really a solution, though, for two reasons:
>>
>> 1.  The commit that adds the limit on block_rsv->size (fdf30d1c1b3
>> "Btrfs: limit the global reserve to 512mb") points out:
>>
>>     A user reported a problem where he was getting early ENOSPC with
>>     hundreds of gigs of free data space and 6 gigs of free metadata space.
>>     This is because the global block reserve was taking up the entire
>>     free metadata space.  This is ridiculous, we have infrastructure in
>>     place to throttle if we start using too much of the global reserve,
>>     so instead of letting it get this huge just limit it to 512mb so that
>>     users can still get work done.  This allowed the user to complete
>>     his rsync without issues. 
>>
>> In other words, something should be handling the case when reserved
>> space runs out, and that something is failing.
>>
>> 2.  The 1G reserved limit still isn't sufficient.  The filesystem deletes
>> a few dozen snapshots, then fails again later:
>>
>> 	[ 3705.477036] CPU: 2 PID: 8388 Comm: btrfs-transacti Tainted: G    B             5.0.9-zb64-b6eed6abc880+ #3
>> 	[ 3705.481510] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>> 	[ 3705.484038] RIP: 0010:__btrfs_free_extent+0x364/0xf60
>> 	[ 3705.485061] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 80 85 be 8b e8 fc 1b 9b ff <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
>> 	[ 3705.488046] RSP: 0018:ffff8881c29cf8e0 EFLAGS: 00010282
>> 	[ 3705.488868] RAX: 0000000000000000 RBX: ffff888121dbdf20 RCX: ffffffff8a224a92
>> 	[ 3705.490129] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff8881f5e0dfac
>> 	[ 3705.491246] RBP: ffff8881c29cfa98 R08: ffffed103ebc24c1 R09: ffffed103ebc24c0
>> 	[ 3705.492376] R10: ffff88819e6d430b R11: ffffed103ebc24c1 R12: 00000000009ecb21
>> 	[ 3705.493501] R13: 0000000000040000 R14: 00000000000021b3 R15: ffff8881c29cfa70
>> 	[ 3705.494619] FS:  0000000000000000(0000) GS:ffff8881f5c00000(0000) knlGS:0000000000000000
>> 	[ 3705.495883] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> 	[ 3705.496781] CR2: 0000556bc833ec70 CR3: 00000001ef9e0001 CR4: 00000000001606e0
>> 	[ 3705.497872] Call Trace:
>> 	[ 3705.498279]  ? update_block_group+0x6d0/0x6d0
>> 	[ 3705.498950]  ? btrfs_run_delayed_refs_for_head+0x352/0x580
>> 	[ 3705.499816]  ? kthread+0x1a9/0x200
>> 	[ 3705.500379]  ? debug_show_all_locks+0x210/0x210
>> 	[ 3705.501078]  ? __btrfs_run_delayed_refs+0xce/0x210
>> 	[ 3705.501813]  ? debug_show_all_locks+0x210/0x210
>> 	[ 3705.502516]  run_delayed_data_ref+0x15a/0x400
>> 	[ 3705.503193]  ? alloc_reserved_file_extent+0x520/0x520
>> 	[ 3705.503968]  ? rb_next+0x3c/0x90
>> 	[ 3705.504479]  run_one_delayed_ref+0x71/0xe0
>> 	[ 3705.505116]  btrfs_run_delayed_refs_for_head+0x284/0x580
>> 	[ 3705.505935]  __btrfs_run_delayed_refs+0xeb/0x210
>> 	[ 3705.506649]  ? btrfs_run_delayed_refs_for_head+0x580/0x580
>> 	[ 3705.507495]  ? debug_show_all_locks+0x210/0x210
>> 	[ 3705.508201]  btrfs_run_delayed_refs+0xc0/0x260
>> 	[ 3705.508912]  btrfs_commit_transaction+0xf0/0x10f0
>> 	[ 3705.509654]  ? do_raw_spin_unlock+0xa8/0x140
>> 	[ 3705.510343]  ? _raw_spin_unlock+0x27/0x40
>> 	[ 3705.510979]  ? btrfs_record_root_in_trans+0x24/0xb0
>> 	[ 3705.511748]  ? btrfs_apply_pending_changes+0xb0/0xb0
>> 	[ 3705.512533]  ? start_transaction+0x14a/0x760
>> 	[ 3705.513216]  transaction_kthread+0x218/0x250
>> 	[ 3705.513900]  kthread+0x1a9/0x200
>> 	[ 3705.514425]  ? btrfs_cleanup_transaction+0xb80/0xb80
>> 	[ 3705.515233]  ? kthread_park+0xb0/0xb0
>> 	[ 3705.515829]  ret_from_fork+0x3a/0x50
>> 	[ 3705.516408] irq event stamp: 0
>> 	[ 3705.516898] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
>> 	[ 3705.518054] hardirqs last disabled at (0): [<ffffffff8a0efe65>] copy_process.part.7+0xad5/0x3b80
>> 	[ 3705.519419] softirqs last  enabled at (0): [<ffffffff8a0efe65>] copy_process.part.7+0xad5/0x3b80
>> 	[ 3705.520779] softirqs last disabled at (0): [<0000000000000000>]           (null)
>> 	[ 3705.521944] ---[ end trace 3a5f720e889bc0d3 ]---
>> 	[ 3705.522713] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
>> 	[ 3705.524069] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
>>
>> If I increase the reserved limit again to 2G, btrfs-cleaner gets past
>> this shapshot (so far).  It takes more than 10 minutes to delete some
>> of the snapshots, so this will take a while to finish all 2470.
>>
>> The current btrfs fi usage output worries me:
>>
>> 	Overall:
>> 	    Device size:                   2.02TiB
>> 	    Device allocated:              2.01TiB
>> 	    Device unallocated:           10.03GiB
>> 	    Device missing:                  0.00B
>> 	    Used:                          1.70TiB
>> 	    Free (estimated):            296.02GiB      (min: 291.01GiB)
>> 	    Data ratio:                       1.00
>> 	    Metadata ratio:                   2.00
>> 	    Global reserve:                2.00GiB      (used: 864.00KiB)
>>
>> 	Data,single: Size:1.81TiB, Used:1.53TiB
>> 	   /dev/mapper/vgnebtest-tvdb    928.97GiB
>> 	   /dev/mapper/vgnebtest-tvdc    929.00GiB
>>
>> 	Metadata,RAID1: Size:99.99GiB, Used:82.29GiB
>> 	   /dev/mapper/vgnebtest-lvol1     5.00GiB
>> 	   /dev/mapper/vgnebtest-lvol2     5.00GiB
>> 	   /dev/mapper/vgnebtest-tvdb     94.99GiB
>> 	   /dev/mapper/vgnebtest-tvdc     94.99GiB
>>
>> 	System,RAID1: Size:8.00MiB, Used:240.00KiB
>> 	   /dev/mapper/vgnebtest-tvdb      8.00MiB
>> 	   /dev/mapper/vgnebtest-tvdc      8.00MiB
>>
>> 	Unallocated:
>> 	   /dev/mapper/vgnebtest-lvol1     5.00GiB
>> 	   /dev/mapper/vgnebtest-lvol2     5.00GiB
>> 	   /dev/mapper/vgnebtest-tvdb     30.00MiB
>> 	   /dev/mapper/vgnebtest-tvdc      1.00MiB
>>
>> Note "lvol1" and "lvol2" were added after the initial failure.  
>>
>> There is clearly a lot of metadata space available, but the filesystem still
>> intermittently hits ENOSPC while trying to free space.
>


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Global reserve and ENOSPC while deleting snapshots on 5.0.9 - still happens on 5.1.11
  2019-06-24 10:01   ` Martin Raiber
@ 2019-06-24 11:36     ` Zygo Blaxell
  0 siblings, 0 replies; 7+ messages in thread
From: Zygo Blaxell @ 2019-06-24 11:36 UTC (permalink / raw)
  To: Martin Raiber; +Cc: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 17097 bytes --]

On Mon, Jun 24, 2019 at 10:01:31AM +0000, Martin Raiber wrote:
> I've fixed the same problem(s) by increasing the global metadata size as
> well. Though I haven't encountered them since Josef Bacik's block rsv
> rework in 5.0.

Now _that_ is interesting--for me, this issue never happened in five
years _before_ 5.0.

> Another problem with increasing the global metadata size is, that I
> think it is the only way dirty metadata is throttled. If increased too
> much (as a percentage of RAM) the system goes OOM depending on work
> load. As far as I can see dirty metdata isn't included into the
> dirty_ratio calculation as well, causing issues on that front as well.
> Another thing that I think helps is to run with "nodatasum" -- probably
> because then less metadata needs to be changed when deleting.
> 
> On 23.06.2019 16:14 Zygo Blaxell wrote:
> > On Tue, Apr 23, 2019 at 07:06:51PM -0400, Zygo Blaxell wrote:
> >> I had a test filesystem that ran out of unallocated space, then ran
> >> out of metadata space during a snapshot delete, and forced readonly.
> >> The workload before the failure was a lot of rsync and bees dedupe
> >> combined with random snapshot creates and deletes.
> > Had this happen again on a production filesystem, this time on 5.1.11,
> > and it happened during orphan inode cleanup instead of snapshot delete:
> >
> > 	[14303.076134][T20882] BTRFS: error (device dm-21) in add_to_free_space_tree:1037: errno=-28 No space left
> > 	[14303.076144][T20882] BTRFS: error (device dm-21) in __btrfs_free_extent:7196: errno=-28 No space left
> > 	[14303.076157][T20882] BTRFS: error (device dm-21) in btrfs_run_delayed_refs:3008: errno=-28 No space left
> > 	[14303.076203][T20882] BTRFS error (device dm-21): Error removing orphan entry, stopping orphan cleanup
> > 	[14303.076210][T20882] BTRFS error (device dm-21): could not do orphan cleanup -22
> > 	[14303.076281][T20882] BTRFS error (device dm-21): commit super ret -30
> > 	[14303.357337][T20882] BTRFS error (device dm-21): open_ctree failed
> >
> > Same fix:  I bumped the reserved size limit from 512M to 2G and mounted
> > normally.  (OK, technically, I booted my old 5.0.21 kernel--but my 5.0.21
> > kernel has the 2G reserved space patch below in it.)
> >
> > I've not been able to repeat this ENOSPC behavior under test conditions
> > in the last two months of trying, but it's now happened twice in different
> > places, so it has non-zero repeatability.
> >
> >> I tried the usual fix strategies:
> >>
> >> 	1.  Immediately after mount, try to balance to free space for
> >> 	metadata
> >>
> >> 	2.  Immediately after mount, add additional disks to provide
> >> 	unallocated space for metadata
> >>
> >> 	3.  Mount -o nossd to increase metadata density
> >>
> >> #3 had no effect.  #1 failed consistently.
> >>
> >> #2 was successful, but the additional space was not used because
> >> btrfs couldn't allocate chunks for metadata because it ran out of
> >> metadata space for new metadata chunks.
> >>
> >> When btrfs-cleaner tried to remove the first pending deleted snapshot,
> >> it started a transaction that failed due to lack of metadata space.
> >> Since the transaction failed, the filesystem reverts to its earlier state,
> >> and exactly the same thing happens on the next mount.  The 'btrfs dev
> >> add' in #2 is successful only if it is executed immediately after mount,
> >> before the btrfs-cleaner thread wakes up.
> >>
> >> Here's what the kernel said during one of the attempts:
> >>
> >> 	[41263.822252] BTRFS info (device dm-3): use zstd compression, level 0
> >> 	[41263.825135] BTRFS info (device dm-3): using free space tree
> >> 	[41263.827319] BTRFS info (device dm-3): has skinny extents
> >> 	[42046.463356] ------------[ cut here ]------------
> >> 	[42046.463387] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> >> 	[42046.463404] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> >> 	[42046.463407] BTRFS info (device dm-3): forced readonly
> >> 	[42046.463414] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> >> 	[42046.463429] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
> >> 	[42046.463548] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
> >> 	[42046.471363] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> >> 	[42046.471475] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
> >> 	[42046.471506] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
> >> 	[42046.473672] BTRFS: error (device dm-3) in btrfs_drop_snapshot:9489: errno=-28 No space left
> >> 	[42046.475643] WARNING: CPU: 0 PID: 10187 at fs/btrfs/extent-tree.c:7056 __btrfs_free_extent+0x364/0xf60
> >> 	[42046.475645] Modules linked in: mq_deadline bfq dm_cache_smq dm_cache dm_persistent_data dm_bio_prison dm_bufio joydev ppdev crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel dm_mod snd_pcm aesni_intel aes_x86_64 snd_timer crypto_simd cryptd glue_helper sr_mod snd cdrom psmouse sg soundcore input_leds pcspkr serio_raw ide_pci_generic i2c_piix4 bochs_drm parport_pc piix rtc_cmos floppy parport pcc_cpufreq ide_core qemu_fw_cfg evbug evdev ip_tables x_tables ipv6 crc_ccitt autofs4
> >> 	[42046.475677] CPU: 0 PID: 10187 Comm: btrfs-transacti Tainted: G    B   W         5.0.8-zb64-10a85e8a1569+ #1
> >> 	[42046.475678] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> >> 	[42046.475681] RIP: 0010:__btrfs_free_extent+0x364/0xf60
> >> 	[42046.475684] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 00 85 be b8 e8 3c 1b 9b ff <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
> >> 	[42046.475685] RSP: 0018:ffff888103d8f8e0 EFLAGS: 00010282
> >> 	[42046.475688] RAX: 0000000000000000 RBX: ffff88802d9ce370 RCX: ffffffffb7224ca2
> >> 	[42046.475689] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff88811960dfac
> >> 	[42046.475691] RBP: ffff888103d8fa98 R08: ffffed10232c24c1 R09: ffffed10232c24c0
> >> 	[42046.475693] R10: ffff888025c94aeb R11: ffffed10232c24c1 R12: 0000000000a5054e
> >> 	[42046.475694] R13: 0000000000003000 R14: 000000000000214a R15: ffff888103d8fa70
> >> 	[42046.475696] FS:  0000000000000000(0000) GS:ffff888119400000(0000) knlGS:0000000000000000
> >> 	[42046.475698] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> 	[42046.475700] CR2: 00007f1b84271718 CR3: 00000000a3636001 CR4: 00000000001606f0
> >> 	[42046.475702] Call Trace:
> >> 	[42046.475707]  ? trace_hardirqs_off+0x24/0x120
> >> 	[42046.475714]  ? update_block_group+0x6d0/0x6d0
> >> 	[42046.475718]  ? __lock_acquire+0xf8/0x26e0
> >> 	[42046.475721]  ? __kasan_slab_free+0x14d/0x230
> >> 	[42046.475724]  ? btrfs_delayed_ref_lock+0x3e/0x100
> >> 	[42046.475727]  ? __lock_acquire+0xf8/0x26e0
> >> 	[42046.475730]  ? kthread+0x1a9/0x200
> >> 	[42046.475733]  ? debug_show_all_locks+0x210/0x210
> >> 	[42046.475737]  ? __btrfs_run_delayed_refs+0xce/0x210
> >> 	[42046.475742]  ? debug_show_all_locks+0x210/0x210
> >> 	[42046.475747]  run_delayed_data_ref+0x15a/0x400
> >> 	[42046.475752]  ? alloc_reserved_file_extent+0x520/0x520
> >> 	[42046.475756]  ? rb_next+0x25/0x90
> >> 	[42046.475761]  run_one_delayed_ref+0x71/0xe0
> >> 	[42046.475765]  btrfs_run_delayed_refs_for_head+0x284/0x580
> >> 	[42046.475772]  __btrfs_run_delayed_refs+0xeb/0x210
> >> 	[42046.475777]  ? btrfs_run_delayed_refs_for_head+0x580/0x580
> >> 	[42046.475781]  ? debug_show_all_locks+0x210/0x210
> >> 	[42046.475786]  btrfs_run_delayed_refs+0xc0/0x260
> >> 	[42046.475792]  btrfs_commit_transaction+0xf0/0x10f0
> >> 	[42046.475795]  ? do_raw_spin_unlock+0xa8/0x140
> >> 	[42046.475799]  ? _raw_spin_unlock+0x27/0x40
> >> 	[42046.475803]  ? btrfs_record_root_in_trans+0x24/0xb0
> >> 	[42046.475807]  ? btrfs_apply_pending_changes+0xb0/0xb0
> >> 	[42046.475809]  ? start_transaction+0x14a/0x760
> >> 	[42046.475816]  transaction_kthread+0x218/0x250
> >> 	[42046.475822]  kthread+0x1a9/0x200
> >> 	[42046.475824]  ? btrfs_cleanup_transaction+0xb80/0xb80
> >> 	[42046.475827]  ? kthread_park+0xb0/0xb0
> >> 	[42046.475831]  ret_from_fork+0x3a/0x50
> >> 	[42046.475837] irq event stamp: 0
> >> 	[42046.475839] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
> >> 	[42046.475843] hardirqs last disabled at (0): [<ffffffffb70efe95>] copy_process.part.7+0xad5/0x3b80
> >> 	[42046.475846] softirqs last  enabled at (0): [<ffffffffb70efe95>] copy_process.part.7+0xad5/0x3b80
> >> 	[42046.475847] softirqs last disabled at (0): [<0000000000000000>]           (null)
> >> 	[42046.475849] ---[ end trace b02d556137ea688c ]---
> >> 	[42046.475852] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> >> 	[42046.475859] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> >> 	[42046.475878] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10517: errno=-28 No space left
> >> 	[42046.475905] BTRFS: error (device dm-3) in btrfs_create_pending_block_groups:10520: errno=-28 No space left
> >>
> >> I noticed that during the few minutes when the filesystem was running,
> >> the "global reserve" number in "btrfs fi usage" output kept going up to
> >> 503M or so, then the transaction failed.
> >>
> >> I then applied this patch to the kernel (5.0.8):
> >>
> >> 	diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> 	index 83791d13c204..59ba456e2314 100644
> >> 	--- a/fs/btrfs/extent-tree.c
> >> 	+++ b/fs/btrfs/extent-tree.c
> >> 	@@ -5790,7 +5790,7 @@ static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
> >> 		spin_lock(&sinfo->lock);
> >> 		spin_lock(&block_rsv->lock);
> >>
> >> 	-       block_rsv->size = min_t(u64, num_bytes, SZ_512M);
> >> 	+       block_rsv->size = min_t(u64, num_bytes, SZ_1G);
> >>
> >> 		if (block_rsv->reserved < block_rsv->size) {
> >> 			num_bytes = btrfs_space_info_used(sinfo, true);
> >>
> >> Then mounted the filesystem.
> >>
> >> With the larger reserved block size, btrfs-cleaner was able to get
> >> past its earlier failures, allocating 3GB of new metadata chunks in the
> >> first transaction after it started deleting snapshots.  After this, the
> >> filesystem deleted some more snapshots (there are currently 2470 deleted
> >> snapshots remaining on this filesystem).
> >>
> >> This isn't really a solution, though, for two reasons:
> >>
> >> 1.  The commit that adds the limit on block_rsv->size (fdf30d1c1b3
> >> "Btrfs: limit the global reserve to 512mb") points out:
> >>
> >>     A user reported a problem where he was getting early ENOSPC with
> >>     hundreds of gigs of free data space and 6 gigs of free metadata space.
> >>     This is because the global block reserve was taking up the entire
> >>     free metadata space.  This is ridiculous, we have infrastructure in
> >>     place to throttle if we start using too much of the global reserve,
> >>     so instead of letting it get this huge just limit it to 512mb so that
> >>     users can still get work done.  This allowed the user to complete
> >>     his rsync without issues. 
> >>
> >> In other words, something should be handling the case when reserved
> >> space runs out, and that something is failing.
> >>
> >> 2.  The 1G reserved limit still isn't sufficient.  The filesystem deletes
> >> a few dozen snapshots, then fails again later:
> >>
> >> 	[ 3705.477036] CPU: 2 PID: 8388 Comm: btrfs-transacti Tainted: G    B             5.0.9-zb64-b6eed6abc880+ #3
> >> 	[ 3705.481510] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> >> 	[ 3705.484038] RIP: 0010:__btrfs_free_extent+0x364/0xf60
> >> 	[ 3705.485061] Code: 50 f0 48 0f ba a8 90 22 00 00 02 72 1f 8b 85 88 fe ff ff 83 f8 fb 0f 84 59 04 00 00 89 c6 48 c7 c7 80 85 be 8b e8 fc 1b 9b ff <0f> 0b 8b 8d 88 fe ff ff 48 8b bd 90 fe ff ff ba 90 1b 00 00 48 c7
> >> 	[ 3705.488046] RSP: 0018:ffff8881c29cf8e0 EFLAGS: 00010282
> >> 	[ 3705.488868] RAX: 0000000000000000 RBX: ffff888121dbdf20 RCX: ffffffff8a224a92
> >> 	[ 3705.490129] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff8881f5e0dfac
> >> 	[ 3705.491246] RBP: ffff8881c29cfa98 R08: ffffed103ebc24c1 R09: ffffed103ebc24c0
> >> 	[ 3705.492376] R10: ffff88819e6d430b R11: ffffed103ebc24c1 R12: 00000000009ecb21
> >> 	[ 3705.493501] R13: 0000000000040000 R14: 00000000000021b3 R15: ffff8881c29cfa70
> >> 	[ 3705.494619] FS:  0000000000000000(0000) GS:ffff8881f5c00000(0000) knlGS:0000000000000000
> >> 	[ 3705.495883] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> 	[ 3705.496781] CR2: 0000556bc833ec70 CR3: 00000001ef9e0001 CR4: 00000000001606e0
> >> 	[ 3705.497872] Call Trace:
> >> 	[ 3705.498279]  ? update_block_group+0x6d0/0x6d0
> >> 	[ 3705.498950]  ? btrfs_run_delayed_refs_for_head+0x352/0x580
> >> 	[ 3705.499816]  ? kthread+0x1a9/0x200
> >> 	[ 3705.500379]  ? debug_show_all_locks+0x210/0x210
> >> 	[ 3705.501078]  ? __btrfs_run_delayed_refs+0xce/0x210
> >> 	[ 3705.501813]  ? debug_show_all_locks+0x210/0x210
> >> 	[ 3705.502516]  run_delayed_data_ref+0x15a/0x400
> >> 	[ 3705.503193]  ? alloc_reserved_file_extent+0x520/0x520
> >> 	[ 3705.503968]  ? rb_next+0x3c/0x90
> >> 	[ 3705.504479]  run_one_delayed_ref+0x71/0xe0
> >> 	[ 3705.505116]  btrfs_run_delayed_refs_for_head+0x284/0x580
> >> 	[ 3705.505935]  __btrfs_run_delayed_refs+0xeb/0x210
> >> 	[ 3705.506649]  ? btrfs_run_delayed_refs_for_head+0x580/0x580
> >> 	[ 3705.507495]  ? debug_show_all_locks+0x210/0x210
> >> 	[ 3705.508201]  btrfs_run_delayed_refs+0xc0/0x260
> >> 	[ 3705.508912]  btrfs_commit_transaction+0xf0/0x10f0
> >> 	[ 3705.509654]  ? do_raw_spin_unlock+0xa8/0x140
> >> 	[ 3705.510343]  ? _raw_spin_unlock+0x27/0x40
> >> 	[ 3705.510979]  ? btrfs_record_root_in_trans+0x24/0xb0
> >> 	[ 3705.511748]  ? btrfs_apply_pending_changes+0xb0/0xb0
> >> 	[ 3705.512533]  ? start_transaction+0x14a/0x760
> >> 	[ 3705.513216]  transaction_kthread+0x218/0x250
> >> 	[ 3705.513900]  kthread+0x1a9/0x200
> >> 	[ 3705.514425]  ? btrfs_cleanup_transaction+0xb80/0xb80
> >> 	[ 3705.515233]  ? kthread_park+0xb0/0xb0
> >> 	[ 3705.515829]  ret_from_fork+0x3a/0x50
> >> 	[ 3705.516408] irq event stamp: 0
> >> 	[ 3705.516898] hardirqs last  enabled at (0): [<0000000000000000>]           (null)
> >> 	[ 3705.518054] hardirqs last disabled at (0): [<ffffffff8a0efe65>] copy_process.part.7+0xad5/0x3b80
> >> 	[ 3705.519419] softirqs last  enabled at (0): [<ffffffff8a0efe65>] copy_process.part.7+0xad5/0x3b80
> >> 	[ 3705.520779] softirqs last disabled at (0): [<0000000000000000>]           (null)
> >> 	[ 3705.521944] ---[ end trace 3a5f720e889bc0d3 ]---
> >> 	[ 3705.522713] BTRFS: error (device dm-3) in __btrfs_free_extent:7056: errno=-28 No space left
> >> 	[ 3705.524069] BTRFS: error (device dm-3) in btrfs_run_delayed_refs:3011: errno=-28 No space left
> >>
> >> If I increase the reserved limit again to 2G, btrfs-cleaner gets past
> >> this shapshot (so far).  It takes more than 10 minutes to delete some
> >> of the snapshots, so this will take a while to finish all 2470.
> >>
> >> The current btrfs fi usage output worries me:
> >>
> >> 	Overall:
> >> 	    Device size:                   2.02TiB
> >> 	    Device allocated:              2.01TiB
> >> 	    Device unallocated:           10.03GiB
> >> 	    Device missing:                  0.00B
> >> 	    Used:                          1.70TiB
> >> 	    Free (estimated):            296.02GiB      (min: 291.01GiB)
> >> 	    Data ratio:                       1.00
> >> 	    Metadata ratio:                   2.00
> >> 	    Global reserve:                2.00GiB      (used: 864.00KiB)
> >>
> >> 	Data,single: Size:1.81TiB, Used:1.53TiB
> >> 	   /dev/mapper/vgnebtest-tvdb    928.97GiB
> >> 	   /dev/mapper/vgnebtest-tvdc    929.00GiB
> >>
> >> 	Metadata,RAID1: Size:99.99GiB, Used:82.29GiB
> >> 	   /dev/mapper/vgnebtest-lvol1     5.00GiB
> >> 	   /dev/mapper/vgnebtest-lvol2     5.00GiB
> >> 	   /dev/mapper/vgnebtest-tvdb     94.99GiB
> >> 	   /dev/mapper/vgnebtest-tvdc     94.99GiB
> >>
> >> 	System,RAID1: Size:8.00MiB, Used:240.00KiB
> >> 	   /dev/mapper/vgnebtest-tvdb      8.00MiB
> >> 	   /dev/mapper/vgnebtest-tvdc      8.00MiB
> >>
> >> 	Unallocated:
> >> 	   /dev/mapper/vgnebtest-lvol1     5.00GiB
> >> 	   /dev/mapper/vgnebtest-lvol2     5.00GiB
> >> 	   /dev/mapper/vgnebtest-tvdb     30.00MiB
> >> 	   /dev/mapper/vgnebtest-tvdc      1.00MiB
> >>
> >> Note "lvol1" and "lvol2" were added after the initial failure.  
> >>
> >> There is clearly a lot of metadata space available, but the filesystem still
> >> intermittently hits ENOSPC while trying to free space.
> >
> 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Global reserve and ENOSPC while deleting orphan inodes on 5.1.14
  2019-06-23 14:14 ` Global reserve and ENOSPC while deleting snapshots on 5.0.9 - still happens on 5.1.11 Zygo Blaxell
  2019-06-24 10:01   ` Martin Raiber
@ 2019-07-01  3:07   ` Zygo Blaxell
  1 sibling, 0 replies; 7+ messages in thread
From: Zygo Blaxell @ 2019-07-01  3:07 UTC (permalink / raw)
  To: linux-btrfs

[-- Attachment #1: Type: text/plain, Size: 2420 bytes --]

On Sun, Jun 23, 2019 at 10:14:34AM -0400, Zygo Blaxell wrote:
> On Tue, Apr 23, 2019 at 07:06:51PM -0400, Zygo Blaxell wrote:
> > I had a test filesystem that ran out of unallocated space, then ran
> > out of metadata space during a snapshot delete, and forced readonly.
> > The workload before the failure was a lot of rsync and bees dedupe
> > combined with random snapshot creates and deletes.
> 
> Had this happen again on a production filesystem, this time on 5.1.11,
> and it happened during orphan inode cleanup instead of snapshot delete:
> 
> 	[14303.076134][T20882] BTRFS: error (device dm-21) in add_to_free_space_tree:1037: errno=-28 No space left
> 	[14303.076144][T20882] BTRFS: error (device dm-21) in __btrfs_free_extent:7196: errno=-28 No space left
> 	[14303.076157][T20882] BTRFS: error (device dm-21) in btrfs_run_delayed_refs:3008: errno=-28 No space left
> 	[14303.076203][T20882] BTRFS error (device dm-21): Error removing orphan entry, stopping orphan cleanup
> 	[14303.076210][T20882] BTRFS error (device dm-21): could not do orphan cleanup -22
> 	[14303.076281][T20882] BTRFS error (device dm-21): commit super ret -30
> 	[14303.357337][T20882] BTRFS error (device dm-21): open_ctree failed
> 
> Same fix:  I bumped the reserved size limit from 512M to 2G and mounted
> normally.  (OK, technically, I booted my old 5.0.21 kernel--but my 5.0.21
> kernel has the 2G reserved space patch below in it.)

It turns out that it was the boot with 5.0.21 that fixed this, not the
increase reserved limit patch.  This filesystem's global reserve is is
only 1.71G (it's a 2x150G btrfs raid1), so raising the limit to 2G has
no effect on 5.1.x.

I was able to mount the filesystem with 5.0.21, but not with 5.1.14--even
after successfully mounting with 5.0.21 and finishing orphan inode
cleanup, mounting on 5.1.14 or 5.1.15 fails.

I tried a few other things:  balancing metadata (Size:19.04GiB,
Used:7.54GiB) and mounting with 'nossd' (which somehow became 'ssd'
after the fact?).  Eventually I was able to mount the filesystem with
kernel 5.1.15 and the 'nossd' option.  I can now wait and see how long
5.1.15 lasts until the next similar event occurs.

The full mount options are:  rw,noatime,degraded,compress=zstd:3,ssd,flushoncommit,space_cache=v2,skip_balance

(filesystem isn't degraded in this case, I just have that option in case
a disk fails some day)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 195 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-07-01  3:08 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-23 23:06 Global reserve and ENOSPC while deleting snapshots on 5.0.9 Zygo Blaxell
2019-04-24  2:57 ` Paul Jones
2019-04-24  3:58   ` Zygo Blaxell
2019-06-23 14:14 ` Global reserve and ENOSPC while deleting snapshots on 5.0.9 - still happens on 5.1.11 Zygo Blaxell
2019-06-24 10:01   ` Martin Raiber
2019-06-24 11:36     ` Zygo Blaxell
2019-07-01  3:07   ` Global reserve and ENOSPC while deleting orphan inodes on 5.1.14 Zygo Blaxell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.