On Wed, Apr 24, 2019 at 02:57:47AM +0000, Paul Jones wrote:
> > -----Original Message-----
> > From: linux-btrfs-owner@vger.kernel.org <linux-btrfs-owner@vger.kernel.org> On Behalf Of Zygo Blaxell
> > Sent: Wednesday, 24 April 2019 9:07 AM
> > To: linux-btrfs@vger.kernel.org
> > Subject: Global reserve and ENOSPC while deleting snapshots on 5.0.9
> >
> > I had a test filesystem that ran out of unallocated space, then ran out of
> > metadata space during a snapshot delete, and forced readonly.
> > The workload before the failure was a lot of rsync and bees dedupe
> > combined with random snapshot creates and deletes.
> >
> > I tried the usual fix strategies:
> >
> > 	1. Immediately after mount, try to balance to free space for
> > 	metadata
> >
> > 	2. Immediately after mount, add additional disks to provide
> > 	unallocated space for metadata
> >
> > 	3. Mount -o nossd to increase metadata density
> >
> > #3 had no effect. #1 failed consistently.
> >
> > #2 was successful, but the additional space was not used because btrfs
> > couldn't allocate chunks for metadata because it ran out of metadata space
> > for new metadata chunks.
> >
> > When btrfs-cleaner tried to remove the first pending deleted snapshot, it
> > started a transaction that failed due to lack of metadata space.
> > Since the transaction failed, the filesystem reverts to its earlier state, and
> > exactly the same thing happens on the next mount. The 'btrfs dev add' in #2
> > is successful only if it is executed immediately after mount, before the
> > btrfs-cleaner thread wakes up.
>
> I had a similar problem on iirc 4.20, except that I couldn't get the new devices to add (raid1) before the cleaner thread ran, no matter how fast I added them after mount.
> I ended up just commenting out the part that forces the fs to go read only. The cleaner thread exits gracefully (I think?) so then it was no trouble to add the devices.
>
> Is it still necessary to have the fs go read only like that when it's out of space?

It's definitely a good idea to go read only on generic transaction
failures. Maybe it's not such a good idea to lump ENOSPC in with other
kinds of transaction failure.

> Paul.
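
To make the distinction concrete, here is a toy model in Python. It is not kernel code and says nothing about how btrfs actually handles transaction aborts; FsState, state_after_abort, and enospc_is_fatal are hypothetical names made up for this sketch. It only shows the policy question: whether ENOSPC should take the same path as every other transaction failure.

```python
import errno
from enum import Enum


class FsState(Enum):
    """Hypothetical filesystem states for this toy model."""
    READ_WRITE = "rw"
    READ_ONLY = "ro"


def state_after_abort(err: int, enospc_is_fatal: bool = True) -> FsState:
    """Return the state a filesystem ends up in after a failed transaction.

    With enospc_is_fatal=True, every failure forces read-only, which is
    the behavior described in this thread. With enospc_is_fatal=False,
    ENOSPC alone leaves the filesystem writable (so that, for example, a
    device could still be added), while all other errors still force
    read-only.
    """
    if err == errno.ENOSPC and not enospc_is_fatal:
        return FsState.READ_WRITE
    return FsState.READ_ONLY


if __name__ == "__main__":
    for name in ("ENOSPC", "EIO"):
        code = getattr(errno, name)
        print(name,
              "lumped in:", state_after_abort(code).value,
              "| split out:", state_after_abort(code, enospc_is_fatal=False).value)
```

Whether staying read-write after ENOSPC is actually safe depends on what the aborted transaction left behind, which this sketch does not model at all.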