Date: Tue, 1 Aug 2017 17:24:24 +0200 (CEST)
From: pwm
To: "Austin S. Hemmelgarn"
cc: Hugo Mills, linux-btrfs@vger.kernel.org
Subject: Re: Massive loss of disk space
In-Reply-To: <7f2b5c3a-2f5c-e857-d2dc-3ea16b58ecaf@gmail.com>
References: <20170801122039.GX7140@carfax.org.uk> <7f2b5c3a-2f5c-e857-d2dc-3ea16b58ecaf@gmail.com>

Yes, the test code is as below - trying to match what snapraid tries to do:

#define _GNU_SOURCE     /* glibc needs this to declare fallocate() */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main()
{
    int fd = open("/mnt/snap_04/snapraid.parity",O_NOFOLLOW|O_RDWR);
    if (fd < 0) {
        printf("Failed opening parity file [%s]\n",strerror(errno));
        return 1;
    }

    off_t filesize = 5151751667712ull;
    int res;
    struct stat statbuf;

    if (fstat(fd,&statbuf)) {
        printf("Failed stat [%s]\n",strerror(errno));
        close(fd);
        return 1;
    }

    printf("Original file size is %llu bytes\n",
           (unsigned long long)statbuf.st_size);
    printf("Trying to grow file to %llu bytes\n",
           (unsigned long long)filesize);

    /* offset 0 and len = target size: the whole file range is requested */
    res = fallocate(fd,0,0,filesize);
    if (res) {
        printf("Failed fallocate [%s]\n",strerror(errno));
        close(fd);
        return 1;
    }

    if (fsync(fd)) {
        printf("Failed fsync [%s]\n",strerror(errno));
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}

So the call doesn't use the previous file size as the offset for the
extension.

    int fallocate(int fd, int mode, off_t offset, off_t len);

What you are implying here is that if the fallocate() call is modified to:

    res = fallocate(fd,0,old_size,new_size-old_size);

then everything should work as expected?

/Per W

On Tue, 1 Aug 2017, Austin S. Hemmelgarn wrote:

> On 2017-08-01 10:47, Austin S.
> Hemmelgarn wrote:
>> On 2017-08-01 10:39, pwm wrote:
>>> Thanks for the links and suggestions.
>>>
>>> I did try your suggestions but it didn't solve the underlying problem.
>>>
>>> pwm@europium:~$ sudo btrfs balance start -v -dusage=20 /mnt/snap_04
>>> Dumping filters: flags 0x1, state 0x0, force is off
>>> DATA (flags 0x2): balancing, usage=20
>>> Done, had to relocate 4596 out of 9317 chunks
>>>
>>> pwm@europium:~$ sudo btrfs balance start -mconvert=dup,soft /mnt/snap_04/
>>> Done, had to relocate 2 out of 4721 chunks
>>>
>>> pwm@europium:~$ sudo btrfs fi df /mnt/snap_04
>>> Data, single: total=4.60TiB, used=4.59TiB
>>> System, DUP: total=40.00MiB, used=512.00KiB
>>> Metadata, DUP: total=6.50GiB, used=4.81GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>> pwm@europium:~$ sudo btrfs fi show /mnt/snap_04
>>> Label: 'snap_04' uuid: c46df8fa-03db-4b32-8beb-5521d9931a31
>>> Total devices 1 FS bytes used 4.60TiB
>>> devid 1 size 9.09TiB used 4.61TiB path /dev/sdg1
>>>
>>> So now device 1 usage is down from 9.09TiB to 4.61TiB.
>>>
>>> But if I test to fallocate() to grow the large parity file, I directly
>>> fail. I wrote a little help program that just focuses on fallocate()
>>> instead of having to run snapraid with lots of unknown additional actions
>>> being performed.
>>>
>>> Original file size is 5050486226944 bytes
>>> Trying to grow file to 5151751667712 bytes
>>> Failed fallocate [No space left on device]
>>>
>>> And result after shows 'used' have jumped up to 9.09TiB again.
>>>
>>> root@europium:/mnt# btrfs fi show snap_04
>>> Label: 'snap_04' uuid: c46df8fa-03db-4b32-8beb-5521d9931a31
>>> Total devices 1 FS bytes used 4.60TiB
>>> devid 1 size 9.09TiB used 9.09TiB path /dev/sdg1
>>>
>>> root@europium:/mnt# btrfs fi df /mnt/snap_04/
>>> Data, single: total=9.08TiB, used=4.59TiB
>>> System, DUP: total=40.00MiB, used=992.00KiB
>>> Metadata, DUP: total=6.50GiB, used=4.81GiB
>>> GlobalReserve, single: total=512.00MiB, used=0.00B
>>>
>>> It's almost like the file system have decided that it needs to make a
>>> snapshot and store two complete copies of the complete file, which is
>>> obviously not going to work with a file larger than 50% of the file
>>> system.
>> I think I _might_ understand what's going on here. Is that test program
>> calling fallocate using the desired total size of the file, or just trying
>> to allocate the range beyond the end to extend the file? I've seen issues
>> with the first case on BTRFS before, and I'm starting to think that it
>> might actually be trying to allocate the exact amount of space requested by
>> fallocate, even if part of the range is already allocated space.
>
> OK, I just did a dead simple test by hand, and it looks like I was right.
> The method I used to check this is as follows:
> 1. Create and mount a reasonably small filesystem (I used an 8G temporary LV
> for this, a file would work too though).
> 2. Using dd or a similar tool, create a test file that takes up half of the
> size of the filesystem. It is important that this _not_ be fallocated, but
> just written out.
> 3. Use `fallocate -l` to try and extend the size of the file beyond half the
> size of the filesystem.
>
> For BTRFS, this will result in -ENOSPC, while for ext4 and XFS, it will
> succeed with no error.
> Based on this and some low-level inspection, it looks
> like BTRFS treats the full range of the fallocate call as unallocated, and
> thus is trying to allocate space for regions of that range that are already
> allocated.
>
>>>
>>> No issue at all to grow the parity file on the other parity disk. And
>>> that's why I wonder if there is some undetected file system corruption.
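
For reference, the modified call discussed in this message, where only the
range past the current end of the file is requested, could be sketched
roughly as below. This is a minimal sketch under assumptions: tail_range()
and grow_file() are hypothetical helper names, not part of snapraid or btrfs,
and whether this avoids the -ENOSPC on BTRFS is what the quoted test
suggests, not something the sketch itself verifies.

```c
#define _GNU_SOURCE     /* glibc needs this to declare fallocate() */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: work out the byte range past the current end of the
 * file that has to be allocated to reach new_size. Returns 1 and fills in
 * offset/len when the file must grow, 0 when there is nothing to do. */
int tail_range(off_t old_size, off_t new_size, off_t *offset, off_t *len)
{
    if (new_size <= old_size)
        return 0;
    *offset = old_size;             /* start where the file ends today */
    *len = new_size - old_size;     /* only the missing tail */
    return 1;
}

/* Hypothetical helper: grow the file behind fd to new_size by allocating
 * only the tail. Returns 0 on success (or when no growth is needed) and
 * -1 with errno set on failure, like fstat() and fallocate() themselves. */
int grow_file(int fd, off_t new_size)
{
    struct stat st;
    off_t offset, len;

    assert(fd >= 0);
    if (fstat(fd, &st) != 0)
        return -1;
    if (!tail_range(st.st_size, new_size, &offset, &len))
        return 0;
    return fallocate(fd, 0, offset, len);
}
```

With the sizes from the test run above (5050486226944 bytes growing to
5151751667712 bytes), tail_range() would yield offset 5050486226944 and a
length of 101265440768 bytes, so the request covers about 94 GiB instead of
the full 4.7 TiB.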