* Big disk space usage difference, even after defrag, on identical data
@ 2015-04-11 19:59 Gian-Carlo Pascutto

From: Gian-Carlo Pascutto @ 2015-04-11 19:59 UTC
To: linux-btrfs

Linux mozwell 3.19.0-trunk-amd64 #1 SMP Debian 3.19.1-1~exp1 (2015-03-08) x86_64 GNU/Linux
btrfs-progs v3.19.1

I have a btrfs volume that's been in use for a week or two. It holds about ~560G of incompressible data (video files, tar.xz, git repos, ...) and ~200G of data that compresses 2:1 with LZO (a PostgreSQL db).

It's split into 2 subvolumes:

ID 257 gen 6550 top level 5 path @db
ID 258 gen 6590 top level 5 path @large

and mounted like this:

/dev/sdc /srv/db btrfs rw,noatime,compress=lzo,space_cache 0 0
/dev/sdc /srv/large btrfs rw,noatime,compress=lzo,space_cache 0 0

du -skh /srv
768G    /srv

df -h
/dev/sdc        1.4T  754G  641G  55% /srv/db
/dev/sdc        1.4T  754G  641G  55% /srv/large

btrfs fi df /srv/large
Data, single: total=808.01GiB, used=749.36GiB
System, DUP: total=8.00MiB, used=112.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=3.50GiB, used=1.87GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

So that's a bit bigger than expected (~750G instead of ~660G + metadata). I thought it might be related to compression bailing out too easily, but I've run

btrfs fi defragment -r -v -clzo /srv/db /srv/large

and it doesn't change anything.
I recently copied this data to a new, bigger disk, and the result looks worrying:

mount options:
/dev/sdd /mnt/large btrfs rw,noatime,compress=lzo,space_cache 0 0
/dev/sdd /mnt/db btrfs rw,noatime,compress=lzo,space_cache 0 0

btrfs fi df
Data, single: total=684.00GiB, used=683.00GiB
System, DUP: total=8.00MiB, used=96.00KiB
System, single: total=4.00MiB, used=0.00B
Metadata, DUP: total=3.50GiB, used=2.04GiB
Metadata, single: total=8.00MiB, used=0.00B
GlobalReserve, single: total=512.00MiB, used=0.00B

df
/dev/sdd        3.7T  688G  3.0T  19% /mnt/large
/dev/sdd        3.7T  688G  3.0T  19% /mnt/db

du
767G    /mnt

That's a 66G difference for the same data with the same compress option. The used size here is much more in line with what I'd have expected given the nature of the data.

I would think that compression differences or things like fragmentation or bookending for modified files shouldn't affect this, because the first filesystem has been defragmented/recompressed and didn't shrink.

So what can explain this? Where did the 66G go?

-- 
GCP
* Re: Big disk space usage difference, even after defrag, on identical data
2015-04-13 4:04 ` Zygo Blaxell

From: Zygo Blaxell @ 2015-04-13 4:04 UTC
To: Gian-Carlo Pascutto; +Cc: linux-btrfs

On Sat, Apr 11, 2015 at 09:59:50PM +0200, Gian-Carlo Pascutto wrote:
> Linux mozwell 3.19.0-trunk-amd64 #1 SMP Debian 3.19.1-1~exp1
> (2015-03-08) x86_64 GNU/Linux
> btrfs-progs v3.19.1
>
> I have a btrfs volume that's been in use for a week or 2. It has about
> ~560G of uncompressible data (video files, tar.xz, git repos, ...) and
> ~200G of data that compresses 2:1 with LZO (PostgreSQL db).
>
> It's split into 2 subvolumes:
> ID 257 gen 6550 top level 5 path @db
> ID 258 gen 6590 top level 5 path @large
>
> and mounted like this:
> /dev/sdc /srv/db btrfs rw,noatime,compress=lzo,space_cache 0 0
> /dev/sdc /srv/large btrfs rw,noatime,compress=lzo,space_cache 0 0
>
> du -skh /srv
> 768G /srv
>
> df -h
> /dev/sdc 1.4T 754G 641G 55% /srv/db
> /dev/sdc 1.4T 754G 641G 55% /srv/large
>
> btrfs fi df /srv/large
> Data, single: total=808.01GiB, used=749.36GiB
> System, DUP: total=8.00MiB, used=112.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=3.50GiB, used=1.87GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> So that's a bit bigger than perhaps expected (~750G instead of
> ~660G+metadata). I thought it might've been related to compress bailing
> out too easily, but I've done a
> btrfs fi defragment -r -v -clzo /srv/db /srv/large
> and this doesn't change anything.
>
> I recently copied this data to a new, bigger disk, and the result looks
> worrying:
>
> mount options:
> /dev/sdd /mnt/large btrfs rw,noatime,compress=lzo,space_cache 0 0
> /dev/sdd /mnt/db btrfs rw,noatime,compress=lzo,space_cache 0 0
>
> btrfs fi df
> Data, single: total=684.00GiB, used=683.00GiB
> System, DUP: total=8.00MiB, used=96.00KiB
> System, single: total=4.00MiB, used=0.00B
> Metadata, DUP: total=3.50GiB, used=2.04GiB
> Metadata, single: total=8.00MiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B
>
> df
> /dev/sdd 3.7T 688G 3.0T 19% /mnt/large
> /dev/sdd 3.7T 688G 3.0T 19% /mnt/db
>
> du
> 767G /mnt
>
> That's a 66G difference for the same data with the same compress option.
> The used size here is much more in line with what I'd have expected
> given the nature of the data.
>
> I would think that compression differences or things like fragmentation
> or bookending for modified files shouldn't affect this, because the
> first filesystem has been defragmented/recompressed and didn't shrink.
>
> So what can explain this? Where did the 66G go?

There are a few places: the kernel may have decided your files are not compressible and disabled compression on them (some older kernels did this with great enthusiasm); your files might have preallocated space from the fallocate system call (which disables compression and allocates contiguous space, so defrag will not touch it). 'filefrag -v' can tell you if this is happening to your files.

In practice database files take about double the amount of space they appear to because of extent shingling.
Suppose we have a defragmented file with one extent "A" like this:

0 MB AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA 1 MB

Now we overwrite about half of the blocks:

0 MB BBBBBBBBBBBBBBBBAAAAAAAAAAAAAAAA 1 MB

btrfs tracks references to the entire extent, so what is on disk now is this:

0 MB aaaaaaaaaaaaaaaaAAAAAAAAAAAAAAAA 1 MB   original extent
0 MB BBBBBBBBBBBBBBBB                 1 MB   new extent

The "a" are blocks from the original extent that are not visible in the file, but remain present on disk. In other words, this 1 MB file is now taking up 1.5 MB of space.

This continues as long as any blocks of partially overwritten extents are visible in any file (including snapshots, dedup, and clones), with the worst case being something like this:

0 MB BBBBBBBBBBBBBCCCCCCCCCCCCCCDDDDA 1 MB

which could be like this on disk:

0 MB aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaA 1 MB   first extent
0 MB BBBBBBBBBBBBBbbb                 1 MB   second extent
0 MB              CCCCCCCCCCCCCCcccc  1 MB   third extent
0 MB                            DDDD  1 MB   fourth extent

This 1 MB file takes up a little over 2 MB of disk space, and there are parts of extents A, B, and C which persist on disk but are no longer part of any file's content.

In this case, if we wrote the last 4K of the file, we would free 1 MB of disk space by doing so:

(extent A now deleted)
0 MB BBBBBBBBBBBBBbbb                 1 MB   second extent
0 MB              CCCCCCCCCCCCCCcccc  1 MB   third extent
0 MB                            DDDD  1 MB   fourth extent
0 MB                                E 1 MB   fifth extent

Similarly, to free the "B" extent we have to overwrite all the visible blocks, i.e. from 0 to the beginning of the "C" extent, before the last visible block from "B" is destroyed and the entire "B" extent can be freed.

The worst case is pretty bad: with the worst possible overwrite pattern, a file can occupy the square of its size on disk divided by the block size (4K) divided by two. That's a little under 128MB for a 1MB file, or 128TB for a 1GB file.
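The shingling behavior described above can be sketched as a toy model (my own illustration, not btrfs code): track which extent owns each 4K block of the file, and keep a whole extent allocated as long as any of its blocks is still visible.

```python
BLOCK = 4096

class File:
    """Toy model: every overwrite creates one new extent; a whole
    extent stays allocated while any of its blocks is still visible."""
    def __init__(self, size):
        self.extents = {}    # extent id -> extent size in bytes
        self.next_id = 0
        self.blocks = []     # extent id owning each 4K block of the file
        self.write(0, size)  # initial write: one big extent

    def write(self, offset, length):
        """Overwrite [offset, offset+length) with a single new extent."""
        eid, self.next_id = self.next_id, self.next_id + 1
        self.extents[eid] = length
        for b in range(offset // BLOCK, (offset + length) // BLOCK):
            if b < len(self.blocks):
                self.blocks[b] = eid
            else:
                self.blocks.append(eid)

    def disk_usage(self):
        live = set(self.blocks)
        return sum(sz for eid, sz in self.extents.items() if eid in live)

MB = 1 << 20
f = File(MB)                 # one 1 MB extent ("A")
f.write(0, MB // 2)          # overwrite the first half ("B")
print(f.disk_usage() / MB)   # 1.5 -- the old extent is still pinned
```

Overwriting the remaining half of "A" drops usage back to 1 MB, matching the description: the whole original extent is freed only once none of its blocks are visible.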
Above 1GB, the scaling is linear instead of quadratic because the extent size limit (1G) has been reached and single-extent files are no longer possible (so a worst-case 2GB file takes only 256TB of space instead of 512TB).

Defragmenting the files helps free space temporarily; however, space usage will quickly grow again until it returns to the steady state around 2x the file size.

A database ends up maxing out at about a factor of two space usage because it tends to write short uniform-sized bursts of pages randomly, so we get a pattern a bit like bricks in a wall:

0 MB AA BB CC DD EE FF GG HH II JJ KK 1 MB   half the extents
0 MB  LL MM NN OO PP QQ RR SS TT UU V 1 MB   the other half

0 MB ALLBMMCNNDOOEPPFQQGRRHSSITTJUUKV 1 MB   what the file looks like

Fixing this is non-trivial (it may require an incompatible disk format change). Until this is fixed, the most space-efficient approach seems to be to force compression (so the maximum extent is 128K instead of 1GB) and never defragment database files ever.
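The worst-case arithmetic above can be checked with a few lines (a back-of-the-envelope sketch, not exact btrfs accounting; the text's "a little under" covers the edge blocks this rounds away):

```python
def worst_case_bytes(size, block=4096, max_extent=1 << 30):
    """Worst-case disk usage under extent shingling: roughly
    size^2 / block / 2, going linear once extents hit the 1 GiB cap."""
    s = min(size, max_extent)
    per_chunk = s * s // block // 2
    return per_chunk * max(1, size // max_extent)

MB, GB, TB = 1 << 20, 1 << 30, 1 << 40
print(worst_case_bytes(MB) // MB)      # 128 (MB, for a 1 MB file)
print(worst_case_bytes(GB) // TB)      # 128 (TB, for a 1 GB file)
print(worst_case_bytes(2 * GB) // TB)  # 256 (TB, for a 2 GB file)
```

The numbers line up with the figures in the thread: quadratic up to the 1 GiB extent limit, linear beyond it.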
* Re: Big disk space usage difference, even after defrag, on identical data
2015-04-13 8:07 ` Duncan

From: Duncan @ 2015-04-13 8:07 UTC
To: linux-btrfs

Zygo Blaxell posted on Mon, 13 Apr 2015 00:04:36 -0400 as excerpted:

> A database ends up maxing out at about a factor of two space usage
> because it tends to write short uniform-sized bursts of pages randomly,
> so we get a pattern a bit like bricks in a wall:
>
> 0 MB AA BB CC DD EE FF GG HH II JJ KK 1 MB   half the extents
> 0 MB  LL MM NN OO PP QQ RR SS TT UU V 1 MB   the other half
>
> 0 MB ALLBMMCNNDOOEPPFQQGRRHSSITTJUUKV 1 MB   what the file looks like
>
> Fixing this is non-trivial (it may require an incompatible disk format
> change). Until this is fixed, the most space-efficient approach seems
> to be to force compression (so the maximum extent is 128K instead of
> 1GB) and never defragment database files ever.

... Or set the database file nocow at creation, and don't snapshot it, so overwrites are always in-place.

(Btrfs compression and checksumming get turned off with nocow, but as we've seen, compression isn't all that effective on random-rewrite-pattern files anyway, and databases generally have their own data integrity handling, so neither one is a huge loss, and the in-place rewrite makes for better performance and a more predictable steady state.)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
* Re: Big disk space usage difference, even after defrag, on identical data
2015-04-13 11:32 ` Gian-Carlo Pascutto

From: Gian-Carlo Pascutto @ 2015-04-13 11:32 UTC
To: linux-btrfs; +Cc: Zygo Blaxell

On 13-04-15 06:04, Zygo Blaxell wrote:

>> I would think that compression differences or things like
>> fragmentation or bookending for modified files shouldn't affect
>> this, because the first filesystem has been
>> defragmented/recompressed and didn't shrink.
>>
>> So what can explain this? Where did the 66G go?
>
> There are a few places: the kernel may have decided your files are
> not compressible and disabled compression on them (some older kernels
> did this with great enthusiasm);

As stated in the previous mail, this is 3.19.1. Moreover, the data is either uniformly compressible or not at all. Lastly, note that the *exact same* mount options are being used on *the exact same kernel* with *the exact same data*. Getting a different compressibility decision given the same inputs would point to a bug.

> your files might have preallocated space from the fallocate system
> call (which disables compression and allocates contiguous space, so
> defrag will not touch it).

So defrag -clzo or -czlib won't actually re-compress mostly-contiguous files? That's evil. I have no idea whether PostgreSQL allocates files that way, though.

> 'filefrag -v' can tell you if this is happening to your files.

Not sure how to interpret that. Without "-v", I see most of the (DB) data has 2-5 extents per gigabyte. A few have 8192 extents per gigabyte. Comparing to the copy that takes 66G less: there, every (compressible) file has about 8192 extents per gigabyte, and the others 5 or 6.

So you may be right that some DB files are "wedged" in a format that btrfs can't compress. I forced the files to be rewritten (VACUUM FULL) and that "fixed" the problem.
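As a sanity check on those filefrag numbers (my own arithmetic, assuming roughly uniform extent sizes): 8192 extents per gigabyte works out to exactly the 128K maximum compressed extent size mentioned earlier in the thread, while a handful of extents per gigabyte implies large, uncompressed extents.

```python
GIB = 1 << 30
KIB = 1 << 10

def avg_extent_size(extents_per_gib):
    """Average extent size implied by a filefrag extent count,
    assuming the extents are roughly uniform in size."""
    return GIB // extents_per_gib

print(avg_extent_size(8192) // KIB)     # 128 -> the compressed-extent cap
print(avg_extent_size(4) // (1 << 20))  # 256 (MiB) -> big uncompressed extents
```

So "8192 extents per gigabyte" is what a fully compressed file is expected to look like, and "2-5 extents per gigabyte" is the signature of files that compression never touched.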
> In practice database files take about double the amount of space
> they appear to because of extent shingling.

This is what I called "bookending" in the original mail; I didn't know the correct name. I understand that doing updates can result in N^2/2 or thereabouts disk space usage. However:

> Defragmenting the files helps free space temporarily; however, space
> usage will quickly grow again until it returns to the steady state
> around 2x the file size.

As stated in the original mail, the filesystem was *freshly defragmented*, so that can't have been the cause.

> Until this is fixed, the most space-efficient approach seems to be to
> force compression (so the maximum extent is 128K instead of 1GB)

Would that fix the problem with fallocate()d files?

-- 
GCP
* Re: Big disk space usage difference, even after defrag, on identical data
2015-04-13 5:06 ` Duncan

From: Duncan @ 2015-04-13 5:06 UTC
To: linux-btrfs

Gian-Carlo Pascutto posted on Sat, 11 Apr 2015 21:59:50 +0200 as excerpted:

> That's a 66G difference for the same data with the same compress option.
> The used size here is much more in line with what I'd have expected
> given the nature of the data.
>
> I would think that compression differences or things like fragmentation
> or bookending for modified files shouldn't affect this, because the
> first filesystem has been defragmented/recompressed and didn't shrink.
>
> So what can explain this? Where did the 66G go?

Out of curiosity, does a balance on the actively used btrfs help?

You mentioned defrag -v -r -clzo, but didn't use the -f (flush) or -t (minimum size file) options. Does adding -f -t1 help?

You aren't doing btrfs snapshots of either subvolume, are you? I'm not sure this is related to the answer to your question, since you did defrag, but it might be, and it's good to know when dealing with database files on btrfs in any case.

Btrfs is in general a copy-on-write (COW) based filesystem. Random-rewrite-pattern files, database and VM image files being prime examples, typically HEAVILY fragment on COW filesystems, since any rewrite forces a copy of the rewritten data block elsewhere. The often rather large original extents get holes, but remain pinned by the existing data still remaining in them that hasn't been rewritten. This is analogous to the way databases often rewrite records but leave holes behind that aren't immediately cleaned up, only it's occurring at the filesystem extent level.
Only after all the data in an extent has been rewritten can the extent itself be unpinned and returned to the free space pool.

Defrag should force the rewrite of entire files and take care of this, but obviously it's not returning to a "clean" state. I forget what the default minimum file size is if -t isn't set -- maybe 128 MiB? But a -t1 will force it to defrag even small files, and I recall at least one thread here where the poster said it made all the difference for him, so try that. And the -f should force a filesystem sync afterward, so you know the numbers from any report you run afterward match the final state.

Meanwhile, you may consider using the nocow attribute on those database files. It will disable compression on them, but rewrites should then occur in-place, so you don't get the fragmentation and extent usage holes and duplication that you'd have otherwise. It'll also disable btrfs checksumming, but mature databases already have their own error detection and correction system, since they don't normally run on filesystems that provide that sort of service like btrfs does.

While initial usage will be higher due to the lack of compression, over time, on an actively updated database, compression isn't all that effective anyway. And while usage may be a bit higher at least originally, it should be stable, but for expanding the actual size of the database, anyway.

But there are a couple of caveats to nocow. First, in order to be properly effective, it needs to be set on a file while it's still empty. The most effective way to do this is to set nocow on the empty parent directory, then copy the nocow-target files into it so they inherit the nocow attribute as they are created, before they actually have any data. The second pertains to btrfs snapshots.
Snapshots lock the existing file in place, effectively making an otherwise nocow file cow1 -- the first write to an existing file block will cow it, but after that, further writes to the same block will rewrite in-place... until the next snapshot, of course. So try to minimize the number of snapshots done to nocow files, and if you do snapshot them, defrag them once in a while as well.

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
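The "cow1" behavior can be sketched as a toy model (my own illustration of the description above, not btrfs internals): after a snapshot, the first write to each block is COWed because the snapshot pins the old copy, and later writes to that block go in place until the next snapshot.

```python
class NocowFile:
    """Toy model of a nocow file under snapshots: a snapshot pins the
    current copy of every block, so the first write to a block after a
    snapshot must COW it; subsequent writes rewrite in place."""
    def __init__(self, nblocks):
        self.nblocks = nblocks
        self.allocated = nblocks           # blocks on disk
        self.in_place = [True] * nblocks   # True: next write needs no COW

    def snapshot(self):
        self.in_place = [False] * self.nblocks

    def write(self, block):
        if not self.in_place[block]:
            self.allocated += 1            # snapshot keeps the old block
            self.in_place[block] = True    # further writes go in place

f = NocowFile(100)
for _ in range(10):                        # ten rewrites, no snapshots
    f.write(0)
print(f.allocated)   # 100 -- pure in-place rewrites, no growth
f.snapshot()
for _ in range(10):                        # ten rewrites after a snapshot
    f.write(0)
print(f.allocated)   # 101 -- only the first post-snapshot write COWs
```

This is why frequent snapshots of a nocow database file gradually re-fragment it: each snapshot re-arms one COW per touched block.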
* Re: Big disk space usage difference, even after defrag, on identical data
2015-04-13 14:06 ` Gian-Carlo Pascutto

From: Gian-Carlo Pascutto @ 2015-04-13 14:06 UTC
To: linux-btrfs

On 13-04-15 07:06, Duncan wrote:

>> So what can explain this? Where did the 66G go?
>
> Out of curiosity, does a balance on the actively used btrfs help?
>
> You mentioned defrag -v -r -clzo, but didn't use the -f (flush) or -t
> (minimum size file) options. Does adding -f -t1 help?

Unfortunately I can no longer try this; see the other reply for why. But the problem turned out to be some 1G-sized files, written using 3-5 extents, that for whatever reason defrag was not touching.

> You aren't doing btrfs snapshots of either subvolume, are you?

No :-) I should've mentioned that.

> Defrag should force the rewrite of entire files and take care of this,
> but obviously it's not returning to "clean" state. I forgot what the
> default minimum file size is if -t isn't set, maybe 128 MiB? But a -t1
> will force it to defrag even small files, and I recall at least one
> thread here where the poster said it made all the difference for him, so
> try that. And the -f should force a filesystem sync afterward, so you
> know the numbers from any report you run afterward match the final state.

Reading the corresponding manual, the -t explanation says that "any extent bigger than this size will be considered already defragged". So I guess setting -t1 might've fixed the problem too... but after checking the source, I'm not so sure. I didn't find the -t default in the manpages; after browsing the source, the default is set in the kernel:

https://github.com/torvalds/linux/blob/4f671fe2f9523a1ea206f63fe60a7c7b3a56d5c7/fs/btrfs/ioctl.c#L1268

(Not sure what units those are.)
I wonder if this is relevant:

https://github.com/torvalds/linux/blob/4f671fe2f9523a1ea206f63fe60a7c7b3a56d5c7/fs/btrfs/ioctl.c#L2572

This seems to reset the -t threshold if compress (-c) is set? That looks a bit fishy.

> Meanwhile, you may consider using the nocow attribute on those database
> files. It will disable compression on them,

I'm using btrfs specifically to get compression, so this isn't an option.

> While initial usage will be higher due to the lack of compression,
> as you've discovered, over time, on an actively updated database,
> compression isn't all that effective anyway.

I don't see why. If you're referring to the additional overhead of continuously compressing and decompressing everything -- yes, of course. But in my case I have a mostly-append workload on a huge amount of fairly compressible data on magnetic storage, so compression is a win in disk space and perhaps even in performance.

I'm well aware of the many caveats of using btrfs for databases; they're well documented, and although I much appreciate your extended explanation, it wasn't new to me. It turns out that if your dataset isn't update-heavy (so it doesn't fragment to begin with), or has to be queried via indexed access (i.e. mostly via random seeks), the fragmentation doesn't matter much anyway. Conversely, btrfs appears to have better sync performance with multiple threads, and allows one to disable part of the partial-page-write protection logic in the database (full_page_writes=off for PostgreSQL), because btrfs is already doing the COW to ensure those can't actually happen [1].

The net result is a *boost* from about 40 tps (ext4) to 55 tps (btrfs), which certainly is contrary to popular wisdom. Maybe btrfs would fall off eventually as fragmentation sets in gradually, but given that there's an offline defragmentation tool that can run in the background, I don't care.
[1] I wouldn't be too surprised if database COW, which consists of journal-writing a copy of the data out of band and then rewriting it again in the original place, is actually functionally equivalent to disabling COW in the database and running btrfs + defrag. Obviously you shouldn't keep COW enabled in btrfs *AND* the DB, requiring all data to be copied around at least 3 times... which I'm afraid almost everyone does, because it's the default...

-- 
GCP
* Re: Big disk space usage difference, even after defrag, on identical data
2015-04-13 21:45 ` Zygo Blaxell

From: Zygo Blaxell @ 2015-04-13 21:45 UTC
To: Gian-Carlo Pascutto; +Cc: linux-btrfs

On Mon, Apr 13, 2015 at 04:06:39PM +0200, Gian-Carlo Pascutto wrote:
> On 13-04-15 07:06, Duncan wrote:
>
>>> So what can explain this? Where did the 66G go?
>>
>> Out of curiosity, does a balance on the actively used btrfs help?
>>
>> You mentioned defrag -v -r -clzo, but didn't use the -f (flush) or -t
>> (minimum size file) options. Does adding -f -t1 help?
>
> Unfortunately I can no longer try this, see the other reply why. But the
> problem turned out to be some 1G-sized files, written using 3-5 extents,
> that for whatever reason defrag was not touching.

There are several corner cases that defrag won't touch by default. It's designed to be conservative and favor speed over size.

Also, when the kernel decides you're not getting enough compression, it seems to disable compression on the file _forever_, even if future writes are compressible again. mount -o compress-force works around that.

>> You aren't doing btrfs snapshots of either subvolume, are you?
>
> No :-) I should've mentioned that.

Read-only snapshots: yet another thing defrag won't touch.

>> While initial usage will be higher due to the lack of compression,
>> as you've discovered, over time, on an actively updated database,
>> compression isn't all that effective anyway.
>
> I don't see why. If you're referring to the additional overhead of
> continuously compressing and decompressing everything - yes, of course.
> But in my case I have a mostly-append workload to a huge amount of
> fairly compressible data that's on magnetic storage, so compression is a
> win in disk space and perhaps even in performance.
Short writes won't compress -- not just poorly, but at all -- because btrfs won't look at adjacent already-written blocks. If you write a file at less than 4K/minute, there will be no compression, as each new extent (or replacement extent for overwritten data) is already minimum-sized. If you write in bursts of 128K or more, consecutively, then you can get compression benefit. There has been talk of teaching autodefrag to roll up the last few dozen extents of slowly growing files so they can be compressed.

> It turns out that if your dataset isn't update heavy (so it doesn't
> fragment to begin with), or has to be queried via indexed access (i.e.
> mostly via random seeks), the fragmentation doesn't matter much anyway.
> Conversely, btrfs appears to have better sync performance with multiple
> threads, and allows one to disable part of the partial-page-write
> protection logic in the database (full_page_writes=off for PostgreSQL),
> because btrfs is already doing the COW to ensure those can't actually
> happen [1].
>
> The net result is a *boost* from about 40 tps (ext4) to 55 tps (btrfs),
> which certainly is contrary to popular wisdom. Maybe btrfs would fall
> off eventually as fragmentation does set in gradually, but given that
> there's an offline defragmentation tool that can run in the background,
> I don't care.

I've found the performance of PostgreSQL to be wildly variable on btrfs. It may be OK at first, but watch it for a week or two to admire the full four-orders-of-magnitude swing (100 tps to 0.01 tps). :-O

> [1] I wouldn't be too surprised if database COW, which consists of
> journal-writing a copy of the data out of band, then rewriting it again
> in the original place, is actually functionally equivalent to disabling
> COW in the database and running btrfs + defrag.
> Obviously you shouldn't
> keep COW enabled in btrfs *AND* the DB, requiring all data to be copied
> around at least 3 times... which I'm afraid almost everyone does because
> it's the default...

Journalling writes all the data twice: once to the journal, and once to update the origin page after the journal (though PostgreSQL will omit some of those duplicate writes in cases where there is no origin page to overwrite). COW writes all the new and updated data only once.

In the event of a crash, if the log tree is not recoverable (and it's a rich source of btrfs bugs, so it's often not), you lose everything that happened to the database in the last 30 seconds. If you were already using async commit in PostgreSQL anyway, that's not much of a concern (and not having to call fsync 100 times a second _really_ helps performance!), but if you really need sync commit, then btrfs is not the filesystem for you.
* Re: Big disk space usage difference, even after defrag, on identical data
2015-04-14 3:18 ` Duncan

From: Duncan @ 2015-04-14 3:18 UTC
To: linux-btrfs

Gian-Carlo Pascutto posted on Mon, 13 Apr 2015 16:06:39 +0200 as excerpted:

>> Defrag should force the rewrite of entire files and take care of this,
>> but obviously it's not returning to "clean" state. I forgot what the
>> default minimum file size is if -t isn't set, maybe 128 MiB? But a -t1
>> will force it to defrag even small files, and I recall at least one
>> thread here where the poster said it made all the difference for him,
>> so try that. And the -f should force a filesystem sync afterward, so
>> you know the numbers from any report you run afterward match the final
>> state.
>
> Reading the corresponding manual, the -t explanation says that "any
> extent bigger than this size will be considered already defragged". So I
> guess setting -t1 might've fixed the problem too...but after checking
> the source, I'm not so sure.

Oops! You are correct. There was an on-list discussion of this before that I had forgotten. The "make sure everything gets defragged" magic setting is -t 1G or higher, *not* the -t 1 I was telling you previously (which will end up skipping everything, instead of defragging everything). Thanks for spotting the inconsistency and calling me on it! =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman