Date: Thu, 7 Nov 2019 14:41:44 -0500 (EST)
From: Nate Eldredge <nate@thatsmathematics.com>
To: Remi Gauvin
cc: linux-btrfs@vger.kernel.org
Subject: Re: Defragmenting to recover wasted space

On Thu, 7 Nov 2019, Remi Gauvin wrote:

> On 2019-11-07 9:03 a.m., Nate Eldredge wrote:
>
>> 1. What causes this?  I saw some references to "unused extents" but it
>> wasn't clear how that happens, or why they wouldn't be freed through
>> normal operation.  Are there certain usage patterns that exacerbate it?
>
> VirtualBox image files are subject to many, many small writes (just
> booting Windows, for example, can create well over 5000 file fragments).
> When the image file is new, the extents will be very large.  In BTRFS,
> the extents are immutable.  When a small write creates a new 4K COW
> extent, the old 4K remains part of the old extent as well.  This
> situation will remain until all the data in the old extent has been
> rewritten; when none of that data is referenced any more, the extent
> will be freed.

Thanks, Remi.  This is very helpful in understanding what is going on.
In particular, I didn't realize that extents are immutable even when
there is only one reference to them (I have no snapshots or reflinks to
these files).  I guess this also means that in the worst case, if I want
to overwrite the entire file "in place" in a random order, I actually
need additional free space equal to the file's size until I get around
to defragging.  That's rather counterintuitive for somebody used to
traditional filesystems.

>> 5. Is there a better way to detect this kind of wastage, to distinguish
>> it from more mundane causes (deleted files still open, etc.) and see
>> how much space could be recovered?  In particular, is there a way to
>> tell which files are most affected, so that I can just defragment
>> those?
>
> Generally speaking, files that are subject to many random writes are
> few, and you should be well aware of the larger ones where this might
> be an issue (virtual machine image files, large databases, etc.).
> These files should be defragmented frequently.  I don't see any reason
> not to run defrag over the whole subvolume, but if you want to search
> for files with absurd fragment counts, you can always use the find
> command to search for the files, run the filefrag command on them, and
> then use whatever tools you like to search the output for files with
> thousands of fragments.
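If I've understood the suggestion, something along these lines should do
it (an untested sketch: the /mnt/data mount point and the 1000-fragment
cutoff are placeholders, and it assumes GNU find/xargs, filefrag from
e2fsprogs, and file names that don't contain ": "):

  # Sketch only: list files with at least 1000 extents, worst first.
  # Adjust the path and the threshold to taste.
  find /mnt/data -xdev -type f -print0 |
    xargs -0 filefrag 2>/dev/null |
    awk -F': ' '{ n = $2; sub(/ extents? found/, "", n)
                  if (n + 0 >= 1000) print n, $1 }' |
    sort -rn | head -n 20

I gather filefrag's extent count is only an approximation of what btrfs
actually stores on disk, but presumably it is close enough to pick out
the thousands-of-fragments cases you describe.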
Okay.  Defragmenting is kind of inconvenient, though, and I suppose it
involves some extra wear on the SSD, since data is really being moved.
There's also the issue, as I understand it, that defragmenting will
break up existing reflinks, which in some other situations I may really
want to keep.

In fact, it seems that somehow what I really want is for the file to be
*completely* fragmented, so that every write replaces an extent and
frees the old one.  On an SSD I don't really care if the data blocks
are actually contiguous.  It seems perverse, but even if there is more
overhead, it might be worth it when I don't have a lot of free space to
spare.  I don't suppose there is any way to arrange that?

Thanks again!

-- 
Nate Eldredge
nate@thatsmathematics.com