From: Chris Murphy
Date: Thu, 31 Jan 2019 13:02:34 -0700
Subject: Re: [LSF/MM TOPIC] Lazy file reflink
To: Amir Goldstein
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel, linux-xfs,
    "Darrick J. Wong", Christoph Hellwig, Jan Kara
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Fri, Jan 25, 2019 at 7:28 AM Amir Goldstein wrote:
>
> Hi,
>
> I would like to discuss the concept of lazy file reflink.
> The use case is backup of a very large read-mostly file.
> Backup application would like to read consistent content from the
> file, "atomic read" so to speak.
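For concreteness, the non-lazy way to get that "atomic read" today is
one clone per file with the FICLONE ioctl. A minimal sketch (made-up
paths, error handling trimmed, dst assumed to be on the same
reflink-capable filesystem as src, i.e. Btrfs or XFS):

#include <fcntl.h>
#include <linux/fs.h>           /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	int src = open("/data/huge.img", O_RDONLY);
	int dst = open("/data/.huge.img.snap",
		       O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (src < 0 || dst < 0) {
		perror("open");
		return 1;
	}

	/* Share src's extents instead of copying data. No data moves,
	 * but the filesystem must persist a second copy of the extent
	 * mappings -- the metadata cost discussed below. */
	if (ioctl(dst, FICLONE, src) < 0) {
		perror("FICLONE");
		return 1;
	}

	/* ... back up from dst; writes to src now go copy-on-write ... */

	close(src);
	close(dst);
	return 0;
}

Unlinking the clone when the backup finishes touches much of that
metadata again, which I assume is part of why a volatile, memory-only
variant looks attractive.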
If it's even a few thousand such files, let alone millions, then
whether it's XFS or Btrfs you're talking about a lot of metadata
writes (hence I sorta understand the request for a lazy, volatile
reflink). This quickly becomes a metric f ton of data: in effect a
duplicate of each file's metadata, including the list of its extents.
In simple cases that duplicate can stay unwritten in memory, but you
can't be sure in every case that the operation fits in memory.

Example from my sysroot:

  36.87GiB  data extents
   1.12GiB  filesystem metadata

If I reflink copy that whole file system, it translates into 1.12GiB
of metadata read and then 1.12GiB written. If it's a Btrfs snapshot of
the containing subvolumes, it's maybe 128KiB written per snapshot. The
reflink copy is only cheap compared to a full data copy; it's not that
cheap compared to snapshots.

It sounds to me like a lazy reflink copy is no longer lazy if it has
to write out to disk because it can't all fit in memory, or if it ends
up evicting something else from memory and slows things down that way.

A Btrfs snapshot is also cheaper than an LVM thinp snapshot, which
comes with the need to mount the snapshot's filesystem in order to do
the backup. But if the file system is big enough to have long mount
times, chances are you're also talking about a lot of data to back up,
which means a lot of metadata to read and then write out, unless
you're lucky enough to have gobs of RAM.

So *shrug* I'm not seeing a consistent optimization with lazy reflink.
It'll be faster if we're not talking about a lot of data in the first
place.

> I have based my assumption that reflink of a large file may incur
> lots of metadata updates on my limited knowledge of xfs reflink
> implementation, but perhaps it is not the case for other filesystems?
> (btrfs?) and perhaps the current metadata overhead on reflink of a large
> file is an implementation detail that could be optimized in the future?

The optimum use case is maybe a few hundred big files. With tens of
thousands to millions, I think you start creating a lot of competition
for memory, and the ensuing consequences: something has to be evicted.
Either the lazy reflink is the lower priority, and it functionally
becomes a partial or full reflink by writing out to the block devices,
or it's the higher priority and kicks something else out. No free
lunch.

--
Chris Murphy