From: Chris Murphy
Date: Thu, 31 Jan 2019 13:02:34 -0700
Subject: Re: [LSF/MM TOPIC] Lazy file reflink
To: Amir Goldstein
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel, linux-xfs,
    "Darrick J. Wong", Christoph Hellwig, Jan Kara
X-Mailing-List: linux-fsdevel@vger.kernel.org

On Fri, Jan 25, 2019 at 7:28 AM Amir Goldstein wrote:
>
> Hi,
>
> I would like to discuss the concept of lazy file reflink.
> The use case is backup of a very large read-mostly file.
> Backup application would like to read consistent content from the
> file, "atomic read" so to speak.
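For concreteness, the non-lazy way to get that "atomic read" today is
one clone per file with the FICLONE ioctl. A minimal sketch (made-up
paths, error handling trimmed, dst assumed to be on the same
reflink-capable filesystem as src, i.e. Btrfs or XFS):

#include <fcntl.h>
#include <linux/fs.h>           /* FICLONE */
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	int src = open("/data/huge.img", O_RDONLY);
	int dst = open("/data/.huge.img.snap",
		       O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (src < 0 || dst < 0) {
		perror("open");
		return 1;
	}

	/* Share src's extents instead of copying data. No data moves,
	 * but the filesystem must persist a second copy of the extent
	 * mappings -- the metadata cost discussed below. */
	if (ioctl(dst, FICLONE, src) < 0) {
		perror("FICLONE");
		return 1;
	}

	/* ... back up from dst; writes to src now go copy-on-write ... */

	close(src);
	close(dst);
	return 0;
}

Unlinking the clone when the backup finishes touches much of that
metadata again, which I assume is part of why a volatile, memory-only
variant looks attractive.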
If it's even a few thousand such files, let alone millions, then
whether it's XFS or Btrfs you're talking about a lot of metadata
writes (hence I sorta understand the request for a lazy, volatile
reflink). This quickly becomes a metric f ton of data: in effect a
duplicate of each file's metadata, including the list of its extents.
In simple cases that duplicate can stay unwritten in memory, but you
can't be sure in every case that the operation fits in memory.

Example from my sysroot:

  36.87GiB  data extents
   1.12GiB  filesystem metadata

If I reflink copy that whole file system, it translates into 1.12GiB
of metadata read and then 1.12GiB written. If it's a Btrfs snapshot of
the containing subvolumes, it's maybe 128KiB written per snapshot. The
reflink copy is only cheap compared to a full data copy; it's not that
cheap compared to snapshots.

It sounds to me like a lazy reflink copy is no longer lazy if it has
to write out to disk because it can't all fit in memory, or if it ends
up evicting something else from memory and slows things down that way.

A Btrfs snapshot is also cheaper than an LVM thinp snapshot, which
comes with the need to mount the snapshot's filesystem in order to do
the backup. But if the file system is big enough to have long mount
times, chances are you're also talking about a lot of data to back up,
which means a lot of metadata to read and then write out, unless
you're lucky enough to have gobs of RAM.

So *shrug* I'm not seeing a consistent optimization with lazy reflink.
It'll be faster if we're not talking about a lot of data in the first
place.

> I have based my assumption that reflink of a large file may incur
> lots of metadata updates on my limited knowledge of xfs reflink
> implementation, but perhaps it is not the case for other filesystems?
> (btrfs?) and perhaps the current metadata overhead on reflink of a large
> file is an implementation detail that could be optimized in the future?

The optimum use case is maybe a few hundred big files. With tens of
thousands to millions, I think you start creating a lot of competition
for memory, and the ensuing consequences: something has to be evicted.
Either the lazy reflink is the lower priority, and it functionally
becomes a partial or full reflink by writing out to the block devices,
or it's the higher priority and kicks something else out. No free
lunch.

--
Chris Murphy