From: Miklos Szeredi
Date: Thu, 30 May 2019 11:22:54 +0200
Subject: Re: Initial patches for Incremental FS
To: Yurii Zubrytskyi
Cc: Eugene Zemtsov, Amir Goldstein, linux-fsdevel@vger.kernel.org

On Wed, May 29, 2019 at 11:06 PM Yurii Zubrytskyi wrote:

> Yes, and this was _exactly_ our first plan, and it mitigates the read
> performance issue. The reasons why we didn't move forward with it are that
> we figured out all the other requirements, and fixing each of those needs
> another change in FUSE, up to the point where the FUSE interface becomes
> 50% dedicated to our specific goal:
> 1. The MAP message would have to support data compression (with different
> algorithms), hash verification (same thing) with hash streaming (because
> even the Merkle tree for a 5GB file is huge, and can't be preloaded
> at once)

With the proposed FUSE solution, the following sequence would occur:

kernel: if the index for a given block is missing, send a MAP message
userspace: if data/hash is missing for the given block, download data/hash
userspace: send the MAP reply
kernel: decompress data and verify hash based on the index

The kernel would not be involved in streaming either the data or the hash;
it would only work with data/hash that has already been downloaded. Right?

Or is your implementation doing streamed decompression/hashing, or working
on partial blocks?

> 1.1. Mapping memory usage can get out of hand pretty quickly: it has to
> be at least (offset + size + compression type + hash location + hash size +
> hash kind) per block. I'm not even thinking about multiple storage files
> here. For that 5GB file (that's a debug APK for some Android game we're
> targeting) we have 1.3M blocks, so ~16 bytes * 1.3M = 20MB of index alone,
> without the actual overhead of the lookup table.
> If the kernel code owns and manages its own on-disk data store and the
> format, this index can be loaded and discarded on demand there.

Why does the kernel have to know the on-disk format to be able to load and
discard parts of the index on demand? It only needs to know which blocks
were accessed recently and which were accessed less recently.

> > There's also work currently ongoing in optimizing the overhead of
> > userspace roundtrip. The most promising thing appears to be matching
> > up the CPU for the userspace server with that of the task doing the
> > request. This can apparently result in 60-500% speed improvement.
>
> That sounds almost too good to be true, and will be really cool.
> Do you have any patches or git remote available in any compilable state to
> try the optimization out?
> Android has quite a complicated hardware config,
> and I want to see how this works, especially with our model where
> several processes may send requests into the same filesystem FD.

Currently it's just a bunch of hacks; there are no proper interfaces yet.
I'll let you know once there's something useful for testing with a real
filesystem.

BTW, which interface does your FUSE filesystem use? Libfuse? The raw device?

Thanks,
Miklos
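The four-step MAP exchange Miklos lists above can be modelled in a few lines. This is a minimal Python sketch, not FUSE or Incremental FS code: the class names (`UserspaceServer`, `KernelSide`, `MapEntry`), the in-memory "backing file", and the choice of zlib and SHA-256 as stand-ins for the compression and hash algorithms are all assumptions made for illustration. It shows the property Miklos points out: the kernel side only decompresses and verifies data that userspace has already downloaded, and it sends a MAP request only when its index entry is missing.

```python
import hashlib
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class MapEntry:
    # Hypothetical per-block index entry carried in a MAP reply.
    offset: int        # where the compressed block lives in the backing file
    size: int          # compressed length in bytes
    compression: str   # "zlib" or "none"; stand-ins for the real algorithms
    digest: bytes      # expected SHA-256 of the decompressed block


class UserspaceServer:
    """Hypothetical FUSE daemon: owns the backing file and downloads lazily."""

    def __init__(self, remote):
        self.remote = remote          # block number -> plaintext (simulated network)
        self.backing = bytearray()    # simulated local backing file
        self.entries = {}             # block number -> MapEntry already stored
        self.downloads = 0

    def handle_map(self, block):
        # Download data and hash only if this block is not stored locally yet.
        if block not in self.entries:
            plain = self.remote[block]
            packed = zlib.compress(plain)
            entry = MapEntry(len(self.backing), len(packed), "zlib",
                             hashlib.sha256(plain).digest())
            self.backing += packed
            self.entries[block] = entry
            self.downloads += 1
        # The MAP reply only describes where already-downloaded data lives.
        return self.entries[block]


class KernelSide:
    """Hypothetical kernel half: caches index entries, decompresses, verifies."""

    def __init__(self, server):
        self.server = server
        self.index = {}
        self.map_requests = 0

    def read_block(self, block):
        # Send a MAP message only when the index entry for this block is missing.
        if block not in self.index:
            self.index[block] = self.server.handle_map(block)
            self.map_requests += 1
        e = self.index[block]
        raw = bytes(self.server.backing[e.offset:e.offset + e.size])
        data = zlib.decompress(raw) if e.compression == "zlib" else raw
        # Verify against the hash recorded in the index before returning data.
        if hashlib.sha256(data).digest() != e.digest:
            raise OSError(f"hash mismatch in block {block}")
        return data


remote = {0: b"A" * 4096, 1: b"B" * 4096}
server = UserspaceServer(remote)
kernel = KernelSide(server)
kernel.read_block(1)
kernel.read_block(1)  # served from the cached index entry, no MAP request
kernel.read_block(0)
print(kernel.map_requests, server.downloads)  # 2 2
```

Reading block 1 twice and then block 0 costs two MAP requests and two downloads; the repeated read never leaves the kernel side.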
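Yurii's index-size estimate can be checked with simple arithmetic. The thread does not state the block size; assuming "5GB" means 5 GiB and blocks are 4 KiB, the block count comes out at 1,310,720, matching the quoted ~1.3M, and 16 bytes per entry gives exactly 20 MiB. The packed layout below is a hypothetical illustration of how tight a 16-byte budget is for the fields he lists (hash size is folded into the hash kind); it is not the Incremental FS on-disk format.

```python
import struct

# Hypothetical 16-byte index entry (not the real Incremental FS format):
#   u64 data offset, u32 hash location, u16 compressed size,
#   u8 compression type, u8 hash kind (hash size implied by the kind).
ENTRY = struct.Struct("<QIHBB")

# Assumptions: "5GB" is 5 GiB and the block size is 4 KiB.
FILE_SIZE = 5 * 2**30
BLOCK_SIZE = 4096


def index_size(file_size, block_size, entry_size):
    """Return (block count, index bytes); a partial last block still needs an entry."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks, blocks * entry_size


blocks, total = index_size(FILE_SIZE, BLOCK_SIZE, ENTRY.size)
print(ENTRY.size, blocks, total / 2**20)  # 16 1310720 20.0
```

The figure covers the entries alone; as Yurii notes, any lookup structure over them adds further overhead.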
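Miklos's point that discarding index entries needs only access recency, not knowledge of the on-disk format, amounts to a least-recently-used cache. A minimal Python sketch, assuming a hypothetical `IndexCache` whose `fetch` callback stands in for re-sending a MAP request to userspace; the capacity and entry contents are arbitrary:

```python
from collections import OrderedDict


class IndexCache:
    """Hypothetical bounded cache of per-block index entries.

    Eviction is driven purely by access recency; entries are opaque, so the
    cache never needs to know how they are encoded on disk. On a miss it
    calls `fetch`, which stands in for a MAP request to the userspace server.
    """

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch
        self.entries = OrderedDict()  # block number -> entry, oldest first
        self.misses = 0

    def lookup(self, block):
        if block in self.entries:
            self.entries.move_to_end(block)   # mark as most recently used
            return self.entries[block]
        self.misses += 1
        entry = self.fetch(block)             # e.g. ask userspace again via MAP
        self.entries[block] = entry
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # drop the least recently used entry
        return entry


cache = IndexCache(2, fetch=lambda b: ("entry", b))
for b in [1, 2, 1, 3, 2]:
    cache.lookup(b)
print(cache.misses, list(cache.entries))  # 4 [3, 2]
```

With capacity 2 and the access pattern 1, 2, 1, 3, 2, block 2 is evicted when 3 arrives (block 1 was touched more recently), so four lookups miss and the cache ends up holding blocks 3 and 2.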