From: Yurii Zubrytskyi <zyy@google.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Eugene Zemtsov <ezemtsov@google.com>,
	Amir Goldstein <amir73il@gmail.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Initial patches for Incremental FS
Date: Thu, 30 May 2019 15:45:42 -0700
Message-ID: <CAJeUaNAcZXfX-7Ws0q7SnaWrD+nzK3hxPwoW-NYvjAL0b=8M9g@mail.gmail.com>
In-Reply-To: <CAJfpeguys2P9q5EpE3GzKHcOS9GVLO9Fj9HB3JBLw36eax+NkQ@mail.gmail.com>

> With the proposed FUSE solution the following sequences would occur:
>
> kernel: if index for given block is missing, send MAP message
>   userspace: if data/hash is missing for given block then download data/hash
>   userspace: send MAP reply
> kernel: decompress data and verify hash based on index
>
> The kernel would not be involved in either streaming data or hash, it
> would only work with data/hash that has already been downloaded.
> Right?
>
> Or is your implementation doing streamed decompress/hash or partial blocks?
> ...
> Why does the kernel have to know the on-disk format to be able to load
> and discard parts of the index on-demand?  It only needs to know which
> blocks were accessed recently and which not so recently.
>
(1) You're correct, only userspace deals with the streaming. The
kernel then sees full blocks of data (usually LZ4-compressed) and
blocks of hashes.
We'd need to give the location of the hash tree instead of the
individual hash here though - verification has to go all the way to
the top and even check the signature there. And the same 5 GB file
would have over 40 MB of hashes (32 bytes of SHA2 for each 4K block),
so those have to be read from disk as well.
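To illustrate why a single hash is not enough, here is a minimal
sketch of hash-tree verification that walks from a data block up to a
trusted root. It assumes 4 KiB blocks, SHA-256, and 128 hashes per
4 KiB hash block; the names build_tree and verify_block are
hypothetical, the signature check on the root is left out, and this is
not the incfs on-disk layout.

```python
import hashlib

BLOCK_SIZE = 4096                  # data block size (assumed)
HASH_SIZE = 32                     # SHA-256 digest size
FANOUT = BLOCK_SIZE // HASH_SIZE   # 128 hashes fit in one hash block


def build_tree(data: bytes) -> list:
    """Return the tree levels: levels[0] holds the hashes of the data
    blocks, and the last level holds the single root hash."""
    blocks = [data[i:i + BLOCK_SIZE]
              for i in range(0, len(data), BLOCK_SIZE)] or [b""]
    level = [hashlib.sha256(b).digest() for b in blocks]
    levels = [level]
    while len(level) > 1:
        # Each parent hash covers one hash block of up to FANOUT children.
        level = [hashlib.sha256(b"".join(level[i:i + FANOUT])).digest()
                 for i in range(0, len(level), FANOUT)]
        levels.append(level)
    return levels


def verify_block(block: bytes, index: int, levels: list,
                 trusted_root: bytes) -> bool:
    """Check one data block against the stored tree, all the way up.
    trusted_root is assumed to have been signature-checked elsewhere."""
    digest = hashlib.sha256(block).digest()
    for level in levels[:-1]:
        if level[index] != digest:
            return False
        group = index // FANOUT
        # Hash the whole hash block that contains this entry; the result
        # must match the entry one level up.
        start = group * FANOUT
        digest = hashlib.sha256(
            b"".join(level[start:start + FANOUT])).digest()
        index = group
    return digest == trusted_root
```

Every read therefore touches one hash block per tree level, which is
why the location of the whole tree matters and not just one hash.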
Overall, let's just imagine a phone with 100 apps, 100MB each,
installed this way. That ends up being ~10GB of data, so we'd need _at
least_ 40 MB for the index and 80 MB for hashes *in kernel*. Android
now fights for each megabyte of RAM used in the system services, so
FUSE won't be able to cache that and would go back to user mode for
almost all reads again.
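The figures above can be reproduced with a quick back-of-the-envelope
calculation. This is a sketch under stated assumptions: binary units,
4 KiB blocks, 32-byte hashes, and 16 bytes of index per block. The
16-byte entry size is inferred from the 40 MB number and is not stated
anywhere in the thread. Only the leaf level of the hash tree is
counted; with 128 hashes per hash block the upper levels add under 1%.

```python
BLOCK_SIZE = 4096        # 4 KiB data blocks
HASH_SIZE = 32           # bytes of SHA2 per block
INDEX_ENTRY_SIZE = 16    # assumption, inferred from the 40 MB figure

MiB = 1 << 20
GiB = 1 << 30


def per_block_overhead(data_bytes: int, entry_size: int) -> int:
    """Bytes of metadata needed at entry_size bytes per data block."""
    blocks = -(-data_bytes // BLOCK_SIZE)   # ceiling division
    return blocks * entry_size


# One 5 GiB file: leaf hashes alone
print(per_block_overhead(5 * GiB, HASH_SIZE) // MiB, "MiB of hashes")

# Roughly 100 apps of 100 MB each, rounded to 10 GiB
print(per_block_overhead(10 * GiB, INDEX_ENTRY_SIZE) // MiB, "MiB of index")
print(per_block_overhead(10 * GiB, HASH_SIZE) // MiB, "MiB of hashes")
```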
(1 and 2) ... If FUSE were to know the on-disk format it would be able
to simply parse and read it when needed, with as little memory
footprint as it can. Requesting this data from user mode every time
with little caching defeats the whole purpose of the change.

> BTW, which interface does your fuse filesystem use?  Libfuse?  Raw device?
Yes, our code interacts with the raw FUSE fd via poll/read/write
calls. We have tried the multithreaded approach via duping the control
fd and FUSE_DEV_IOC_CLONE, but it didn't give much improvement -
Android apps usually aren't multithreaded, so there are at most two
pending reads at once. I've seen 10 once, but that was some kind of
miracle.
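
For readers unfamiliar with the raw interface, a poll/read step on a
FUSE fd can be sketched roughly as follows. This is illustrative only
and is not the code discussed in this thread: next_request and
InHeader are hypothetical names, and the write side (replies) is
omitted. The header layout and the FUSE_READ opcode value follow
struct fuse_in_header in <linux/fuse.h>.

```python
import os
import select
import struct
from typing import NamedTuple

# struct fuse_in_header: len, opcode, unique, nodeid, uid, gid, pid,
# padding (40 bytes, native byte order)
FUSE_IN_HEADER = struct.Struct("=IIQQIIII")
FUSE_READ = 15  # opcode value from <linux/fuse.h>


class InHeader(NamedTuple):
    length: int
    opcode: int
    unique: int
    nodeid: int
    uid: int
    gid: int
    pid: int
    padding: int


def parse_header(buf: bytes) -> InHeader:
    """Decode the fixed-size header at the start of a request."""
    return InHeader(*FUSE_IN_HEADER.unpack_from(buf))


def next_request(fd: int, timeout_ms: int, bufsize: int = 1 << 20):
    """Wait for one request on fd. Returns (header, body), or None if
    nothing arrived within timeout_ms."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    if not poller.poll(timeout_ms):
        return None
    # A request must be consumed with a single read into a large buffer.
    buf = os.read(fd, bufsize)
    return parse_header(buf), buf[FUSE_IN_HEADER.size:]
```

With at most a couple of reads pending at once, a single loop like
this is rarely the bottleneck, which is consistent with cloned fds not
helping much.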

And again, we have not even looked at the directory structure and stat
caching yet, neither the interface nor the memory usage. For the
general case we have to make direct disk reads from the kernel, and
this forces an even bigger part of the disk format to be defined
there. The end result is what we got when researching FUSE - a huge
chunk of FUSE gets overspecialized to handle our own way of using it
end to end, with no real configurability (because making it
configurable makes that code even bigger and more complex).

--
Thanks, Yurii

