From: Yurii Zubrytskyi <zyy@google.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: Eugene Zemtsov <ezemtsov@google.com>,
	Amir Goldstein <amir73il@gmail.com>,
	linux-fsdevel@vger.kernel.org
Subject: Re: Initial patches for Incremental FS
Date: Wed, 22 May 2019 10:25:05 -0700
Message-ID: <CAJeUaNC5rXuNsoKmJjJN74iH9YNp94L450gcpxyc_dG=D8CCjA@mail.gmail.com>
In-Reply-To: <CAJfpegvmFJ63F2h_gFVPJeEgWS8UmxAYCUgA-4=j9iCNXaXARA@mail.gmail.com>

> Hang on, fuse does use caches in the kernel (page cache,
> dcache/icache).  The issue is probably not lack of cache, it's how the
> caches are primed and used.  Did you disable these caches?  Did you
> not disable invalidation for data, metadata and dcache?  In recent
> kernels we added caching readdir as well.  The only objects not cached
> are (non-acl) xattrs.   Do you have those?
Android (which is our primary use case) is constantly under memory
pressure, so caches don't actually last long. Our experience with
FOPEN_KEEP_CACHE has shown that pages are evicted more often than the
files are reopened, so it doesn't help: FUSE has to re-read the data
from the backing store all the time.
We didn't use xattrs in the FUSE-based implementation, but ended up
requiring something similar in Incremental FS, so the final design
would have to include them.
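
To illustrate the eviction argument above, here is a minimal sketch: a
toy LRU page cache, not the kernel's actual reclaim logic. All names and
numbers in it (LruPageCache, reopen_hit_rate, the page counts) are
hypothetical and chosen only for illustration. It reports what fraction
of a file's pages is still cached when the file is reopened after other
activity has pushed a given number of unrelated pages through the cache.

```python
from collections import OrderedDict


class LruPageCache:
    """Toy LRU cache keyed by (file_id, page_no); a stand-in for the page cache."""

    def __init__(self, capacity_pages):
        self.capacity = capacity_pages
        self.pages = OrderedDict()

    def read(self, file_id, page_no):
        """Return True on a cache hit, False on a miss (the page is then cached)."""
        key = (file_id, page_no)
        if key in self.pages:
            self.pages.move_to_end(key)  # mark as most recently used
            return True
        self.pages[key] = True
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used page
        return False


def reopen_hit_rate(capacity_pages, file_pages, pressure_pages, reopens):
    """Fraction of page reads served from cache across `reopens` re-reads.

    Between two opens of the file, `pressure_pages` unrelated pages are
    touched, which models memory pressure from the rest of the system.
    """
    cache = LruPageCache(capacity_pages)
    for page in range(file_pages):  # first, cold read of the file
        cache.read("app", page)

    hits = 0
    next_other_page = 0
    for _ in range(reopens):
        for _ in range(pressure_pages):  # unrelated activity
            cache.read("other", next_other_page)
            next_other_page += 1
        for page in range(file_pages):  # reopen and re-read the file
            if cache.read("app", page):
                hits += 1
    return hits / (reopens * file_pages)


if __name__ == "__main__":
    # Hypothetical sizes: a 100-page cache and a 10-page file.
    print(reopen_hit_rate(100, 10, pressure_pages=0, reopens=5))
    print(reopen_hit_rate(100, 10, pressure_pages=200, reopens=5))
```

In this model, once the pages touched between reopens exceed the cache
capacity, every reopen starts cold, which is the situation where keeping
the cache across open() calls stops paying off.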

> Re prefetching data:
> there's the NOTIFY_STORE message.
To add to the previous point, we do not have the data for prefetching,
as we're loading it page by page from the host. We had to disable
readahead for FUSE completely; otherwise even USB3 isn't fast enough
to deliver data in such big chunks in time, and applications keep
hanging on page faults.
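
A back-of-the-envelope sketch of why a readahead window hurts here. It
makes a simplifying assumption: a faulting reader waits for the whole
requested chunk, and the time to fetch a chunk is a fixed per-request
latency plus size divided by the effective end-to-end throughput. The
function names and every number below are hypothetical, not
measurements from our setup.

```python
PAGE_SIZE = 4096  # bytes; assumed page size


def fault_latency_s(bytes_needed, throughput_bytes_per_s, request_latency_s=0.0):
    """Time until `bytes_needed` bytes have arrived from the host."""
    return request_latency_s + bytes_needed / throughput_bytes_per_s


def readahead_penalty(window_pages, throughput_bytes_per_s, request_latency_s=0.0):
    """How many times longer a fault waits for a full readahead window
    than it would for a single page, under the model above."""
    single = fault_latency_s(PAGE_SIZE, throughput_bytes_per_s, request_latency_s)
    window = fault_latency_s(
        PAGE_SIZE * window_pages, throughput_bytes_per_s, request_latency_s
    )
    return window / single


if __name__ == "__main__":
    # Hypothetical effective throughput: 1000 pages per second end to end.
    throughput = PAGE_SIZE * 1000
    print(readahead_penalty(32, throughput))
    print(readahead_penalty(32, throughput, request_latency_s=0.001))
```

With no fixed per-request cost, the wait grows linearly with the window
size; a fixed cost softens the ratio but does not remove it.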

Overall, better caching doesn't save much *on Android*; what would
work is a full-blown data storage system inside the FUSE kernel code
that can intercept requests before they go to user mode and process
them completely. That's how we could keep the data out of RAM but
still get rid of the extra context switch and kernel-user transition.
But this also means that FUSE becomes far too aware of the specific
storage format and all its features, and basically gets a specialized
implementation of one of its filesystems inside the generic FUSE code.
Even if we separate that out, the kernel API between the storage and
FUSE ended up being a complete copy of the VFS API,
with some additions to send data blocks and Merkle tree blocks in. The
code is truly no simpler if we stuff the Incremental FS into FUSE
instead of mounting it directly.

-- 
Thanks, Yurii

Thread overview: 33+ messages
2019-05-02  4:03 Initial patches for Incremental FS ezemtsov
2019-05-02  4:03 ` [PATCH 1/6] incfs: Add first files of incrementalfs ezemtsov
2019-05-02 19:06   ` Miklos Szeredi
2019-05-02 20:41   ` Randy Dunlap
2019-05-07 15:57   ` Jann Horn
2019-05-07 17:13   ` Greg KH
2019-05-07 17:18   ` Greg KH
2019-05-02  4:03 ` [PATCH 2/6] incfs: Backing file format ezemtsov
2019-05-02  4:03 ` [PATCH 3/6] incfs: Management of in-memory FS data structures ezemtsov
2019-05-02  4:03 ` [PATCH 4/6] incfs: Integration with VFS layer ezemtsov
2019-05-02  4:03 ` [PATCH 6/6] incfs: Integration tests for incremental-fs ezemtsov
2019-05-02 11:19 ` Initial patches for Incremental FS Amir Goldstein
2019-05-02 13:10   ` Theodore Ts'o
2019-05-02 13:26     ` Al Viro
2019-05-03  4:23       ` Eugene Zemtsov
2019-05-03  5:19         ` Amir Goldstein
2019-05-08 20:09           ` Eugene Zemtsov
2019-05-09  8:15             ` Amir Goldstein
     [not found]               ` <CAK8JDrEQnXTcCtAPkb+S4r4hORiKh_yX=0A0A=LYSVKUo_n4OA@mail.gmail.com>
2019-05-21  1:32                 ` Yurii Zubrytskyi
2019-05-22  8:32                   ` Miklos Szeredi
2019-05-22 17:25                     ` Yurii Zubrytskyi [this message]
2019-05-23  4:25                       ` Miklos Szeredi
2019-05-29 21:06                         ` Yurii Zubrytskyi
2019-05-30  9:22                           ` Miklos Szeredi
2019-05-30 22:45                             ` Yurii Zubrytskyi
2019-05-31  9:02                               ` Miklos Szeredi
2019-05-22 10:54                   ` Amir Goldstein
2019-05-03  7:23         ` Richard Weinberger
2019-05-03 10:22         ` Miklos Szeredi
2019-05-02 13:46     ` Amir Goldstein
2019-05-02 18:16   ` Richard Weinberger
2019-05-02 18:33     ` Richard Weinberger
2019-05-02 13:47 ` J. R. Okajima
