From: martin@omnibond.com
To: Andreas Dilger <adilger@dilger.ca>
Cc: Mike Marshall <hubcap@omnibond.com>,
	Mike Marshall <hubcap@clemson.edu>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 00/17] orangefs: page cache
Date: Tue, 2 Oct 2018 16:13:52 -0400	[thread overview]
Message-ID: <20181002201352.GA41654@t480s.mkb.name> (raw)
In-Reply-To: <D61D4718-1AD0-45CD-92B2-BDCAC61AA295@dilger.ca>

On Mon, Oct 01, 2018 at 02:03:49PM -0600, Andreas Dilger wrote:
> On Sep 20, 2018, at 12:31 PM, Mike Marshall <hubcap@omnibond.com> wrote:
> > 
> > Using the page cache seems like a game changer for the Orangefs kernel
> > module. Workloads with small IO suffer when trying to push a parallel
> > filesystem just a handful of bytes at a time. Below, vm2 with Fedora's
> > 4.17 kernel has /pvfsmnt mounted from an Orangefs filesystem that is
> > itself running on vm2. vm1 with 4.19.0-rc2 plus the Orangefs page cache
> > patch also has its /pvfsmnt mounted from a local Orangefs filesystem.
> 
> Is there some mechanism to prevent the client cache size exceeding the amount
> of free space on the filesystem?  If not, then the client may write data that
> can never be flushed to disk on the server.

There is not.

I will add that the filesystem size returned by statfs isn't accurate
either.  Our server returns the statfs info from the underlying
filesystem, which (1) doesn't account for our metadata and (2) doesn't
account for other users sharing the underlying filesystem.

About the best I can imagine doing is to keep the cached size from
becoming absurdly large: put a limit on how much dirty data we
accumulate before we stop and do writeback.
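
If we wanted to experiment with that, the per-bdi dirty throttling
knobs look like the natural lever.  A minimal sketch (not part of this
series; the 10% figure is arbitrary), assuming the backing_dev_info
set up earlier in the series:

#include <linux/backing-dev.h>
#include <linux/fs.h>

/*
 * Sketch: cap the fraction of dirtyable memory that orangefs pages
 * may occupy.  Past this ratio, balance_dirty_pages() throttles
 * writers and kicks writeback, so the cache can't grow without
 * bound even when the server's free space report is unreliable.
 */
static int orangefs_cap_dirty(struct super_block *sb)
{
	return bdi_set_max_ratio(sb->s_bdi, 10);
}

That still wouldn't tie the limit to the server's free space, but it
would bound how much unflushable data we can accumulate.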

> 
> Cheers, Andreas
> 
> > [vm2]$ dd if=/dev/zero of=/pvfsmnt/d.vm2/d.foo/dds.out bs=128 count=4194304
> > 4194304+0 records in
> > 4194304+0 records out
> > 536870912 bytes (537 MB, 512 MiB) copied, 662.013 s, 811 kB/s
> > 
> > [vm1]$ dd if=/dev/zero of=/pvfsmnt/d.vm1/d.foo/dds.out bs=128 count=4194304
> > 4194304+0 records in
> > 4194304+0 records out
> > 536870912 bytes (537 MB, 512 MiB) copied, 11.3072 s, 47.5 MB/s
> > 
> > Small IO collects in the page cache until a reasonable amount of
> > data is available for writeback.
> > 
> > The trick, it seems, is to improve small IO without harming large IO.
> > Aligning writeback sizes, when possible, with the size of the IO buffer
> > that the Orangefs kernel module shares with its userspace component seems
> > promising on my dinky vm tests.
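
For the curious, the alignment boils down to sizing each writeback
batch from that shared buffer.  A sketch, where
orangefs_bufmap_size_query() is the existing helper (declared in
orangefs-kernel.h) reporting the buffer's size; the rest is
illustrative:

#include <linux/mm.h>	/* PAGE_SIZE */

/*
 * Sketch only: gather this many pages per writeback batch so one
 * batch fills the buffer shared with the userspace client-core
 * instead of issuing many small server writes.
 */
static unsigned int orangefs_wb_batch_pages(void)
{
	int bufsize = orangefs_bufmap_size_query();

	if (bufsize < (int)PAGE_SIZE)
		return 1;	/* never batch less than one page */
	return bufsize / PAGE_SIZE;
}
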
> > 
> > -Mike
> > 
> > On Mon, Sep 17, 2018 at 4:11 PM Martin Brandenburg <martin@omnibond.com> wrote:
> >> 
> >> If no major issues are found in review or in our testing, we intend to
> >> submit this during the next merge window.
> >> 
> >> The goal of all this is to significantly reduce the number of network
> >> requests made to the OrangeFS server.
> >> 
> >> First, the xattr cache is needed because otherwise we make a ton of
> >> getxattr calls from security_inode_need_killpriv.
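
To illustrate, the cache is roughly a table of recently fetched
attributes per inode, negative entries included.  The names and sizes
below are illustrative, not the series' actual structure:

#include <linux/jiffies.h>
#include <linux/types.h>

/*
 * Sketch of a cached xattr entry: one getxattr result remembered for
 * a short time so the repeated calls from
 * security_inode_need_killpriv() hit the cache.  A negative result
 * (-ENODATA) is cached too, since "no such attribute" is the common
 * case for security.capability.
 */
struct orangefs_cached_xattr_sketch {
	char key[64];
	char val[256];
	ssize_t length;		/* bytes in val, or -ENODATA */
	unsigned long timeout;	/* jiffies; stale when time_after(jiffies, timeout) */
};
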
> >> 
> >> Then there's some reorganization so inode changes can be cached.
> >> Finally, we enable write_inode.
> >> 
> >> Then we remove the old readpages.  Next there's some reorganization to
> >> support readpage/writepage.  Finally, we enable readpage/writepage,
> >> which is fairly straightforward except for the need to separate writes
> >> from different uid/gid pairs due to the design of our server.
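
Concretely, each cached write remembers the writer's credentials, and
writeback only merges ranges whose credentials match, since the server
authorizes each write as a single (uid, gid).  A sketch with
illustrative names:

#include <linux/types.h>
#include <linux/uidgid.h>

/*
 * Sketch: a dirty range tagged with the credentials it was written
 * under.  Writeback must not combine ranges where these differ.
 */
struct orangefs_write_range_sketch {
	loff_t pos;
	size_t len;
	kuid_t uid;
	kgid_t gid;
};

static bool orangefs_can_merge(const struct orangefs_write_range_sketch *a,
			       const struct orangefs_write_range_sketch *b)
{
	return uid_eq(a->uid, b->uid) && gid_eq(a->gid, b->gid);
}
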
> >> 
> >> Martin Brandenburg (17):
> >>  orangefs: implement xattr cache
> >>  orangefs: do not invalidate attributes on inode create
> >>  orangefs: simplify orangefs_inode_getattr interface
> >>  orangefs: update attributes rather than relying on server
> >>  orangefs: hold i_lock during inode_getattr
> >>  orangefs: set up and use backing_dev_info
> >>  orangefs: let setattr write to cached inode
> >>  orangefs: reorganize setattr functions to track attribute changes
> >>  orangefs: remove orangefs_readpages
> >>  orangefs: service ops done for writeback are not killable
> >>  orangefs: migrate to generic_file_read_iter
> >>  orangefs: implement writepage
> >>  orangefs: skip inode writeout if nothing to write
> >>  orangefs: write range tracking
> >>  orangefs: avoid fsync service operation on flush
> >>  orangefs: use kmem_cache for orangefs_write_request
> >>  orangefs: implement writepages
> >> 
> >> fs/orangefs/acl.c             |   4 +-
> >> fs/orangefs/file.c            | 193 ++++--------
> >> fs/orangefs/inode.c           | 576 +++++++++++++++++++++++++++-------
> >> fs/orangefs/namei.c           |  41 ++-
> >> fs/orangefs/orangefs-cache.c  |  24 +-
> >> fs/orangefs/orangefs-kernel.h |  56 +++-
> >> fs/orangefs/orangefs-mod.c    |  10 +-
> >> fs/orangefs/orangefs-utils.c  | 181 +++++------
> >> fs/orangefs/super.c           |  38 ++-
> >> fs/orangefs/waitqueue.c       |  18 +-
> >> fs/orangefs/xattr.c           | 104 ++++++
> >> 11 files changed, 839 insertions(+), 406 deletions(-)
> >> 
> >> --
> >> 2.19.0
> >> 
