Date: Tue, 2 Oct 2018 16:13:52 -0400
From: martin@omnibond.com
To: Andreas Dilger
Cc: Mike Marshall, linux-fsdevel
Subject: Re: [PATCH 00/17] orangefs: page cache
Message-ID: <20181002201352.GA41654@t480s.mkb.name>
References: <20180917201054.3530-1-martin@omnibond.com>

On Mon, Oct 01, 2018 at 02:03:49PM -0600, Andreas Dilger wrote:
> On Sep 20, 2018, at 12:31 PM, Mike Marshall wrote:
> >
> > Using the page cache seems like a game changer for the Orangefs kernel module.
> > Workloads with small IO suffer trying to push a parallel filesystem
> > with just a handful of bytes at a time. Below, vm2 with Fedora's 4.17
> > has /pvfsmnt mounted from an Orangefs filesystem that is itself running
> > on vm2. vm1 with 4.19.0-rc2 plus the Orangefs page cache patch, also has
> > its /pvfsmnt mounted from a local Orangefs filesystem.
>
> Is there some mechanism to prevent the client cache size exceeding the amount
> of free space on the filesystem? If not, then the client may write data that
> can never be flushed to disk on the server.

There is not. I will add that the statfs returned filesystem size isn't
accurate either. Our server returns the statfs info from the underlying
filesystem, which (1) doesn't account for our metadata and (2) doesn't
account for the possibility that there are other users on the underlying
filesystem.

About the best I can imagine doing is to stop the cached size from
becoming absurdly large by setting a limit at which we stop and do
writeback.

>
> Cheers, Andreas
>
> > [vm2]$ dd if=/dev/zero of=/pvfsmnt/d.vm2/d.foo/dds.out bs=128 count=4194304
> > 4194304+0 records in
> > 4194304+0 records out
> > 536870912 bytes (537 MB, 512 MiB) copied, 662.013 s, 811 kB/s
> >
> > [vm1]$ dd if=/dev/zero of=/pvfsmnt/d.vm1/d.foo/dds.out bs=128 count=4194304
> > 4194304+0 records in
> > 4194304+0 records out
> > 536870912 bytes (537 MB, 512 MiB) copied, 11.3072 s, 47.5 MB/s
> >
> > Small IO collects in the page cache until a reasonable amount of
> > data is available for writeback.
> >
> > The trick, it seems, is to improve small IO without harming large IO.
> > Aligning writeback sizes, when possible, with the size of the IO buffer
> > that the Orangefs kernel module shares with its userspace component seems
> > promising on my dinky vm tests.
> >
> > -Mike
> >
> > On Mon, Sep 17, 2018 at 4:11 PM Martin Brandenburg wrote:
> >>
> >> If no major issues are found in review or in our testing, we intend to
> >> submit this during the next merge window.
> >>
> >> The goal of all this is to significantly reduce the number of network
> >> requests made to the OrangeFS
> >>
> >> First the xattr cache is needed because otherwise we make a ton of
> >> getxattr calls from security_inode_need_killpriv.
> >>
> >> Then there's some reorganization so inode changes can be cached.
> >> Finally, we enable write_inode.
> >>
> >> Then remove the old readpages. Next there's some reorganization to
> >> support readpage/writepage. Finally, enable readpage/writepage which
> >> is fairly straightforward except for the need to separate writes from
> >> different uid/gid pairs due to the design of our server.
> >>
> >> Martin Brandenburg (17):
> >>   orangefs: implement xattr cache
> >>   orangefs: do not invalidate attributes on inode create
> >>   orangefs: simply orangefs_inode_getattr interface
> >>   orangefs: update attributes rather than relying on server
> >>   orangefs: hold i_lock during inode_getattr
> >>   orangefs: set up and use backing_dev_info
> >>   orangefs: let setattr write to cached inode
> >>   orangefs: reorganize setattr functions to track attribute changes
> >>   orangefs: remove orangefs_readpages
> >>   orangefs: service ops done for writeback are not killable
> >>   orangefs: migrate to generic_file_read_iter
> >>   orangefs: implement writepage
> >>   orangefs: skip inode writeout if nothing to write
> >>   orangefs: write range tracking
> >>   orangefs: avoid fsync service operation on flush
> >>   orangefs: use kmem_cache for orangefs_write_request
> >>   orangefs: implement writepages
> >>
> >>  fs/orangefs/acl.c             |   4 +-
> >>  fs/orangefs/file.c            | 193 ++++--------
> >>  fs/orangefs/inode.c           | 576 +++++++++++++++++++++++++++-------
> >>  fs/orangefs/namei.c           |  41 ++-
> >>  fs/orangefs/orangefs-cache.c  |  24 +-
> >>  fs/orangefs/orangefs-kernel.h |  56 +++-
> >>  fs/orangefs/orangefs-mod.c    |  10 +-
> >>  fs/orangefs/orangefs-utils.c  | 181 +++++------
> >>  fs/orangefs/super.c           |  38 ++-
> >>  fs/orangefs/waitqueue.c       |  18 +-
> >>  fs/orangefs/xattr.c           | 104 ++++++
> >>  11 files changed, 839 insertions(+), 406 deletions(-)
> >>
> >> --
> >> 2.19.0
> >>
>
> Cheers, Andreas
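
To make the "limit before we stop and do writeback" idea in the reply above a little more concrete, here is a toy userspace model in Python. It is only a sketch under stated assumptions, not the orangefs code: the names CachedWriter, dirty_limit, and requests_for are hypothetical, the cap is a plain byte count, and each flush is assumed to cost one server request. It also says nothing about free space on the server, which the reply notes cannot be known reliably from statfs.

```python
from dataclasses import dataclass, field


@dataclass
class CachedWriter:
    """Toy model of a client cache that forces writeback at a fixed cap."""

    dirty_limit: int          # hypothetical cap on cached dirty bytes
    dirty_bytes: int = 0      # bytes currently held in the cache
    flushes: list = field(default_factory=list)  # size of each server request

    def write(self, nbytes: int) -> None:
        """Cache a write, flushing first if it would push us past the cap."""
        if nbytes >= self.dirty_limit:
            # Large IO: flush whatever is cached, then send it straight through.
            self.writeback()
            self.flushes.append(nbytes)
            return
        if self.dirty_bytes + nbytes > self.dirty_limit:
            self.writeback()
        self.dirty_bytes += nbytes

    def writeback(self) -> None:
        """Send all cached dirty bytes to the server as one request."""
        if self.dirty_bytes:
            self.flushes.append(self.dirty_bytes)
            self.dirty_bytes = 0


def requests_for(total: int, bs: int, limit: int) -> int:
    """Count server requests for `total` bytes written in `bs`-byte chunks."""
    writer = CachedWriter(dirty_limit=limit)
    for _ in range(total // bs):
        writer.write(bs)
    writer.writeback()  # final flush, as on close or fsync
    return len(writer.flushes)
```

In this model, writing 1 MiB in 128-byte chunks with a 64 KiB cap takes 16 requests, where an uncached client that sends every write separately would need 8192. Writes at or above the cap bypass the cache, which is one simple way to avoid penalizing large IO.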