From: Mike Marshall
Date: Thu, 20 Sep 2018 14:31:54 -0400
Subject: Re: [PATCH 00/17] orangefs: page cache
To: Mike Marshall, linux-fsdevel
In-Reply-To: <20180917201054.3530-1-martin@omnibond.com>

Using the page cache seems like a game changer for the Orangefs
kernel module. Workloads with small IO suffer when they try to push a
parallel filesystem just a handful of bytes at a time.

Below, vm2 with Fedora's 4.17 has /pvfsmnt mounted from an Orangefs
filesystem that is itself running on vm2. vm1, with 4.19.0-rc2 plus
the Orangefs page cache patches, also has its /pvfsmnt mounted from a
local Orangefs filesystem.

[vm2]$ dd if=/dev/zero of=/pvfsmnt/d.vm2/d.foo/dds.out bs=128 count=4194304
4194304+0 records in
4194304+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 662.013 s, 811 kB/s

[vm1]$ dd if=/dev/zero of=/pvfsmnt/d.vm1/d.foo/dds.out bs=128 count=4194304
4194304+0 records in
4194304+0 records out
536870912 bytes (537 MB, 512 MiB) copied, 11.3072 s, 47.5 MB/s

Small IO collects in the page cache until a reasonable amount of data
is available for writeback. The trick, it seems, is to improve small
IO without harming large IO. Aligning writeback sizes, when possible,
with the size of the IO buffer that the Orangefs kernel module shares
with its userspace component seems promising in my dinky vm tests.
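To see the arithmetic, here is a toy that collects tiny writes until
a writeback buffer's worth is ready, the way the page cache lets us
do. BUFMAP_DESC_SIZE is just a stand-in I made up for the size of the
shared buffer, and this is plain userspace C, not anything from the
patches:

/* Toy: collect tiny writes and issue one I/O per full buffer.
 * BUFMAP_DESC_SIZE is a made-up stand-in for the size of the buffer
 * the kernel module shares with its userspace component.
 * Build: cc -O2 -o coalesce coalesce.c
 */
#include <stdio.h>
#include <string.h>

#define BUFMAP_DESC_SIZE (4 * 1024 * 1024) /* assumed shared-buffer size */

static unsigned char buf[BUFMAP_DESC_SIZE];
static size_t buffered;       /* bytes collected so far */
static long long ios_issued;  /* I/Os we would have sent to the server */

/* Stand-in for handing one full (or final partial) buffer over. */
static void flush_buffer(void)
{
        if (buffered == 0)
                return;
        ios_issued++;
        buffered = 0;
}

/* Collect a small write; flush only when the shared buffer is full. */
static void buffered_write(const void *data, size_t len)
{
        while (len > 0) {
                size_t n = BUFMAP_DESC_SIZE - buffered;

                if (n > len)
                        n = len;
                memcpy(buf + buffered, data, n);
                buffered += n;
                data = (const unsigned char *)data + n;
                len -= n;
                if (buffered == BUFMAP_DESC_SIZE)
                        flush_buffer();
        }
}

int main(void)
{
        static const char chunk[128];  /* one dd bs=128 record */
        long long i, writes = 4194304; /* count=4194304, as above */

        for (i = 0; i < writes; i++)
                buffered_write(chunk, sizeof(chunk));
        flush_buffer(); /* push out any final partial buffer */

        printf("%lld small writes -> %lld buffer-sized I/Os\n",
               writes, ios_issued);
        return 0;
}

With the 4 MiB stand-in buffer it reports 4194304 small writes ->
128 buffer-sized I/Os, which is the kind of collapse in request count
that the dd timings above reflect.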
-Mike

On Mon, Sep 17, 2018 at 4:11 PM Martin Brandenburg wrote:
>
> If no major issues are found in review or in our testing, we intend
> to submit this during the next merge window.
>
> The goal of all this is to significantly reduce the number of
> network requests made to the OrangeFS server.
>
> First, the xattr cache is needed because otherwise we make a ton of
> getxattr calls from security_inode_need_killpriv.
>
> Then there's some reorganization so inode changes can be cached.
> Finally, we enable write_inode.
>
> Then remove the old readpages. Next there's some reorganization to
> support readpage/writepage. Finally, enable readpage/writepage,
> which is fairly straightforward except for the need to separate
> writes from different uid/gid pairs due to the design of our server.
>
> Martin Brandenburg (17):
>   orangefs: implement xattr cache
>   orangefs: do not invalidate attributes on inode create
>   orangefs: simplify orangefs_inode_getattr interface
>   orangefs: update attributes rather than relying on server
>   orangefs: hold i_lock during inode_getattr
>   orangefs: set up and use backing_dev_info
>   orangefs: let setattr write to cached inode
>   orangefs: reorganize setattr functions to track attribute changes
>   orangefs: remove orangefs_readpages
>   orangefs: service ops done for writeback are not killable
>   orangefs: migrate to generic_file_read_iter
>   orangefs: implement writepage
>   orangefs: skip inode writeout if nothing to write
>   orangefs: write range tracking
>   orangefs: avoid fsync service operation on flush
>   orangefs: use kmem_cache for orangefs_write_request
>   orangefs: implement writepages
>
>  fs/orangefs/acl.c             |   4 +-
>  fs/orangefs/file.c            | 193 ++++--------
>  fs/orangefs/inode.c           | 576 +++++++++++++++++++++++++++-------
>  fs/orangefs/namei.c           |  41 ++-
>  fs/orangefs/orangefs-cache.c  |  24 +-
>  fs/orangefs/orangefs-kernel.h |  56 +++-
>  fs/orangefs/orangefs-mod.c    |  10 +-
>  fs/orangefs/orangefs-utils.c  | 181 +++++------
>  fs/orangefs/super.c           |  38 ++-
>  fs/orangefs/waitqueue.c       |  18 +-
>  fs/orangefs/xattr.c           | 104 ++++++
>  11 files changed, 839 insertions(+), 406 deletions(-)
>
> --
> 2.19.0
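P.S. on the uid/gid separation Martin mentions: the server applies
one credential to each write, so dirty data from two owners can't
ride in the same request. A toy of that kind of range tracking might
look like the following; struct wr and everything in it are names I
invented for illustration, not the types in the patches:

/* Toy: one pending writeback range, owned by a single (uid, gid).
 * All names here are invented for illustration.
 */
#include <stdio.h>
#include <stdbool.h>

struct wr {
        unsigned int uid, gid;
        long long off; /* start of the dirty range */
        long long len; /* bytes in the dirty range */
};

/* A write may join the pending range only if the credentials match
 * and the new bytes touch or overlap the range. */
static bool can_merge(const struct wr *w, unsigned int uid,
                      unsigned int gid, long long off, long long len)
{
        return w->uid == uid && w->gid == gid &&
               off <= w->off + w->len && off + len >= w->off;
}

/* Extend the pending range, or flush it first when the new write
 * cannot merge (different owner, or not contiguous with it). */
static void record_write(struct wr *w, unsigned int uid,
                         unsigned int gid, long long off, long long len)
{
        if (w->len && !can_merge(w, uid, gid, off, len)) {
                printf("flush [%lld, %lld) as %u:%u\n",
                       w->off, w->off + w->len, w->uid, w->gid);
                w->len = 0;
        }
        if (!w->len) {
                w->uid = uid;
                w->gid = gid;
                w->off = off;
                w->len = len;
                return;
        }
        if (off < w->off) { /* grow the range to cover the write */
                w->len = w->off + w->len - off;
                w->off = off;
        }
        if (off + len > w->off + w->len)
                w->len = off + len - w->off;
}

int main(void)
{
        struct wr w = { 0 };

        record_write(&w, 1000, 1000, 0, 4096);    /* user A */
        record_write(&w, 1000, 1000, 4096, 4096); /* user A: merged */
        record_write(&w, 1001, 1001, 8192, 4096); /* user B: flushes A */
        printf("pending [%lld, %lld) as %u:%u\n",
               w.off, w.off + w.len, w.uid, w.gid);
        return 0;
}

The point is just that a write from a second uid forces the pending
range out before a new one starts; presumably the real tracking in
the series lives in the writepage/writepages paths.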