From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <9e49f96cd80eaf9c8ed267a7fbbcb4c6467ee790.camel@redhat.com>
Subject: Re: [PATCH 00/33] Network fs helper library & fscache kiocb API [ver #3]
From: Jeff Layton
To: David Howells, Trond Myklebust, Anna Schumaker, Steve French,
	Dominique Martinet
Cc: linux-cifs@vger.kernel.org, ceph-devel@vger.kernel.org, Matthew Wilcox,
	linux-cachefs@redhat.com, Alexander Viro, linux-mm@kvack.org,
	linux-afs@lists.infradead.org, v9fs-developer@lists.sourceforge.net,
	Christoph Hellwig, linux-fsdevel@vger.kernel.org,
	linux-nfs@vger.kernel.org, Linus Torvalds, David Wysochanski,
	linux-kernel@vger.kernel.org
Date: Mon, 15 Feb 2021 13:05:09 -0500
In-Reply-To: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk>
References: <161340385320.1303470.2392622971006879777.stgit@warthog.procyon.org.uk>
Content-Type: text/plain; charset="ISO-8859-15"
User-Agent: Evolution 3.38.4 (3.38.4-1.fc33)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
X-Mailing-List: linux-cifs@vger.kernel.org

On Mon, 2021-02-15 at 15:44 +0000, David Howells wrote:
> Here's a set of patches to do two things:
>
>  (1) Add a helper library to handle the new VM readahead interface. This
>      is intended to be used unconditionally by the filesystem (whether or
>      not caching is enabled) and provides a common framework for doing
>      caching, transparent huge pages and, in the future, possibly fscrypt
>      and read bandwidth maximisation. It also allows the netfs and the
>      cache to align, expand and slice up a read request from the VM in
>      various ways; the netfs need only provide a function to read a stretch
>      of data to the pagecache and the helper takes care of the rest.
>
>  (2) Add an alternative fscache/cachefiles I/O API that uses the kiocb
>      facility to do async DIO to transfer data to/from the netfs's pages,
>      rather than using readpage with wait queue snooping on one side and
>      vfs_write() on the other. It also uses less memory, since it doesn't
>      do buffered I/O on the backing file.
>
>      Note that this uses SEEK_HOLE/SEEK_DATA to locate the data available
>      to be read from the cache. Whilst this is an improvement from the
>      bmap interface, it still has a problem with regard to a modern
>      extent-based filesystem inserting or removing bridging blocks of
>      zeros. Fixing that requires a much greater overhaul.
>
> This is a step towards overhauling the fscache API. The change is opt-in
> on the part of the network filesystem. A netfs should not try to mix the
> old and the new API because of conflicting ways of handling pages and the
> PG_fscache page flag and because it would be mixing DIO with buffered I/O.
> Further, the helper library can't be used with the old API.
>
> This does not change any of the fscache cookie handling APIs or the way
> invalidation is done.
>
> In the near term, I intend to deprecate and remove the old I/O API
> (fscache_allocate_page{,s}(), fscache_read_or_alloc_page{,s}(),
> fscache_write_page() and fscache_uncache_page()) and eventually replace
> most of fscache/cachefiles with something simpler and easier to follow.
>
> The patchset contains five parts:
>
>  (1) Some helper patches, including provision of an ITER_XARRAY iov
>      iterator and a function to do readahead expansion.
>
>  (2) Patches to add the netfs helper library.
>
>  (3) A patch to add the fscache/cachefiles kiocb API.
>
>  (4) Patches to add support in AFS for this.
>
>  (5) Patches from Jeff Layton to add support in Ceph for this.
>
> Dave Wysochanski also has patches for NFS for this, though they're not
> included on this branch as there's an issue with PNFS.
>
> With this, AFS without a cache passes all expected xfstests; with a cache,
> there's an extra failure, but that's also there before these patches.
> Fixing that probably requires a greater overhaul. Ceph and NFS also pass
> the expected tests.
>
> These patches can be found also on:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fscache-netfs-lib
>
> For diffing reference, the tag for the 9th Feb pull request is
> fscache-ioapi-20210203 and can be found in the same repository.
>
>
>
> Changes
> =======
>
>  (v3) Rolled in the bug fixes.
>
>       Adjusted the functions that unlock and wait for PG_fscache according
>       to Linus's suggestion.
>
>       Hold a ref on a page when PG_fscache is set as per Linus's
>       suggestion.
>
>       Dropped NFS support and added Ceph support.
>
>  (v2) Fixed some bugs and added NFS support.
>
>
> References
> ==========
>
> These patches have been published for review before, firstly as part of a
> larger set:
>
> Link: https://lore.kernel.org/linux-fsdevel/158861203563.340223.7585359869938129395.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/linux-fsdevel/159465766378.1376105.11619976251039287525.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/linux-fsdevel/159465784033.1376674.18106463693989811037.stgit@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/linux-fsdevel/159465821598.1377938.2046362270225008168.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/linux-fsdevel/160588455242.3465195.3214733858273019178.stgit@warthog.procyon.org.uk/
>
> Then as a cut-down set:
>
> Link: https://lore.kernel.org/linux-fsdevel/161118128472.1232039.11746799833066425131.stgit@warthog.procyon.org.uk/
>
> Link: https://lore.kernel.org/linux-fsdevel/161161025063.2537118.2009249444682241405.stgit@warthog.procyon.org.uk/
>
>
> Proposals/information about the design has been published here:
>
> Link: https://lore.kernel.org/lkml/24942.1573667720@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/linux-fsdevel/2758811.1610621106@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/linux-fsdevel/1441311.1598547738@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/linux-fsdevel/160655.1611012999@warthog.procyon.org.uk/
>
> And requests for information:
>
> Link: https://lore.kernel.org/linux-fsdevel/3326.1579019665@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/linux-fsdevel/4467.1579020509@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/linux-fsdevel/3577430.1579705075@warthog.procyon.org.uk/
>
> The NFS parts, though not included here, have been tested by someone who's
> using fscache in production:
>
> Link: https://listman.redhat.com/archives/linux-cachefs/2020-December/msg00000.html
>
> I've posted partial patches to try and help 9p and cifs along:
>
> Link: https://lore.kernel.org/linux-fsdevel/1514086.1605697347@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/linux-cifs/1794123.1605713481@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/linux-fsdevel/241017.1612263863@warthog.procyon.org.uk/
> Link: https://lore.kernel.org/linux-cifs/270998.1612265397@warthog.procyon.org.uk/
>
> David
> ---
> David Howells (27):
>       iov_iter: Add ITER_XARRAY
>       mm: Add an unlock function for PG_private_2/PG_fscache
>       mm: Implement readahead_control pageset expansion
>       vfs: Export rw_verify_area() for use by cachefiles
>       netfs: Make a netfs helper module
>       netfs, mm: Move PG_fscache helper funcs to linux/netfs.h
>       netfs, mm: Add unlock_page_fscache() and wait_on_page_fscache()
>       netfs: Provide readahead and readpage netfs helpers
>       netfs: Add tracepoints
>       netfs: Gather stats
>       netfs: Add write_begin helper
>       netfs: Define an interface to talk to a cache
>       netfs: Hold a ref on a page when PG_private_2 is set
>       fscache, cachefiles: Add alternate API to use kiocb for read/write to cache
>       afs: Disable use of the fscache I/O routines
>       afs: Pass page into dirty region helpers to provide THP size
>       afs: Print the operation debug_id when logging an unexpected data version
>       afs: Move key to afs_read struct
>       afs: Don't truncate iter during data fetch
>       afs: Log remote unmarshalling errors
>       afs: Set up the iov_iter before calling afs_extract_data()
>       afs: Use ITER_XARRAY for writing
>       afs: Wait on PG_fscache before modifying/releasing a page
>       afs: Extract writeback extension into its own function
>       afs: Prepare for use of THPs
>       afs: Use the fs operation ops to handle FetchData completion
>       afs: Use new fscache read helper API
>
> Jeff Layton (6):
>       ceph: disable old fscache readpage handling
>       ceph: rework PageFsCache handling
>       ceph: fix fscache invalidation
>       ceph: convert readpage to fscache read helper
>       ceph: plug write_begin into read helper
>       ceph: convert ceph_readpages to ceph_readahead
>
>
>  fs/Kconfig                    |    1 +
>  fs/Makefile                   |    1 +
>  fs/afs/Kconfig                |    1 +
>  fs/afs/dir.c                  |  225 ++++---
>  fs/afs/file.c                 |  470 ++++---------
>  fs/afs/fs_operation.c         |    4 +-
>  fs/afs/fsclient.c             |  108 +--
>  fs/afs/inode.c                |    7 +-
>  fs/afs/internal.h             |   58 +-
>  fs/afs/rxrpc.c                |  150 ++---
>  fs/afs/write.c                |  610 +++++++++--------
>  fs/afs/yfsclient.c            |   82 +--
>  fs/cachefiles/Makefile        |    1 +
>  fs/cachefiles/interface.c     |    5 +-
>  fs/cachefiles/internal.h      |    9 +
>  fs/cachefiles/rdwr2.c         |  412 ++++++++++++
>  fs/ceph/Kconfig               |    1 +
>  fs/ceph/addr.c                |  535 ++++++---------
>  fs/ceph/cache.c               |  125 ----
>  fs/ceph/cache.h               |  101 +--
>  fs/ceph/caps.c                |   10 +-
>  fs/ceph/inode.c               |    1 +
>  fs/ceph/super.h               |    1 +
>  fs/fscache/Kconfig            |    1 +
>  fs/fscache/Makefile           |    3 +-
>  fs/fscache/internal.h         |    3 +
>  fs/fscache/page.c             |    2 +-
>  fs/fscache/page2.c            |  117 ++++
>  fs/fscache/stats.c            |    1 +
>  fs/internal.h                 |    5 -
>  fs/netfs/Kconfig              |   23 +
>  fs/netfs/Makefile             |    5 +
>  fs/netfs/internal.h           |   97 +++
>  fs/netfs/read_helper.c        | 1169 +++++++++++++++++++++++++++++++++
>  fs/netfs/stats.c              |   59 ++
>  fs/read_write.c               |    1 +
>  include/linux/fs.h            |    1 +
>  include/linux/fscache-cache.h |    4 +
>  include/linux/fscache.h       |   40 +-
>  include/linux/netfs.h         |  195 ++++++
>  include/linux/pagemap.h       |    3 +
>  include/net/af_rxrpc.h        |    2 +-
>  include/trace/events/afs.h    |   74 +--
>  include/trace/events/netfs.h  |  201 ++++++
>  mm/filemap.c                  |   20 +
>  mm/readahead.c                |   70 ++
>  net/rxrpc/recvmsg.c           |    9 +-
>  47 files changed, 3473 insertions(+), 1550 deletions(-)
>  create mode 100644 fs/cachefiles/rdwr2.c
>  create mode 100644 fs/fscache/page2.c
>  create mode 100644 fs/netfs/Kconfig
>  create mode 100644 fs/netfs/Makefile
>  create mode 100644 fs/netfs/internal.h
>  create mode 100644 fs/netfs/read_helper.c
>  create mode 100644 fs/netfs/stats.c
>  create mode 100644 include/linux/netfs.h
>  create mode 100644 include/trace/events/netfs.h
>
>

Thanks David,

I did an xfstests run on ceph with a kernel based on this and it seemed
to do fine. I'll plan to pull this into the ceph-client/testing branch
and run it through the ceph kclient test harness. There are only a few
differences from the last run we did, so I'm not expecting big changes,
but I'll keep you posted.
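
For anyone unfamiliar with the SEEK_HOLE/SEEK_DATA probing that the cover
letter mentions for locating cached data, the rough idea can be sketched
from userspace with lseek(). This is only an illustration of the general
interface, not the cachefiles code from this series; count_data_extents()
is a hypothetical helper name, and the fallback values for the two
constants are the Linux ones, assumed here in case the headers do not
expose them.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for headers that hide these; values are the Linux ones. */
#ifndef SEEK_DATA
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

/*
 * Walk a file with lseek(SEEK_DATA)/lseek(SEEK_HOLE) and count the
 * regions the filesystem reports as containing data.  If verbose is
 * non-zero, print each region as [start, end).
 * Returns the number of data regions, or -1 on error.
 */
static int count_data_extents(int fd, int verbose)
{
	off_t end = lseek(fd, 0, SEEK_END);
	off_t pos = 0;
	int n = 0;

	if (end < 0)
		return -1;

	while (pos < end) {
		/* Find the next offset >= pos that holds data. */
		off_t data = lseek(fd, pos, SEEK_DATA);
		if (data < 0) {
			if (errno == ENXIO)
				break;	/* only a hole remains up to EOF */
			return -1;
		}

		/* Find where that data region stops (EOF counts as a hole). */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -1;

		if (verbose)
			printf("data [%lld, %lld)\n",
			       (long long)data, (long long)hole);
		n++;
		pos = hole;
	}
	return n;
}
```

The regions reported this way reflect what the backing filesystem has
allocated, not necessarily what was written, which is the caveat the
cover letter raises about extent-based filesystems inserting or removing
bridging blocks of zeros.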
-- 
Jeff Layton