[RFC,0/3] Rework the gfs2 read and page fault locking
mbox series

Message ID 20191122235324.17245-1-agruenba@redhat.com
Headers show
Series
  • Rework the gfs2 read and page fault locking
Related show

Message

Andreas Gruenbacher Nov. 22, 2019, 11:53 p.m. UTC
Hello,

this patch series moves the glock lock taking in gfs2 from the
->readpage and ->readpages inode operations to the ->read_iter file and
->fault vm operations.  To achieve that, we add flags to the
generic_file_read_iter and filemap_fault generic helpers.

This proposal was triggered by the following discussion:

https://lore.kernel.org/linux-fsdevel/157225677483.3442.4227193290486305330.stgit@buzz/

In that thread, Linus argued that filesystems should make sure the inode
size is sufficiently up-to-date before calling the generic helpers, and
that filesystems can do it themselves if they want more than that.
That's surely doable.  However, implementing those operations properly
at the filesystem level quickly becomes complicated when it gets to
things like readahead.  In addition, those slightly modified copies of
those helpers would surely diverge from their originals over time, and
maintaining them properly would become hard.  So I hope the relatively
small changes to make the original helpers slightly more flexible will
be acceptable instead.

With the IOCB_CACHED flag added by one of the patches in this series,
the code that Konstantin's initial patch adds to
generic_file_buffered_read could be made conditional on the IOCB_CACHED
flag being cleared.  That way, it won't misfire on filesystems that
allow a stale inode size.  (I'm not sure if any filesystems other than
gfs2 are actually affected.)

Some additional explanation:

The cache consistency model of filesystems like gfs2 is such that if
pages are found in an inode's address space, those pages as well as the
inode size are up to date and can be used without taking any filesystem
locks.  If a page is not cached, filesystem locks must be taken before
the page can be read; this will also bring the inode size up to date.

Thus far, gfs2 has taken the filesystem locks inside the ->readpage and
->readpages address space operations.  A better approach seems to be to
take those locks earlier, in the ->read_iter file and ->fault vm
operations.  This would also avoid a lock inversion in ->readpages.

We obviously want to avoid taking the filesystem locks unnecessarily
when the pages we are looking for are already cached; otherwise, we
would cripple performance.  So we need to check if those pages are
present first.  That's actually exactly what the generic_file_read_iter
and filemap_fault helpers do already anyway, except that they will call
into ->readpage and ->readpages when they find pages missing.  Instead
of that, we'd like those helpers to return with an error code that
allows us to retry the operation after taking the filesystem locks.

Thanks,
Andreas

Andreas Gruenbacher (3):
  fs: Add IOCB_CACHED flag for generic_file_read_iter
  fs: Add FAULT_FLAG_CACHED flag for filemap_fault
  gfs2: Rework read and page fault locking

 fs/gfs2/aops.c     | 36 +++++--------------------
 fs/gfs2/file.c     | 66 ++++++++++++++++++++++++++++++++++++++++++++--
 include/linux/fs.h |  1 +
 include/linux/mm.h |  4 ++-
 mm/filemap.c       | 60 ++++++++++++++++++++++++++++++-----------
 5 files changed, 119 insertions(+), 48 deletions(-)

Comments

Kirill A. Shutemov Nov. 25, 2019, 9:17 a.m. UTC | #1
On Sat, Nov 23, 2019 at 12:53:21AM +0100, Andreas Gruenbacher wrote:
> Hello,
> 
> this patch series moves the glock lock taking in gfs2 from the
> ->readpage and ->readpages inode operations to the ->read_iter file and
> ->fault vm operations.  To achieve that, we add flags to the
> generic_file_read_iter and filemap_fault generic helpers.
> 
> This proposal was triggered by the following discussion:
> 
> https://lore.kernel.org/linux-fsdevel/157225677483.3442.4227193290486305330.stgit@buzz/
> 
> In that thread, Linus argued that filesystems should make sure the inode
> size is sufficiently up-to-date before calling the generic helpers, and
> that filesystems can do it themselves if they want more than that.
> That's surely doable.  However, implementing those operations properly
> at the filesystem level quickly becomes complicated when it gets to
> things like readahead.  In addition, those slightly modified copies of
> those helpers would surely diverge from their originals over time, and
> maintaining them properly would become hard.  So I hope the relatively
> small changes to make the original helpers slightly more flexible will
> be acceptable instead.
> 
> With the IOCB_CACHED flag added by one of the patches in this series,
> the code that Konstantin's initial patch adds to
> generic_file_buffered_read could be made conditional on the IOCB_CACHED
> flag being cleared.  That way, it won't misfire on filesystems that
> allow a stale inode size.  (I'm not sure if any filesystems other than
> gfs2 are actually affected.)
> 
> Some additional explanation:
> 
> The cache consistency model of filesystems like gfs2 is such that if
> pages are found in an inode's address space, those pages as well as the
> inode size are up to date and can be used without taking any filesystem
> locks.  If a page is not cached, filesystem locks must be taken before
> the page can be read; this will also bring the inode size up to date.
> 
> Thus far, gfs2 has taken the filesystem locks inside the ->readpage and
> ->readpages address space operations.  A better approach seems to be to
> take those locks earlier, in the ->read_iter file and ->fault vm
> operations.  This would also avoid a lock inversion in ->readpages.
> 
> We obviously want to avoid taking the filesystem locks unnecessarily
> when the pages we are looking for are already cached; otherwise, we
> would cripple performance.  So we need to check if those pages are
> present first.  That's actually exactly what the generic_file_read_iter
> and filemap_fault helpers do already anyway, except that they will call
> into ->readpage and ->readpages when they find pages missing.  Instead
> of that, we'd like those helpers to return with an error code that
> allows us to retry the operation after taking the filesystem locks.

Do you see IOCB_CACHED/FAULT_FLAG_CACHED semantics being usable for
anyting beyond gfs2?