All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Dave Chinner <david@fromorbit.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	akpm@linux-foundation.org, Matthew Wilcox <willy@infradead.org>,
	Jan Kara <jack@suse.cz>, "Darrick J. Wong" <djwong@kernel.org>,
	Christoph Hellwig <hch@lst.de>,
	John Hubbard <jhubbard@nvidia.com>,
	linux-fsdevel@vger.kernel.org, nvdimm@lists.linux.dev,
	linux-xfs@vger.kernel.org, linux-mm@kvack.org,
	linux-ext4@vger.kernel.org
Subject: Re: [PATCH v2 05/18] xfs: Add xfs_break_layouts() to the inode eviction path
Date: Fri, 23 Sep 2022 11:38:03 +0200	[thread overview]
Message-ID: <20220923093803.nroajmvn7twuptez@quack3> (raw)
In-Reply-To: <20220923021012.GZ3600936@dread.disaster.area>

On Fri 23-09-22 12:10:12, Dave Chinner wrote:
> On Thu, Sep 22, 2022 at 05:41:08PM -0700, Dan Williams wrote:
> > Dave Chinner wrote:
> > > On Wed, Sep 21, 2022 at 07:28:51PM -0300, Jason Gunthorpe wrote:
> > > > On Thu, Sep 22, 2022 at 08:14:16AM +1000, Dave Chinner wrote:
> > > > 
> > > > > Where are these DAX page pins that don't require the pin holder to
> > > > > also hold active references to the filesystem objects coming from?
> > > > 
> > > > O_DIRECT and things like it.
> > > 
> > > O_DIRECT IO to a file holds a reference to a struct file which holds
> > > an active reference to the struct inode. Hence you can't reclaim an
> > > inode while an O_DIRECT IO is in progress to it. 
> > > 
> > > Similarly, file-backed pages pinned from user vmas have the inode
> > > pinned by the VMA having a reference to the struct file passed to
> > > them when they are instantiated. Hence anything using mmap() to pin
> > > file-backed pages (i.e. applications using FSDAX access from
> > > userspace) should also have a reference to the inode that prevents
> > > the inode from being reclaimed.
> > > 
> > > So I'm at a loss to understand what "things like it" might actually
> > > mean. Can you actually describe a situation where we actually permit
> > > (even temporarily) these use-after-free scenarios?
> > 
> > Jason mentioned a scenario here:
> > 
> > https://lore.kernel.org/all/YyuoE8BgImRXVkkO@nvidia.com/
> > 
> > Multi-thread process where thread1 does open(O_DIRECT)+mmap()+read() and
> > thread2 does memunmap()+close() while the read() is inflight.
> 
> And, ah, what production application does this and expects to be
> able to process the result of the read() operation without getting a
> SEGV?
> 
> There's a huge difference between an unlikely scenario which we need
> to work (such as O_DIRECT IO to/from a mmap() buffer at a different
> offset on the same file) and this sort of scenario where even if we
> handle it correctly, the application can't do anything with the
> result and will crash immediately....

I'm not sure I fully follow what we are concerned about here. As you've
written above direct IO holds reference to the inode until it is completed
(through kiocb->file->inode chain). So direct IO should be safe?

I'd be more worried about stuff like vmsplice() that can add file pages
into pipe without holding inode alive in any way and keeping them there for
arbitrarily long time. Didn't we want to add FOLL_LONGTERM to gup executed
from vmsplice() to avoid issues like this?

> > Sounds plausible to me, but I have not tried to trigger it with a focus
> > test.
> 
> If there really are applications this .... broken, then it's not the
> responsibility of the filesystem to paper over the low level page
> reference tracking issues that cause it.
> 
> i.e. The underlying problem here is that memunmap() frees the VMA
> while there are still active task-based references to the pages in
> that VMA. IOWs, the VMA should not be torn down until the O_DIRECT
> read has released all the references to the pages mapped into the
> task address space.
> 
> This just doesn't seem like an issue that we should be trying to fix
> by adding band-aids to the inode life-cycle management.

I agree that freeing VMA while there are pinned pages is ... inconvenient.
But that is just how gup works since the beginning - the moment you have
struct page reference, you completely forget about the mapping you've used
to get to the page. So anything can happen with the mapping after that
moment. And in case of pages mapped by multiple processes I can easily see
that one of the processes decides to unmap the page (and it may well be
that was the initial process that acquired page references) while others
still keep accessing the page using page references stored in some internal
structure (RDMA anyone?). I think it will be rather difficult to come up
with some scheme keeping VMA alive while there are pages pinned without
regressing userspace which over the years became very much tailored to the
peculiar gup behavior.

I can imagine we would keep *inode* referenced while there are its pages
pinned. That should not be that difficult but at least in naive
implementation that would put rather heavy stress on inode refcount under
some loads so I don't think that's useful either.

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  reply	other threads:[~2022-09-23  9:38 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-09-16  3:35 [PATCH v2 00/18] Fix the DAX-gup mistake Dan Williams
2022-09-16  3:35 ` [PATCH v2 01/18] fsdax: Wait on @page not @page->_refcount Dan Williams
2022-09-20 14:30   ` Jason Gunthorpe
2022-09-16  3:35 ` [PATCH v2 02/18] fsdax: Use dax_page_idle() to document DAX busy page checking Dan Williams
2022-09-20 14:31   ` Jason Gunthorpe
2022-09-16  3:35 ` [PATCH v2 03/18] fsdax: Include unmapped inodes for page-idle detection Dan Williams
2022-09-16  3:35 ` [PATCH v2 04/18] ext4: Add ext4_break_layouts() to the inode eviction path Dan Williams
2022-09-16  3:35 ` [PATCH v2 05/18] xfs: Add xfs_break_layouts() " Dan Williams
2022-09-18 22:57   ` Dave Chinner
2022-09-19 16:11     ` Dan Williams
2022-09-19 21:29       ` Dave Chinner
2022-09-20 16:44         ` Dan Williams
2022-09-21 22:14           ` Dave Chinner
2022-09-21 22:28             ` Jason Gunthorpe
2022-09-23  0:18               ` Dave Chinner
2022-09-23  0:41                 ` Dan Williams
2022-09-23  2:10                   ` Dave Chinner
2022-09-23  9:38                     ` Jan Kara [this message]
2022-09-23 23:06                       ` Dan Williams
2022-09-25 23:54                       ` Dave Chinner
2022-09-26 14:10                         ` Jan Kara
2022-09-29 23:33                           ` Dan Williams
2022-09-30 13:41                             ` Jan Kara
2022-09-30 17:56                               ` Dan Williams
2022-09-30 18:06                                 ` Jason Gunthorpe
2022-09-30 18:46                                   ` Dan Williams
2022-10-03  7:55                                   ` Jan Kara
2022-09-23 12:39                     ` Jason Gunthorpe
2022-09-26  0:34                       ` Dave Chinner
2022-09-26 13:04                         ` Jason Gunthorpe
2022-09-22  0:02             ` Dan Williams
2022-09-22  0:10               ` Jason Gunthorpe
2022-09-16  3:35 ` [PATCH v2 06/18] fsdax: Rework dax_layout_busy_page() to dax_zap_mappings() Dan Williams
2022-09-16  3:35 ` [PATCH v2 07/18] fsdax: Update dax_insert_entry() calling convention to return an error Dan Williams
2022-09-16  3:35 ` [PATCH v2 08/18] fsdax: Cleanup dax_associate_entry() Dan Williams
2022-09-16  3:36 ` [PATCH v2 09/18] fsdax: Rework dax_insert_entry() calling convention Dan Williams
2022-09-16  3:36 ` [PATCH v2 10/18] fsdax: Manage pgmap references at entry insertion and deletion Dan Williams
2022-09-21 14:03   ` Jason Gunthorpe
2022-09-21 15:18     ` Dan Williams
2022-09-21 21:38       ` Dan Williams
2022-09-21 22:07         ` Jason Gunthorpe
2022-09-22  0:14           ` Dan Williams
2022-09-22  0:25             ` Jason Gunthorpe
2022-09-22  2:17               ` Dan Williams
2022-09-22 17:55                 ` Jason Gunthorpe
2022-09-22 21:54                   ` Dan Williams
2022-09-23  1:36                     ` Dave Chinner
2022-09-23  2:01                       ` Dan Williams
2022-09-23 13:24                     ` Jason Gunthorpe
2022-09-23 16:29                       ` Dan Williams
2022-09-23 17:42                         ` Jason Gunthorpe
2022-09-23 19:03                           ` Dan Williams
2022-09-23 19:23                             ` Jason Gunthorpe
2022-09-27  6:07                             ` Alistair Popple
2022-09-27 12:56                               ` Jason Gunthorpe
2022-09-16  3:36 ` [PATCH v2 11/18] devdax: Minor warning fixups Dan Williams
2022-09-16  3:36 ` [PATCH v2 12/18] devdax: Move address_space helpers to the DAX core Dan Williams
2022-09-27  6:20   ` Alistair Popple
2022-09-29 22:38     ` Dan Williams
2022-09-16  3:36 ` [PATCH v2 13/18] dax: Prep mapping helpers for compound pages Dan Williams
2022-09-21 14:06   ` Jason Gunthorpe
2022-09-21 15:19     ` Dan Williams
2022-09-16  3:36 ` [PATCH v2 14/18] devdax: add PUD support to the DAX mapping infrastructure Dan Williams
2022-09-16  3:36 ` [PATCH v2 15/18] devdax: Use dax_insert_entry() + dax_delete_mapping_entry() Dan Williams
2022-09-21 14:10   ` Jason Gunthorpe
2022-09-21 15:48     ` Dan Williams
2022-09-21 22:23       ` Jason Gunthorpe
2022-09-22  0:15         ` Dan Williams
2022-09-16  3:36 ` [PATCH v2 16/18] mm/memremap_pages: Support initializing pages to a zero reference count Dan Williams
2022-09-21 15:24   ` Jason Gunthorpe
2022-09-21 23:45     ` Dan Williams
2022-09-22  0:03       ` Alistair Popple
2022-09-22  0:04       ` Jason Gunthorpe
2022-09-22  0:34         ` Dan Williams
2022-09-22  1:36           ` Alistair Popple
2022-09-22  2:34             ` Dan Williams
2022-09-26  6:17               ` Alistair Popple
2022-09-22  0:13       ` John Hubbard
2022-09-16  3:36 ` [PATCH v2 17/18] fsdax: Delete put_devmap_managed_page_refs() Dan Williams
2022-09-16  3:36 ` [PATCH v2 18/18] mm/gup: Drop DAX pgmap accounting Dan Williams
2022-09-20 14:29 ` [PATCH v2 00/18] Fix the DAX-gup mistake Jason Gunthorpe
2022-09-20 16:50   ` Dan Williams
2022-11-09  0:20 ` Andrew Morton
2022-11-09 11:38   ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220923093803.nroajmvn7twuptez@quack3 \
    --to=jack@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=david@fromorbit.com \
    --cc=djwong@kernel.org \
    --cc=hch@lst.de \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=nvdimm@lists.linux.dev \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.