All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Theodore Ts'o" <tytso@mit.edu>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Eric Biggers <ebiggers@kernel.org>,
	Lee Jones <lee.jones@linaro.org>,
	linux-ext4@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Dave Chinner <dchinner@redhat.com>,
	Goldwyn Rodrigues <rgoldwyn@suse.com>,
	"Darrick J . Wong" <darrick.wong@oracle.com>,
	Bob Peterson <rpeterso@redhat.com>,
	Damien Le Moal <damien.lemoal@wdc.com>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Ritesh Harjani <riteshh@linux.ibm.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Johannes Thumshirn <jth@kernel.org>,
	linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	cluster-devel@redhat.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH -v3] ext4: don't BUG if kernel subsystems dirty pages without asking ext4 first
Date: Fri, 25 Feb 2022 20:40:36 -0500	[thread overview]
Message-ID: <YhmFFMwvOaMiNBTQ@mit.edu> (raw)
In-Reply-To: <303059e6-3a33-99cb-2952-82fe8079fa45@nvidia.com>

On Fri, Feb 25, 2022 at 04:41:14PM -0800, John Hubbard wrote:
> 
> > f2fs and btrfs's compressed file write support, by making things work
> > much like the write(2) system call.  Imagine if we had a
> > "pin_user_pages_local()" which calls write_begin(), and a
> > "unpin_user_pages_local()" which calls write_end(), and the
> 
> Right, that would supply the missing connection to the filesystems.
> 
> In fact, maybe these names about right:
> 
>     pin_user_file_pages()
>     unpin_user_file_pages()
> 
> ...and then put them in a filesystem header file, because these are now
> tightly coupled to filesystems, what with the need to call
> .write_begin() and .write_end().

Well, that makes it process_vm_writev()'s is that it needs to know
when to call pin_user_file_pages().  I suspect that for many use cases
--- for example, if this is being used by a debugger to modify a
variable on a stack, or an anonymous page in the program's data
segment, process_vm_writev() *isn't* actually pinning a file.  So they
want some kind of interface that automatically DTRT regardless of
whether the user pages being edited are file-backed or not
file-backed.

So some kind of [un]pin_user_pages_local() which will call
write_{begin,end}() if necessary would be the most convenient for
users such as process_vm_writev().   

And perhaps would it make sense for pin_user_pages to optionally (or
by default?) check for file-backed pages, and if it finds any, return
an error or stop pinning pages at that point, so the system call can
return EOPNOSUPP to the user, instead of silently causing user data to
be lost or corrupted as is currently the case with xfs and btrfs (and
ext4 once I patch it so it doesn't BUG).

I'll note that at least one caller of pin_user_pages, in fs/io_uring.c
takes it upon itself to check for file-backed pages, and returns
EOPNOTSUPP if there are any found.  Many that should be lifted to
pin_user_pages()?

For that matter, maybe pin_user_pages() and friends should take some
new FOLL_ flags to indicate whether file-backed pages should be
rejected, or perhaps they can promise they will only be holding the
pin for a very short amount of time (FOLL_SHORTERM?), and then
pin_user_pages() and unpin_user_pages() can automagically call
write_begin() and write_end() if necessary?  I dunno....

	      	  	      	 	       - Ted

WARNING: multiple messages have this Message-ID (diff)
From: Theodore Ts'o <tytso@mit.edu>
To: cluster-devel.redhat.com
Subject: [Cluster-devel] [PATCH -v3] ext4: don't BUG if kernel subsystems dirty pages without asking ext4 first
Date: Fri, 25 Feb 2022 20:40:36 -0500	[thread overview]
Message-ID: <YhmFFMwvOaMiNBTQ@mit.edu> (raw)
In-Reply-To: <303059e6-3a33-99cb-2952-82fe8079fa45@nvidia.com>

On Fri, Feb 25, 2022 at 04:41:14PM -0800, John Hubbard wrote:
> 
> > f2fs and btrfs's compressed file write support, by making things work
> > much like the write(2) system call.  Imagine if we had a
> > "pin_user_pages_local()" which calls write_begin(), and a
> > "unpin_user_pages_local()" which calls write_end(), and the
> 
> Right, that would supply the missing connection to the filesystems.
> 
> In fact, maybe these names about right:
> 
>     pin_user_file_pages()
>     unpin_user_file_pages()
> 
> ...and then put them in a filesystem header file, because these are now
> tightly coupled to filesystems, what with the need to call
> .write_begin() and .write_end().

Well, that makes it process_vm_writev()'s is that it needs to know
when to call pin_user_file_pages().  I suspect that for many use cases
--- for example, if this is being used by a debugger to modify a
variable on a stack, or an anonymous page in the program's data
segment, process_vm_writev() *isn't* actually pinning a file.  So they
want some kind of interface that automatically DTRT regardless of
whether the user pages being edited are file-backed or not
file-backed.

So some kind of [un]pin_user_pages_local() which will call
write_{begin,end}() if necessary would be the most convenient for
users such as process_vm_writev().   

And perhaps would it make sense for pin_user_pages to optionally (or
by default?) check for file-backed pages, and if it finds any, return
an error or stop pinning pages at that point, so the system call can
return EOPNOSUPP to the user, instead of silently causing user data to
be lost or corrupted as is currently the case with xfs and btrfs (and
ext4 once I patch it so it doesn't BUG).

I'll note that at least one caller of pin_user_pages, in fs/io_uring.c
takes it upon itself to check for file-backed pages, and returns
EOPNOTSUPP if there are any found.  Many that should be lifted to
pin_user_pages()?

For that matter, maybe pin_user_pages() and friends should take some
new FOLL_ flags to indicate whether file-backed pages should be
rejected, or perhaps they can promise they will only be holding the
pin for a very short amount of time (FOLL_SHORTERM?), and then
pin_user_pages() and unpin_user_pages() can automagically call
write_begin() and write_end() if necessary?  I dunno....

	      	  	      	 	       - Ted



  reply	other threads:[~2022-02-26  1:41 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-16 16:31 [REPORT] kernel BUG at fs/ext4/inode.c:2620 - page_buffers() Lee Jones
2022-02-16 16:31 ` [Cluster-devel] " Lee Jones
2022-02-18  1:06 ` John Hubbard
2022-02-18  1:06   ` [Cluster-devel] " John Hubbard
2022-02-18  4:08   ` Theodore Ts'o
2022-02-18  4:08     ` [Cluster-devel] " Theodore Ts'o
2022-02-18  6:33     ` John Hubbard
2022-02-18  6:33       ` [Cluster-devel] " John Hubbard
2022-02-23 23:31       ` Theodore Ts'o
2022-02-23 23:31         ` [Cluster-devel] " Theodore Ts'o
2022-02-24  0:44         ` John Hubbard
2022-02-24  0:44           ` [Cluster-devel] " John Hubbard
2022-02-24  4:04           ` Theodore Ts'o
2022-02-24  4:04             ` [Cluster-devel] " Theodore Ts'o
2022-02-18  7:51     ` Greg Kroah-Hartman
2022-02-18  7:51       ` [Cluster-devel] " Greg Kroah-Hartman
2022-02-23 23:35       ` Theodore Ts'o
2022-02-23 23:35         ` [Cluster-devel] " Theodore Ts'o
2022-02-24  1:48         ` Dave Chinner
2022-02-24  1:48           ` [Cluster-devel] " Dave Chinner
2022-02-24  3:50           ` Theodore Ts'o
2022-02-24  3:50             ` [Cluster-devel] " Theodore Ts'o
2022-02-24 10:29             ` Dave Chinner
2022-02-24 10:29               ` [Cluster-devel] " Dave Chinner
2022-02-18  2:54 ` Theodore Ts'o
2022-02-18  2:54   ` [Cluster-devel] " Theodore Ts'o
2022-02-18  4:24   ` Matthew Wilcox
2022-02-18  4:24     ` [Cluster-devel] " Matthew Wilcox
2022-02-18  6:03     ` Theodore Ts'o
2022-02-18  6:03       ` [Cluster-devel] " Theodore Ts'o
2022-02-25 19:24 ` [PATCH -v2] ext4: don't BUG if kernel subsystems dirty pages without asking ext4 first Theodore Ts'o
2022-02-25 19:24   ` [Cluster-devel] " Theodore Ts'o
2022-02-25 20:51   ` Eric Biggers
2022-02-25 20:51     ` [Cluster-devel] " Eric Biggers
2022-02-25 21:08     ` Theodore Ts'o
2022-02-25 21:08       ` [Cluster-devel] " Theodore Ts'o
2022-02-25 21:23       ` [PATCH -v3] " Theodore Ts'o
2022-02-25 21:23         ` [Cluster-devel] " Theodore Ts'o
2022-02-25 21:33         ` John Hubbard
2022-02-25 21:33           ` [Cluster-devel] " John Hubbard
2022-02-25 23:21           ` Theodore Ts'o
2022-02-25 23:21             ` [Cluster-devel] " Theodore Ts'o
2022-02-26  0:18             ` Hillf Danton
2022-02-26  0:41             ` John Hubbard
2022-02-26  0:41               ` [Cluster-devel] " John Hubbard
2022-02-26  1:40               ` Theodore Ts'o [this message]
2022-02-26  1:40                 ` Theodore Ts'o
2022-02-26  2:00                 ` Theodore Ts'o
2022-02-26  2:00                   ` [Cluster-devel] " Theodore Ts'o
2022-02-26  2:55                 ` John Hubbard
2022-02-26  2:55                   ` [Cluster-devel] " John Hubbard
2022-03-03  4:26         ` [PATCH -v4] " Theodore Ts'o
2022-03-03  4:26           ` [Cluster-devel] " Theodore Ts'o
2022-03-03  8:21           ` Christoph Hellwig
2022-03-03  8:21             ` [Cluster-devel] " Christoph Hellwig
2022-03-03  9:21           ` Lee Jones
2022-03-03  9:21             ` [Cluster-devel] " Lee Jones
2022-03-03 14:38           ` [PATCH -v5] ext4: don't BUG if someone " Theodore Ts'o
2022-03-03 14:38             ` [Cluster-devel] " Theodore Ts'o

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YhmFFMwvOaMiNBTQ@mit.edu \
    --to=tytso@mit.edu \
    --cc=agruenba@redhat.com \
    --cc=cluster-devel@redhat.com \
    --cc=damien.lemoal@wdc.com \
    --cc=darrick.wong@oracle.com \
    --cc=dchinner@redhat.com \
    --cc=ebiggers@kernel.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=jhubbard@nvidia.com \
    --cc=jth@kernel.org \
    --cc=lee.jones@linaro.org \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=rgoldwyn@suse.com \
    --cc=riteshh@linux.ibm.com \
    --cc=rpeterso@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.