linux-fsdevel.vger.kernel.org archive mirror
From: David Laight <David.Laight@ACULAB.COM>
To: 'David Howells' <dhowells@redhat.com>, Al Viro <viro@zeniv.linux.org.uk>
Cc: "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	Ted Ts'o <tytso@mit.edu>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"willy@infradead.org" <willy@infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-ext4@vger.kernel.org" <linux-ext4@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: Do we need to unrevert "fs: do not prefault sys_write() user buffer pages"?
Date: Tue, 22 Jun 2021 21:55:09 +0000
Message-ID: <7a6d8c55749d46d09f6f6e27a99fde36@AcuMS.aculab.com>
In-Reply-To: <3225322.1624379221@warthog.procyon.org.uk>

From: David Howells
> Sent: 22 June 2021 17:27
> 
> Al Viro <viro@zeniv.linux.org.uk> wrote:
> 
> > On Tue, Jun 22, 2021 at 04:20:40PM +0100, David Howells wrote:
> >
> > > and wondering if the iov_iter_fault_in_readable() is actually effective.
> > > Yes, it can make sure that the page we're intending to modify is dragged
> > > into the pagecache and marked uptodate so that it can be read from, but is
> > > it possible for the page to then get reclaimed before we get to
> > > iov_iter_copy_from_user_atomic()?  a_ops->write_begin() could potentially
> > > take a long time, say if it has to go and get a lock/lease from a server.
> >
> > Yes, it is.  So what?  We'll just retry.  You *can't* take faults while
> > holding some pages locked; not without shitloads of deadlocks.
> 
> In that case, can we amend the comment immediately above
> iov_iter_fault_in_readable()?
> 
> 	/*
> 	 * Bring in the user page that we will copy from _first_.
> 	 * Otherwise there's a nasty deadlock on copying from the
> 	 * same page as we're writing to, without it being marked
> 	 * up-to-date.
> 	 *
> 	 * Not only is this an optimisation, but it is also required
> 	 * to check that the address is actually valid, when atomic
> 	 * usercopies are used, below.
> 	 */
> 	if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
> 
> The first part suggests this is for deadlock avoidance.  If that's not true,
> then this should perhaps be changed.

I'd say something like:
	/*
	 * The actual copy_from_user() is done with a lock held
	 * so cannot fault in missing pages.
	 * So fault in the pages first.
	 * If they get paged out the inatomic usercopy will fail
	 * and the whole operation is retried.
	 *
	 * Hopefully there are enough memory pages available to
	 * stop this looping forever.
	 */

It is perfectly possible for another application thread to
invalidate one of the buffer fragments after iov_iter_fault_in_readable()
returns success, so it will then fail on the second pass.

The maximum number of pages required is twice the maximum number
of iov fragments.
If the system is crawling along with no available memory pages,
the same physical page could get used for two user pages.
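
To make the retry behaviour concrete, here is a rough userspace model of
the loop described above. It is only a sketch, not the kernel code: every
sim_* name is made up for illustration, a single "resident" flag stands in
for the page tables, and evictions_left is an assumed knob for how many
times the page gets reclaimed between the fault-in and the atomic copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one user buffer fragment living in one page. */
struct sim_user_buf {
	const char *data;	/* NULL models an invalid user address */
	size_t len;
	bool resident;		/* is the page currently in memory? */
	int evictions_left;	/* times the page is reclaimed after fault-in */
};

/* Stand-in for the fault-in step; done before any page lock is taken. */
static int sim_fault_in(struct sim_user_buf *u)
{
	if (!u->data)
		return -1;	/* models -EFAULT */
	u->resident = true;
	return 0;
}

/* Stand-in for write_begin(): slow enough that the page may be reclaimed. */
static void sim_write_begin(struct sim_user_buf *u)
{
	if (u->evictions_left > 0) {
		u->evictions_left--;
		u->resident = false;
	}
}

/* Stand-in for the atomic usercopy: cannot fault, so it copies nothing
 * when the page is not resident. Returns the number of bytes copied. */
static size_t sim_copy_atomic(char *dst, const struct sim_user_buf *u)
{
	if (!u->resident)
		return 0;
	memcpy(dst, u->data, u->len);
	return u->len;
}

/* Fault in, "lock", copy atomically, and retry the whole pass on a short
 * copy. Returns the number of passes taken, or -1 for an invalid buffer.
 * dst must have room for u->len bytes. */
static int sim_write(char *dst, struct sim_user_buf *u)
{
	int passes = 0;
	size_t copied;

	assert(dst != NULL);
	for (;;) {
		passes++;
		if (sim_fault_in(u))
			return -1;
		sim_write_begin(u);		/* page lock held from here */
		copied = sim_copy_atomic(dst, u);
		/* write_end() would drop the page lock here */
		if (copied == u->len)
			return passes;
	}
}
```

With evictions_left set to 2 the model needs three passes before the copy
succeeds. Nothing in the loop bounds the number of passes, which is the
"looping forever" worry in the comment.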

	David
