linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andy Lutomirski <luto@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dave Chinner <david@fromorbit.com>,
	Jiri Kosina <jikos@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Jann Horn <jannh@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Michal Hocko <mhocko@suse.com>, Linux-MM <linux-mm@kvack.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged
Date: Wed, 9 Jan 2019 21:26:41 -0800	[thread overview]
Message-ID: <CALCETrWxwaBUYMg=aLySJByMgXzuzV4gHS0n6O6Oet2Jm6SAbw@mail.gmail.com> (raw)
In-Reply-To: <CAHk-=wg1jSQ-gq-M3+HeTBbDs1VCjyiwF4gqnnBhHeWizyrigg@mail.gmail.com>

On Wed, Jan 9, 2019 at 5:18 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Wed, Jan 9, 2019 at 4:44 PM Dave Chinner <david@fromorbit.com> wrote:
> >
> > I wouldn't look at ext4 as an example of a reliable, problem free
> > direct IO implementation because, historically speaking, it's been a
> > series of nasty hacks (*cough* mount -o dioread_nolock *cough*) and
> > been far worse than XFS from data integrity, performance and
> > reliability perspectives.
>
> That's some big words from somebody who just admitted to much worse hacks.
>
> Seriously. XFS is buggy in this regard, ext4 apparently isn't.
>
> Thinking that it's better to just invalidate the cache  for direct IO
> reads is all kinds of odd.
>

This whole discussion seems to have gone a little bit off the rails...

Linus, I think I agree with Dave's overall sentiment, though, and I
think you should consider reverting your patch.  Here's why.  The
basic idea behind the attack is that the authors found efficient ways
to do two things: evict a page from page cache and detect, *without
filling the cache*, whether a page is cached.  The combination lets an
attacker efficiently tell when another process reads a page.  We need
to keep in mind that this attack is a sophisticated attack, and anyone
using it won't have any problem using a nontrivial way to detect
whether a page is in page cache.

So, unless we're going to try for real to make it hard to tell whether
a page is cached without causing that page to become cached, it's not
worth playing whack-a-mole.  And, right now, mincore is whacking a
mole.  RWF_NOWAIT appears to do essentially the same thing at very
little cost.  I haven't really dug in, but I assume that various
prefaulting tricks combined with various pagetable probing tricks can
do similar things, but that's at least a *lot* more complicated.

So unless we're going to lock down RWF_NOWAIT as well, I see no reason
to lock down mincore().  Direct IO is a red herring -- O_DIRECT is
destructive enough that it seems likely to make the attacks a lot less
efficient.


--- begin digression ---

Since direct IO has been brought up, I have a question.  I've wondered
for years why direct IO works the way it does.  If I were implementing
it from scratch, my first inclination would be to use the page cache
instead of fighting it.  To do a single-page direct read, I would look
that page up in the page cache (i.e. i_pages these days).  If the page
is there, I would do a normal buffered read.  If the page is not
there, I would insert a record into i_pages indicating that direct IO
is in progress and then I would do the IO into the destination page.
If any other read, direct or otherwise, sees a record saying "under
direct IO", it would wait.

To do a single-page direct write, I would look it up in i_pages.  If
it's there, I would do a buffered write followed by a sync (because
applications expect a sync).  If it's not there, I would again add a
record saying "under direct IO" and do the IO.

The idea is that this works as much like buffered IO as possible,
except that the pages backing the IO aren't normal sharable page cache
pages.

The multi-page case would be just an optimization on top of the
single-page case.  The idea would be to somehow mark i_pages with
entire extents under direct IO.  It's a radix tree -- this can, at
least in theory, be done efficiently.  As long as all direct IO
operations run in increasing order of offset, there shouldn't be lock
ordering problems.

Other than history and possibly performance, is there any reason that
direct IO doesn't work this way?

P.S. What, if anything, prevents direct writes from causing trouble
when the underlying FS or backing store needs stable pages?
Similarly, what, if anything, prevents direct reads from temporarily
exposing unintended data to user code if the fs or underlying device
transforms the data during the read process (e.g. by decrypting
something)?

--- end digression ---

  parent reply	other threads:[~2019-01-10  5:26 UTC|newest]

Thread overview: 215+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-05 17:27 [PATCH] mm/mincore: allow for making sys_mincore() privileged Jiri Kosina
2019-01-05 17:27 ` Jiri Kosina
2019-01-05 19:14 ` Vlastimil Babka
2019-01-05 19:24   ` Jiri Kosina
2019-01-05 19:24     ` Jiri Kosina
2019-01-05 19:38     ` Vlastimil Babka
2019-01-08  9:14       ` Bernd Petrovitsch
2019-01-08 11:37         ` Jiri Kosina
2019-01-08 11:37           ` Jiri Kosina
2019-01-08 13:53           ` Bernd Petrovitsch
2019-01-08 14:08             ` Kirill A. Shutemov
2019-01-05 19:44 ` kbuild test robot
2019-01-05 19:46 ` Linus Torvalds
2019-01-05 19:46   ` Linus Torvalds
2019-01-05 20:12   ` Jiri Kosina
2019-01-05 20:12     ` Jiri Kosina
2019-01-05 20:17     ` Linus Torvalds
2019-01-05 20:17       ` Linus Torvalds
2019-01-05 20:43       ` Jiri Kosina
2019-01-05 20:43         ` Jiri Kosina
2019-01-05 21:54         ` Linus Torvalds
2019-01-05 21:54           ` Linus Torvalds
2019-01-06 11:33           ` Kevin Easton
2019-01-08  8:50           ` Kevin Easton
2019-01-18 14:23           ` Tejun Heo
2019-01-05 20:13   ` Linus Torvalds
2019-01-05 20:13     ` Linus Torvalds
2019-01-05 19:56 ` kbuild test robot
2019-01-05 22:54 ` Jann Horn
2019-01-05 22:54   ` Jann Horn
2019-01-05 23:05   ` Linus Torvalds
2019-01-05 23:05     ` Linus Torvalds
2019-01-05 23:16     ` Linus Torvalds
2019-01-05 23:16       ` Linus Torvalds
2019-01-05 23:28       ` Linus Torvalds
2019-01-05 23:28         ` Linus Torvalds
2019-01-05 23:39       ` Linus Torvalds
2019-01-05 23:39         ` Linus Torvalds
2019-01-06  0:11         ` Matthew Wilcox
2019-01-06  0:22           ` Linus Torvalds
2019-01-06  0:22             ` Linus Torvalds
2019-01-06  1:50             ` Linus Torvalds
2019-01-06  1:50               ` Linus Torvalds
2019-01-06 21:46               ` Linus Torvalds
2019-01-06 21:46                 ` Linus Torvalds
2019-01-08  4:43                 ` Dave Chinner
2019-01-08 17:57                   ` Linus Torvalds
2019-01-08 17:57                     ` Linus Torvalds
2019-01-09  2:24                     ` Dave Chinner
2019-01-09  2:31                       ` Jiri Kosina
2019-01-09  2:31                         ` Jiri Kosina
2019-01-09  4:39                         ` Dave Chinner
2019-01-09 10:08                           ` Jiri Kosina
2019-01-09 10:08                             ` Jiri Kosina
2019-01-10  1:15                             ` Dave Chinner
2019-01-10  7:54                               ` Jiri Kosina
2019-01-10  7:54                                 ` Jiri Kosina
2019-01-09 18:25                           ` Linus Torvalds
2019-01-09 18:25                             ` Linus Torvalds
2019-01-10  0:44                             ` Dave Chinner
2019-01-10  1:18                               ` Linus Torvalds
2019-01-10  1:18                                 ` Linus Torvalds
2019-01-10  5:26                                 ` Andy Lutomirski [this message]
2019-01-10  5:26                                   ` Andy Lutomirski
2019-01-10 14:47                                   ` Matthew Wilcox
2019-01-10 21:44                                     ` Dave Chinner
2019-01-10 21:59                                       ` Linus Torvalds
2019-01-10 21:59                                         ` Linus Torvalds
2019-01-11  1:47                                   ` Dave Chinner
2019-01-10  7:03                                 ` Dave Chinner
2019-01-10 11:47                                   ` Linus Torvalds
2019-01-10 11:47                                     ` Linus Torvalds
2019-01-10 12:24                                     ` Dominique Martinet
2019-01-10 12:24                                       ` Dominique Martinet
2019-01-10 22:11                                       ` Linus Torvalds
2019-01-10 22:11                                         ` Linus Torvalds
2019-01-11  2:03                                         ` Dave Chinner
2019-01-11  2:18                                           ` Linus Torvalds
2019-01-11  2:18                                             ` Linus Torvalds
2019-01-11  4:04                                             ` Dave Chinner
2019-01-11  4:08                                               ` Andy Lutomirski
2019-01-11  7:20                                                 ` Dave Chinner
2019-01-11  7:08                                               ` Linus Torvalds
2019-01-11  7:08                                                 ` Linus Torvalds
2019-01-11  7:36                                                 ` Dave Chinner
2019-01-11 16:26                                                   ` Linus Torvalds
2019-01-11 16:26                                                     ` Linus Torvalds
2019-01-15 23:45                                                     ` Dave Chinner
2019-01-16  4:54                                                       ` Linus Torvalds
2019-01-16  4:54                                                         ` Linus Torvalds
2019-01-16  5:49                                                         ` Linus Torvalds
2019-01-16  5:49                                                           ` Linus Torvalds
2019-01-17  1:26                                                         ` Dave Chinner
2019-02-20 15:49                                                     ` Nicolai Stange
2019-01-11  4:57                                         ` Dominique Martinet
2019-01-11  7:11                                           ` Linus Torvalds
2019-01-11  7:11                                             ` Linus Torvalds
2019-01-11  7:32                                             ` Dominique Martinet
2019-01-16  0:42                                         ` Josh Snyder
2019-01-16  5:00                                           ` Linus Torvalds
2019-01-16  5:00                                             ` Linus Torvalds
2019-01-16  5:25                                             ` Andy Lutomirski
2019-01-16  5:34                                               ` Linus Torvalds
2019-01-16  5:34                                                 ` Linus Torvalds
2019-01-16  5:46                                                 ` Dominique Martinet
2019-01-16  5:58                                                   ` Linus Torvalds
2019-01-16  5:58                                                     ` Linus Torvalds
2019-01-16  6:34                                                     ` Dominique Martinet
2019-01-16  7:52                                                       ` Josh Snyder
2019-01-16  7:52                                                         ` Josh Snyder
2019-01-16 12:18                                                         ` Kevin Easton
2019-01-17 21:45                                                         ` Vlastimil Babka
2019-01-18  4:49                                                           ` Linus Torvalds
2019-01-18  4:49                                                             ` Linus Torvalds
2019-01-18 18:58                                                             ` Vlastimil Babka
2019-01-16 16:12                                                     ` Jiri Kosina
2019-01-16 16:12                                                       ` Jiri Kosina
2019-01-16 17:48                                                       ` Linus Torvalds
2019-01-16 17:48                                                         ` Linus Torvalds
2019-01-16 20:23                                                         ` Jiri Kosina
2019-01-16 20:23                                                           ` Jiri Kosina
2019-01-16 21:37                                                           ` Matthew Wilcox
2019-01-16 21:41                                                             ` Jiri Kosina
2019-01-16 21:41                                                               ` Jiri Kosina
2019-01-17  9:52                                                               ` Cyril Hrubis
2019-01-28 13:49                                                               ` Cyril Hrubis
2019-01-17  4:51                                                             ` Linus Torvalds
2019-01-17  4:51                                                               ` Linus Torvalds
2019-01-18  4:54                                                               ` Linus Torvalds
2019-01-18  4:54                                                                 ` Linus Torvalds
2019-01-17  1:49                                                           ` Dominique Martinet
2019-01-23 20:27                                                           ` Linus Torvalds
2019-01-23 20:27                                                             ` Linus Torvalds
2019-01-23 20:35                                                             ` Linus Torvalds
2019-01-23 20:35                                                               ` Linus Torvalds
2019-01-23 23:12                                                               ` Jiri Kosina
2019-01-23 23:12                                                                 ` Jiri Kosina
2019-01-24  0:20                                                                 ` Linus Torvalds
2019-01-24  0:20                                                                   ` Linus Torvalds
2019-01-24  0:24                                                             ` Dominique Martinet
2019-01-24 12:45                                                               ` Dominique Martinet
2019-01-24 14:25                                                                 ` Jiri Kosina
2019-01-24 14:25                                                                   ` Jiri Kosina
2019-01-27 22:35                                                                   ` Jiri Kosina
2019-01-27 22:35                                                                     ` Jiri Kosina
2019-01-28  0:05                                                                     ` Dominique Martinet
2019-01-29 23:52                                                                       ` Jiri Kosina
2019-01-30  9:09                                                                         ` Michal Hocko
2019-01-30 12:29                                                                           ` Jiri Kosina
2019-01-16 12:36                                             ` Matthew Wilcox
2019-01-10 14:50                               ` Matthew Wilcox
2019-01-11  7:36                               ` Jiri Kosina
2019-01-11  7:36                                 ` Jiri Kosina
2019-01-17  2:22                                 ` Dave Chinner
2019-01-17  8:18                                   ` Jiri Kosina
2019-01-17  8:18                                     ` Jiri Kosina
2019-01-17 21:06                                     ` Dave Chinner
2019-01-07  4:32             ` Dominique Martinet
2019-01-07 10:33               ` Vlastimil Babka
2019-01-07 11:08                 ` Dominique Martinet
2019-01-07 11:59                   ` Vlastimil Babka
2019-01-07 13:29                   ` Daniel Gruss
2019-01-07 10:10         ` Michael Ellerman
2019-01-05 23:09   ` Jiri Kosina
2019-01-05 23:09     ` Jiri Kosina
2019-01-30 12:44 ` [PATCH 0/3] mincore() and IOCB_NOWAIT adjustments Vlastimil Babka
2019-01-30 12:44   ` [PATCH 1/3] mm/mincore: make mincore() more conservative Vlastimil Babka
2019-01-31  9:43     ` Michal Hocko
2019-01-31  9:51       ` Dominique Martinet
2019-01-31 17:46       ` Josh Snyder
2019-02-01  8:56     ` Vlastimil Babka
2019-03-06 23:13     ` Andrew Morton
2019-03-07  0:01       ` Jiri Kosina
2019-03-07  0:40         ` Dominique Martinet
2019-03-07  5:46           ` Jiri Kosina
2019-01-30 12:44   ` [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O Vlastimil Babka
2019-01-30 15:04     ` Florian Weimer
2019-01-30 15:15       ` Jiri Kosina
2019-01-31 10:47         ` Florian Weimer
2019-01-31 11:34           ` Jiri Kosina
2019-01-31  9:56     ` Michal Hocko
2019-01-31 10:15       ` Jiri Kosina
2019-01-31 10:23         ` Michal Hocko
2019-01-31 10:30           ` Jiri Kosina
2019-01-31 11:32             ` Michal Hocko
2019-01-31 17:54           ` Linus Torvalds
2019-02-01  5:13             ` Dave Chinner
2019-02-01  7:05               ` Linus Torvalds
2019-02-01  7:21                 ` Linus Torvalds
2019-02-01  1:44       ` Dave Chinner
2019-02-12 15:48         ` Jiri Kosina
2019-01-31 12:04     ` Daniel Gruss
2019-01-31 12:06       ` Vlastimil Babka
2019-01-31 12:08       ` Jiri Kosina
2019-01-31 12:57         ` Daniel Gruss
2019-01-30 12:44   ` [PATCH 3/3] mm/mincore: provide mapped status when cached status is not allowed Vlastimil Babka
2019-01-31 10:09     ` Michal Hocko
2019-02-01  9:04       ` Vlastimil Babka
2019-02-01  9:11         ` Michal Hocko
2019-02-01  9:27           ` Vlastimil Babka
2019-02-06 20:14             ` Jiri Kosina
2019-02-12  3:44         ` Jiri Kosina
2019-02-12  6:36           ` Michal Hocko
2019-02-12 13:09             ` Jiri Kosina
2019-02-12 14:01               ` Michal Hocko
2019-03-06 12:11   ` [PATCH 0/3] mincore() and IOCB_NOWAIT adjustments Jiri Kosina
2019-03-06 22:35     ` Andrew Morton
2019-03-06 22:48       ` Jiri Kosina
2019-03-06 23:23         ` Andrew Morton
2019-03-06 23:32           ` Dominique Martinet
2019-03-06 23:38             ` Andrew Morton
2019-03-09 16:53               ` Linus Torvalds
2019-03-12 14:17   ` [PATCH v2 0/2] prevent mincore() page cache leaks Vlastimil Babka
2019-03-12 14:17     ` [PATCH v2 1/2] mm/mincore: make mincore() more conservative Vlastimil Babka
2019-03-12 14:17     ` [PATCH v2 2/2] mm/mincore: provide mapped status when cached status is not allowed Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CALCETrWxwaBUYMg=aLySJByMgXzuzV4gHS0n6O6Oet2Jm6SAbw@mail.gmail.com' \
    --to=luto@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=david@fromorbit.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=jikos@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).