linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dominique Martinet <asmadeus@codewreck.org>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Matthew Wilcox <willy@infradead.org>,
	Jann Horn <jannh@google.com>, Jiri Kosina <jikos@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Michal Hocko <mhocko@suse.com>, Linux-MM <linux-mm@kvack.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	daniel@gruss.cc
Subject: Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged
Date: Mon, 7 Jan 2019 12:08:27 +0100	[thread overview]
Message-ID: <20190107110827.GA15249@nautica> (raw)
In-Reply-To: <151b4ac8-5cfc-ed30-db30-e4d67a324c4b@suse.cz>

Vlastimil Babka wrote on Mon, Jan 07, 2019:
> On 1/7/19 5:32 AM, Dominique Martinet wrote:
> > Linus Torvalds wrote on Sat, Jan 05, 2019:
> >> But I think my patch to just rip out all that page lookup, and just
> >> base it on the page table state has the fundamental advantage that it
> >> gets rid of code. Maybe I should jst commit it, and see if anything
> >> breaks? We do have options in case things break, and then we'd at
> >> least know who cares (and perhaps a lot more information of _why_ they
> >> care).
> > 
> > There actually are many tools like fincore which depend on mincore to
> > try to tell whether a file is "loaded in cache" or not (I personally use
> > vmtouch[1], but I know of at least nocache[2] uses it as well to only
> > try to evict used pages)
> 
> nocache could probably do fine without mincore. IIUC the point is to not
> evict anything that was already resident prior to running some command
> wrapped in nocache. Without the mincore checks,
> posix_fadvise(POSIX_FADV_DONTNEED) will still not drop anything that
> others have mapped. That means without mincore() it will drop data
> that's in cache but not currently in use by anybody, which shouldn't
> cause large performance regressions?

Yes, I sent them a patch to allow this behaviour (which got merged ages
ago); but the default nocache usage will try to preserve pages that were
mapped before even if the user mapped it themselves so it's a little bit
more robust that just trusting linux to do the right thing. I did
various tests with fadvise dontneed and honestly the way it works is far
from obvious when operating as a single user at least; I didn't test
multi-users as that didn't really concern me.

With the current mincore change, it will think everything was "in core"
and not flush anything unless my option to just fadvise dontneed
everything is passed though ; so even if we can make it work it is a
change of behaviour that is breaking an existing application, and it has
no way of telling it didn't work.


Honestly though, as I said, mincore() is much more useful for debugging
for me ; the application can be changed if required. I just pointed it
out as it'll need changing, and it has no obvious way of testing at
runtime if the syscall works (except dumb kernel version check, but that
won't work with stable backports); so it's not that obvious.

> > FWIW I personally don't care much about "only for owner" or depending on
> > mmap options; I don't understand much of the security implications
> > honestly so I'm not sure how these limitations actually help.
> > On the other hand, a simple CAP_SYS_ADMIN check making the call take
> > either behaviour should be safe and would cover what I described above.
> 
> So without CAP_SYS_ADMIN, mincore() would return mapping status, and
> with CAP_SYS_ADMIN, it would return cache residency status? Very clumsy
> :( Maybe if we introduced mincore2() with flags similar to BSD mentioned
> earlier in the thread, and the cache residency flag would require
> CAP_SYS_ADMIN or something similar.

I agree, that's rather clumsy... Or rather might lead to some unexpected
behaviours. I'm open to other ideas.
I'm not sure how the BSD flags help though?

> > (by the way, while we are discussing permissions, a regular user can use
> > fadvise dontneed on files it doesn't own as well as long as it can open
> > them for reading; I'm not sure if that would need restricting as well in
> > the context of the security issue.
> 
> Probably not, as I've mentioned it won't evict what's mapped by somebody
> else. And eviction is also possible via controlling LRU, which is what
> the paper [1] does anyway (and also mentions that DONTNEED doesn't
> work). Being able to evict somebody's page is AFAIU not sufficient for
> attack, the side channel is about knowing that somebody brought that
> page back to RAM by touching it.

Thanks for the link to the paper, I hadn't taken the time to extract it
from the news article but it's much more interesting indeed.
Some of what's been said here makes more sense to me now (checking
about ownership and similar might indeed be good enough).

For evicting page I agree most targets would be different users so I
guess that makes sense as well; there seem to be other ways of
efficiently evicting a page from cache anyway.

(BTW, this was brought up earlier, the paper also mentions
/proc/self/pagemap as not being useful.)


> > Frankly even with mincore someone
> > could likely tell the difference through timing, if they just do it a
> > few times. Do magic, probe, flush out, repeat until satisfied.)

(I meant "without mincore", obviously, as you understood properly)

> That's my bigger concern here. In [1] there's described a remote attack
> (on webserver) using the page fault timing differences for present/not
> present page cache pages. Noisy but works, and I expect locally it to be
> much less noisy. Yet the countermeasures section only mentions
> restricting mincore() as if it was sufficient (and also how to make
> evictions harder, but that's secondary IMHO).

I'd suggest making clock rougher for non-root users but javascript tried
that and it wasn't enough... :)
Honestly won't be of much help there, good luck?

Thanks,
-- 
Dominique

  reply	other threads:[~2019-01-07 11:08 UTC|newest]

Thread overview: 161+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-05 17:27 [PATCH] mm/mincore: allow for making sys_mincore() privileged Jiri Kosina
2019-01-05 19:14 ` Vlastimil Babka
2019-01-05 19:24   ` Jiri Kosina
2019-01-05 19:38     ` Vlastimil Babka
2019-01-08  9:14       ` Bernd Petrovitsch
2019-01-08 11:37         ` Jiri Kosina
2019-01-08 13:53           ` Bernd Petrovitsch
2019-01-08 14:08             ` Kirill A. Shutemov
2019-01-05 19:44 ` kbuild test robot
2019-01-05 19:46 ` Linus Torvalds
2019-01-05 20:12   ` Jiri Kosina
2019-01-05 20:17     ` Linus Torvalds
2019-01-05 20:43       ` Jiri Kosina
2019-01-05 21:54         ` Linus Torvalds
2019-01-06 11:33           ` Kevin Easton
2019-01-08  8:50           ` Kevin Easton
2019-01-18 14:23           ` Tejun Heo
2019-01-05 20:13   ` Linus Torvalds
2019-01-05 19:56 ` kbuild test robot
2019-01-05 22:54 ` Jann Horn
2019-01-05 23:05   ` Linus Torvalds
2019-01-05 23:16     ` Linus Torvalds
2019-01-05 23:28       ` Linus Torvalds
2019-01-05 23:39       ` Linus Torvalds
2019-01-06  0:11         ` Matthew Wilcox
2019-01-06  0:22           ` Linus Torvalds
2019-01-06  1:50             ` Linus Torvalds
2019-01-06 21:46               ` Linus Torvalds
2019-01-08  4:43                 ` Dave Chinner
2019-01-08 17:57                   ` Linus Torvalds
2019-01-09  2:24                     ` Dave Chinner
2019-01-09  2:31                       ` Jiri Kosina
2019-01-09  4:39                         ` Dave Chinner
2019-01-09 10:08                           ` Jiri Kosina
2019-01-10  1:15                             ` Dave Chinner
2019-01-10  7:54                               ` Jiri Kosina
2019-01-09 18:25                           ` Linus Torvalds
2019-01-10  0:44                             ` Dave Chinner
2019-01-10  1:18                               ` Linus Torvalds
2019-01-10  5:26                                 ` Andy Lutomirski
2019-01-10 14:47                                   ` Matthew Wilcox
2019-01-10 21:44                                     ` Dave Chinner
2019-01-10 21:59                                       ` Linus Torvalds
2019-01-11  1:47                                   ` Dave Chinner
2019-01-10  7:03                                 ` Dave Chinner
2019-01-10 11:47                                   ` Linus Torvalds
2019-01-10 12:24                                     ` Dominique Martinet
2019-01-10 22:11                                       ` Linus Torvalds
2019-01-11  2:03                                         ` Dave Chinner
2019-01-11  2:18                                           ` Linus Torvalds
2019-01-11  4:04                                             ` Dave Chinner
2019-01-11  4:08                                               ` Andy Lutomirski
2019-01-11  7:20                                                 ` Dave Chinner
2019-01-11  7:08                                               ` Linus Torvalds
2019-01-11  7:36                                                 ` Dave Chinner
2019-01-11 16:26                                                   ` Linus Torvalds
2019-01-15 23:45                                                     ` Dave Chinner
2019-01-16  4:54                                                       ` Linus Torvalds
2019-01-16  5:49                                                         ` Linus Torvalds
2019-01-17  1:26                                                         ` Dave Chinner
2019-02-20 15:49                                                     ` Nicolai Stange
2019-01-11  4:57                                         ` Dominique Martinet
2019-01-11  7:11                                           ` Linus Torvalds
2019-01-11  7:32                                             ` Dominique Martinet
2019-01-16  0:42                                         ` Josh Snyder
2019-01-16  5:00                                           ` Linus Torvalds
2019-01-16  5:25                                             ` Andy Lutomirski
2019-01-16  5:34                                               ` Linus Torvalds
2019-01-16  5:46                                                 ` Dominique Martinet
2019-01-16  5:58                                                   ` Linus Torvalds
2019-01-16  6:34                                                     ` Dominique Martinet
2019-01-16  7:52                                                       ` Josh Snyder
2019-01-16 12:18                                                         ` Kevin Easton
2019-01-17 21:45                                                         ` Vlastimil Babka
2019-01-18  4:49                                                           ` Linus Torvalds
2019-01-18 18:58                                                             ` Vlastimil Babka
2019-01-16 16:12                                                     ` Jiri Kosina
2019-01-16 17:48                                                       ` Linus Torvalds
2019-01-16 20:23                                                         ` Jiri Kosina
2019-01-16 21:37                                                           ` Matthew Wilcox
2019-01-16 21:41                                                             ` Jiri Kosina
2019-01-17  9:52                                                               ` Cyril Hrubis
2019-01-28 13:49                                                               ` Cyril Hrubis
2019-01-17  4:51                                                             ` Linus Torvalds
2019-01-18  4:54                                                               ` Linus Torvalds
2019-01-17  1:49                                                           ` Dominique Martinet
2019-01-23 20:27                                                           ` Linus Torvalds
2019-01-23 20:35                                                             ` Linus Torvalds
2019-01-23 23:12                                                               ` Jiri Kosina
2019-01-24  0:20                                                                 ` Linus Torvalds
2019-01-24  0:24                                                             ` Dominique Martinet
2019-01-24 12:45                                                               ` Dominique Martinet
2019-01-24 14:25                                                                 ` Jiri Kosina
2019-01-27 22:35                                                                   ` Jiri Kosina
2019-01-28  0:05                                                                     ` Dominique Martinet
2019-01-29 23:52                                                                       ` Jiri Kosina
2019-01-30  9:09                                                                         ` Michal Hocko
2019-01-30 12:29                                                                           ` Jiri Kosina
2019-01-16 12:36                                             ` Matthew Wilcox
2019-01-10 14:50                               ` Matthew Wilcox
2019-01-11  7:36                               ` Jiri Kosina
2019-01-17  2:22                                 ` Dave Chinner
2019-01-17  8:18                                   ` Jiri Kosina
2019-01-17 21:06                                     ` Dave Chinner
2019-01-07  4:32             ` Dominique Martinet
2019-01-07 10:33               ` Vlastimil Babka
2019-01-07 11:08                 ` Dominique Martinet [this message]
2019-01-07 11:59                   ` Vlastimil Babka
2019-01-07 13:29                   ` Daniel Gruss
2019-01-07 10:10         ` Michael Ellerman
2019-01-05 23:09   ` Jiri Kosina
2019-01-30 12:44 ` [PATCH 0/3] mincore() and IOCB_NOWAIT adjustments Vlastimil Babka
2019-01-30 12:44   ` [PATCH 1/3] mm/mincore: make mincore() more conservative Vlastimil Babka
2019-01-31  9:43     ` Michal Hocko
2019-01-31  9:51       ` Dominique Martinet
2019-01-31 17:46       ` Josh Snyder
2019-02-01  8:56     ` Vlastimil Babka
2019-03-06 23:13     ` Andrew Morton
2019-03-07  0:01       ` Jiri Kosina
2019-03-07  0:40         ` Dominique Martinet
2019-03-07  5:46           ` Jiri Kosina
2019-01-30 12:44   ` [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O Vlastimil Babka
2019-01-30 15:04     ` Florian Weimer
2019-01-30 15:15       ` Jiri Kosina
2019-01-31 10:47         ` Florian Weimer
2019-01-31 11:34           ` Jiri Kosina
2019-01-31  9:56     ` Michal Hocko
2019-01-31 10:15       ` Jiri Kosina
2019-01-31 10:23         ` Michal Hocko
2019-01-31 10:30           ` Jiri Kosina
2019-01-31 11:32             ` Michal Hocko
2019-01-31 17:54           ` Linus Torvalds
2019-02-01  5:13             ` Dave Chinner
2019-02-01  7:05               ` Linus Torvalds
2019-02-01  7:21                 ` Linus Torvalds
2019-02-01  1:44       ` Dave Chinner
2019-02-12 15:48         ` Jiri Kosina
2019-01-31 12:04     ` Daniel Gruss
2019-01-31 12:06       ` Vlastimil Babka
2019-01-31 12:08       ` Jiri Kosina
2019-01-31 12:57         ` Daniel Gruss
2019-01-30 12:44   ` [PATCH 3/3] mm/mincore: provide mapped status when cached status is not allowed Vlastimil Babka
2019-01-31 10:09     ` Michal Hocko
2019-02-01  9:04       ` Vlastimil Babka
2019-02-01  9:11         ` Michal Hocko
2019-02-01  9:27           ` Vlastimil Babka
2019-02-06 20:14             ` Jiri Kosina
2019-02-12  3:44         ` Jiri Kosina
2019-02-12  6:36           ` Michal Hocko
2019-02-12 13:09             ` Jiri Kosina
2019-02-12 14:01               ` Michal Hocko
2019-03-06 12:11   ` [PATCH 0/3] mincore() and IOCB_NOWAIT adjustments Jiri Kosina
2019-03-06 22:35     ` Andrew Morton
2019-03-06 22:48       ` Jiri Kosina
2019-03-06 23:23         ` Andrew Morton
2019-03-06 23:32           ` Dominique Martinet
2019-03-06 23:38             ` Andrew Morton
2019-03-09 16:53               ` Linus Torvalds
2019-03-12 14:17   ` [PATCH v2 0/2] prevent mincore() page cache leaks Vlastimil Babka
2019-03-12 14:17     ` [PATCH v2 1/2] mm/mincore: make mincore() more conservative Vlastimil Babka
2019-03-12 14:17     ` [PATCH v2 2/2] mm/mincore: provide mapped status when cached status is not allowed Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190107110827.GA15249@nautica \
    --to=asmadeus@codewreck.org \
    --cc=akpm@linux-foundation.org \
    --cc=daniel@gruss.cc \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=jikos@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).