linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Josh Snyder <joshs@netflix.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Dominique Martinet <asmadeus@codewreck.org>,
	Dave Chinner <david@fromorbit.com>,
	Jiri Kosina <jikos@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Jann Horn <jannh@google.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Michal Hocko <mhocko@suse.com>, Linux-MM <linux-mm@kvack.org>,
	kernel list <linux-kernel@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged
Date: Tue, 15 Jan 2019 16:42:08 -0800	[thread overview]
Message-ID: <5c3e7de6.1c69fb81.4aebb.3fec@mx.google.com> (raw)
In-Reply-To: <CAHk-=wip2CPrdOwgF0z4n2tsdW7uu+Egtcx9Mxxe3gPfPW_JmQ@mail.gmail.com>

Linus Torvalds wrote on Thu, Jan 10, 2019:
> So right now, I consider the mincore change to be a "try to probe the
> state of mincore users", and we haven't really gotten a lot of
> information back yet.

<raises hand>

For Netflix, losing accurate information from the mincore syscall would
lengthen database cluster maintenance operations from days to months.  We
rely on cross-process mincore to migrate the contents of a page cache from
machine to machine, and across reboots.

To do this, I wrote and maintain happycache [1], a page cache dumper/loader
tool. It is quite similar in architecture to pgfincore, except that it is
agnostic to workload. The gist of happycache's operation is "produce a dump
of residence status for each page, do some operation, then reload exactly
the same pages which were present before." happycache is entirely dependent
on accurate reporting of the in-core status of file-backed pages, as
accessed by another process.

We primarily use happycache with Cassandra, which (like Postgres +
pgfincore) relies heavily on OS page cache to reduce disk accesses. Because
our workloads never experience a cold page cache, we are able to provision
hardware for a peak utilization level that is far lower than the
hypothetical "every query is a cache miss" peak.

A database warmed by happycache can be ready for service in seconds
(bounded only by the performance of the drives and the I/O subsystem), with
no period of in-service degradation. By contrast, putting a database in
service without a page cache entails a potentially unbounded period of
degradation (at Netflix, the time to populate a single node's cache via
natural cache misses varies by workload from hours to weeks). If a single
node upgrade were to take weeks, then upgrading an entire cluster would
take months. Since we want to apply security upgrades (and other things) on
a somewhat tighter schedule, we would have to develop more complex
solutions to provide the same functionality already provided by mincore.

At the bottom line, happycache is designed to benignly exploit the same
information leak documented in the paper [2]. I think it makes perfect
sense to remove cross-process mincore functionality from unprivileged
users, but not to remove it entirely.

Josh Snyder
Netflix Cloud Database Engineering

[1] https://github.com/hashbrowncipher/happycache
[2] https://arxiv.org/abs/1901.01161

  parent reply	other threads:[~2019-01-16  0:42 UTC|newest]

Thread overview: 215+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-05 17:27 [PATCH] mm/mincore: allow for making sys_mincore() privileged Jiri Kosina
2019-01-05 17:27 ` Jiri Kosina
2019-01-05 19:14 ` Vlastimil Babka
2019-01-05 19:24   ` Jiri Kosina
2019-01-05 19:24     ` Jiri Kosina
2019-01-05 19:38     ` Vlastimil Babka
2019-01-08  9:14       ` Bernd Petrovitsch
2019-01-08 11:37         ` Jiri Kosina
2019-01-08 11:37           ` Jiri Kosina
2019-01-08 13:53           ` Bernd Petrovitsch
2019-01-08 14:08             ` Kirill A. Shutemov
2019-01-05 19:44 ` kbuild test robot
2019-01-05 19:46 ` Linus Torvalds
2019-01-05 19:46   ` Linus Torvalds
2019-01-05 20:12   ` Jiri Kosina
2019-01-05 20:12     ` Jiri Kosina
2019-01-05 20:17     ` Linus Torvalds
2019-01-05 20:17       ` Linus Torvalds
2019-01-05 20:43       ` Jiri Kosina
2019-01-05 20:43         ` Jiri Kosina
2019-01-05 21:54         ` Linus Torvalds
2019-01-05 21:54           ` Linus Torvalds
2019-01-06 11:33           ` Kevin Easton
2019-01-08  8:50           ` Kevin Easton
2019-01-18 14:23           ` Tejun Heo
2019-01-05 20:13   ` Linus Torvalds
2019-01-05 20:13     ` Linus Torvalds
2019-01-05 19:56 ` kbuild test robot
2019-01-05 22:54 ` Jann Horn
2019-01-05 22:54   ` Jann Horn
2019-01-05 23:05   ` Linus Torvalds
2019-01-05 23:05     ` Linus Torvalds
2019-01-05 23:16     ` Linus Torvalds
2019-01-05 23:16       ` Linus Torvalds
2019-01-05 23:28       ` Linus Torvalds
2019-01-05 23:28         ` Linus Torvalds
2019-01-05 23:39       ` Linus Torvalds
2019-01-05 23:39         ` Linus Torvalds
2019-01-06  0:11         ` Matthew Wilcox
2019-01-06  0:22           ` Linus Torvalds
2019-01-06  0:22             ` Linus Torvalds
2019-01-06  1:50             ` Linus Torvalds
2019-01-06  1:50               ` Linus Torvalds
2019-01-06 21:46               ` Linus Torvalds
2019-01-06 21:46                 ` Linus Torvalds
2019-01-08  4:43                 ` Dave Chinner
2019-01-08 17:57                   ` Linus Torvalds
2019-01-08 17:57                     ` Linus Torvalds
2019-01-09  2:24                     ` Dave Chinner
2019-01-09  2:31                       ` Jiri Kosina
2019-01-09  2:31                         ` Jiri Kosina
2019-01-09  4:39                         ` Dave Chinner
2019-01-09 10:08                           ` Jiri Kosina
2019-01-09 10:08                             ` Jiri Kosina
2019-01-10  1:15                             ` Dave Chinner
2019-01-10  7:54                               ` Jiri Kosina
2019-01-10  7:54                                 ` Jiri Kosina
2019-01-09 18:25                           ` Linus Torvalds
2019-01-09 18:25                             ` Linus Torvalds
2019-01-10  0:44                             ` Dave Chinner
2019-01-10  1:18                               ` Linus Torvalds
2019-01-10  1:18                                 ` Linus Torvalds
2019-01-10  5:26                                 ` Andy Lutomirski
2019-01-10  5:26                                   ` Andy Lutomirski
2019-01-10 14:47                                   ` Matthew Wilcox
2019-01-10 21:44                                     ` Dave Chinner
2019-01-10 21:59                                       ` Linus Torvalds
2019-01-10 21:59                                         ` Linus Torvalds
2019-01-11  1:47                                   ` Dave Chinner
2019-01-10  7:03                                 ` Dave Chinner
2019-01-10 11:47                                   ` Linus Torvalds
2019-01-10 11:47                                     ` Linus Torvalds
2019-01-10 12:24                                     ` Dominique Martinet
2019-01-10 12:24                                       ` Dominique Martinet
2019-01-10 22:11                                       ` Linus Torvalds
2019-01-10 22:11                                         ` Linus Torvalds
2019-01-11  2:03                                         ` Dave Chinner
2019-01-11  2:18                                           ` Linus Torvalds
2019-01-11  2:18                                             ` Linus Torvalds
2019-01-11  4:04                                             ` Dave Chinner
2019-01-11  4:08                                               ` Andy Lutomirski
2019-01-11  7:20                                                 ` Dave Chinner
2019-01-11  7:08                                               ` Linus Torvalds
2019-01-11  7:08                                                 ` Linus Torvalds
2019-01-11  7:36                                                 ` Dave Chinner
2019-01-11 16:26                                                   ` Linus Torvalds
2019-01-11 16:26                                                     ` Linus Torvalds
2019-01-15 23:45                                                     ` Dave Chinner
2019-01-16  4:54                                                       ` Linus Torvalds
2019-01-16  4:54                                                         ` Linus Torvalds
2019-01-16  5:49                                                         ` Linus Torvalds
2019-01-16  5:49                                                           ` Linus Torvalds
2019-01-17  1:26                                                         ` Dave Chinner
2019-02-20 15:49                                                     ` Nicolai Stange
2019-01-11  4:57                                         ` Dominique Martinet
2019-01-11  7:11                                           ` Linus Torvalds
2019-01-11  7:11                                             ` Linus Torvalds
2019-01-11  7:32                                             ` Dominique Martinet
2019-01-16  0:42                                         ` Josh Snyder [this message]
2019-01-16  5:00                                           ` Linus Torvalds
2019-01-16  5:00                                             ` Linus Torvalds
2019-01-16  5:25                                             ` Andy Lutomirski
2019-01-16  5:34                                               ` Linus Torvalds
2019-01-16  5:34                                                 ` Linus Torvalds
2019-01-16  5:46                                                 ` Dominique Martinet
2019-01-16  5:58                                                   ` Linus Torvalds
2019-01-16  5:58                                                     ` Linus Torvalds
2019-01-16  6:34                                                     ` Dominique Martinet
2019-01-16  7:52                                                       ` Josh Snyder
2019-01-16  7:52                                                         ` Josh Snyder
2019-01-16 12:18                                                         ` Kevin Easton
2019-01-17 21:45                                                         ` Vlastimil Babka
2019-01-18  4:49                                                           ` Linus Torvalds
2019-01-18  4:49                                                             ` Linus Torvalds
2019-01-18 18:58                                                             ` Vlastimil Babka
2019-01-16 16:12                                                     ` Jiri Kosina
2019-01-16 16:12                                                       ` Jiri Kosina
2019-01-16 17:48                                                       ` Linus Torvalds
2019-01-16 17:48                                                         ` Linus Torvalds
2019-01-16 20:23                                                         ` Jiri Kosina
2019-01-16 20:23                                                           ` Jiri Kosina
2019-01-16 21:37                                                           ` Matthew Wilcox
2019-01-16 21:41                                                             ` Jiri Kosina
2019-01-16 21:41                                                               ` Jiri Kosina
2019-01-17  9:52                                                               ` Cyril Hrubis
2019-01-28 13:49                                                               ` Cyril Hrubis
2019-01-17  4:51                                                             ` Linus Torvalds
2019-01-17  4:51                                                               ` Linus Torvalds
2019-01-18  4:54                                                               ` Linus Torvalds
2019-01-18  4:54                                                                 ` Linus Torvalds
2019-01-17  1:49                                                           ` Dominique Martinet
2019-01-23 20:27                                                           ` Linus Torvalds
2019-01-23 20:27                                                             ` Linus Torvalds
2019-01-23 20:35                                                             ` Linus Torvalds
2019-01-23 20:35                                                               ` Linus Torvalds
2019-01-23 23:12                                                               ` Jiri Kosina
2019-01-23 23:12                                                                 ` Jiri Kosina
2019-01-24  0:20                                                                 ` Linus Torvalds
2019-01-24  0:20                                                                   ` Linus Torvalds
2019-01-24  0:24                                                             ` Dominique Martinet
2019-01-24 12:45                                                               ` Dominique Martinet
2019-01-24 14:25                                                                 ` Jiri Kosina
2019-01-24 14:25                                                                   ` Jiri Kosina
2019-01-27 22:35                                                                   ` Jiri Kosina
2019-01-27 22:35                                                                     ` Jiri Kosina
2019-01-28  0:05                                                                     ` Dominique Martinet
2019-01-29 23:52                                                                       ` Jiri Kosina
2019-01-30  9:09                                                                         ` Michal Hocko
2019-01-30 12:29                                                                           ` Jiri Kosina
2019-01-16 12:36                                             ` Matthew Wilcox
2019-01-10 14:50                               ` Matthew Wilcox
2019-01-11  7:36                               ` Jiri Kosina
2019-01-11  7:36                                 ` Jiri Kosina
2019-01-17  2:22                                 ` Dave Chinner
2019-01-17  8:18                                   ` Jiri Kosina
2019-01-17  8:18                                     ` Jiri Kosina
2019-01-17 21:06                                     ` Dave Chinner
2019-01-07  4:32             ` Dominique Martinet
2019-01-07 10:33               ` Vlastimil Babka
2019-01-07 11:08                 ` Dominique Martinet
2019-01-07 11:59                   ` Vlastimil Babka
2019-01-07 13:29                   ` Daniel Gruss
2019-01-07 10:10         ` Michael Ellerman
2019-01-05 23:09   ` Jiri Kosina
2019-01-05 23:09     ` Jiri Kosina
2019-01-30 12:44 ` [PATCH 0/3] mincore() and IOCB_NOWAIT adjustments Vlastimil Babka
2019-01-30 12:44   ` [PATCH 1/3] mm/mincore: make mincore() more conservative Vlastimil Babka
2019-01-31  9:43     ` Michal Hocko
2019-01-31  9:51       ` Dominique Martinet
2019-01-31 17:46       ` Josh Snyder
2019-02-01  8:56     ` Vlastimil Babka
2019-03-06 23:13     ` Andrew Morton
2019-03-07  0:01       ` Jiri Kosina
2019-03-07  0:40         ` Dominique Martinet
2019-03-07  5:46           ` Jiri Kosina
2019-01-30 12:44   ` [PATCH 2/3] mm/filemap: initiate readahead even if IOCB_NOWAIT is set for the I/O Vlastimil Babka
2019-01-30 15:04     ` Florian Weimer
2019-01-30 15:15       ` Jiri Kosina
2019-01-31 10:47         ` Florian Weimer
2019-01-31 11:34           ` Jiri Kosina
2019-01-31  9:56     ` Michal Hocko
2019-01-31 10:15       ` Jiri Kosina
2019-01-31 10:23         ` Michal Hocko
2019-01-31 10:30           ` Jiri Kosina
2019-01-31 11:32             ` Michal Hocko
2019-01-31 17:54           ` Linus Torvalds
2019-02-01  5:13             ` Dave Chinner
2019-02-01  7:05               ` Linus Torvalds
2019-02-01  7:21                 ` Linus Torvalds
2019-02-01  1:44       ` Dave Chinner
2019-02-12 15:48         ` Jiri Kosina
2019-01-31 12:04     ` Daniel Gruss
2019-01-31 12:06       ` Vlastimil Babka
2019-01-31 12:08       ` Jiri Kosina
2019-01-31 12:57         ` Daniel Gruss
2019-01-30 12:44   ` [PATCH 3/3] mm/mincore: provide mapped status when cached status is not allowed Vlastimil Babka
2019-01-31 10:09     ` Michal Hocko
2019-02-01  9:04       ` Vlastimil Babka
2019-02-01  9:11         ` Michal Hocko
2019-02-01  9:27           ` Vlastimil Babka
2019-02-06 20:14             ` Jiri Kosina
2019-02-12  3:44         ` Jiri Kosina
2019-02-12  6:36           ` Michal Hocko
2019-02-12 13:09             ` Jiri Kosina
2019-02-12 14:01               ` Michal Hocko
2019-03-06 12:11   ` [PATCH 0/3] mincore() and IOCB_NOWAIT adjustments Jiri Kosina
2019-03-06 22:35     ` Andrew Morton
2019-03-06 22:48       ` Jiri Kosina
2019-03-06 23:23         ` Andrew Morton
2019-03-06 23:32           ` Dominique Martinet
2019-03-06 23:38             ` Andrew Morton
2019-03-09 16:53               ` Linus Torvalds
2019-03-12 14:17   ` [PATCH v2 0/2] prevent mincore() page cache leaks Vlastimil Babka
2019-03-12 14:17     ` [PATCH v2 1/2] mm/mincore: make mincore() more conservative Vlastimil Babka
2019-03-12 14:17     ` [PATCH v2 2/2] mm/mincore: provide mapped status when cached status is not allowed Vlastimil Babka

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5c3e7de6.1c69fb81.4aebb.3fec@mx.google.com \
    --to=joshs@netflix.com \
    --cc=akpm@linux-foundation.org \
    --cc=asmadeus@codewreck.org \
    --cc=david@fromorbit.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=jannh@google.com \
    --cc=jikos@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).