From: Matthew Wilcox <willy@infradead.org>
To: Daniel Colascione <dancol@google.com>
Cc: dave.hansen@intel.com, linux-mm@kvack.org,
Tim Murray <timmurray@google.com>,
Minchan Kim <minchan@kernel.org>
Subject: Re: Why do we let munmap fail?
Date: Mon, 21 May 2018 18:22:06 -0700
Message-ID: <20180522012206.GB4860@bombadil.infradead.org>
In-Reply-To: <CAKOZuevBprpJ-fVKGCmuQz3dTMjKRfqp-cUuCyUzdkuQTQRNoQ@mail.gmail.com>
On Mon, May 21, 2018 at 05:38:06PM -0700, Daniel Colascione wrote:
> On Mon, May 21, 2018 at 5:22 PM Matthew Wilcox <willy@infradead.org> wrote:
> > On Mon, May 21, 2018 at 05:00:47PM -0700, Daniel Colascione wrote:
> > > On Mon, May 21, 2018 at 4:32 PM Dave Hansen <dave.hansen@intel.com> wrote:
> > > > I think there's still a potential dead-end here. "Deallocation" does
> > > > not always free resources.
> > >
> > > Sure, but the general principle applies: reserve resources when you *can*
> > > fail so that you don't fail where you can't fail.
>
> > Umm. OK. But you want an mmap of 4TB to succeed, right? That implies
> > preallocating one billion * sizeof(*vma). That's, what, dozens of
> > gigabytes right there?
>
> That's not what I'm proposing here. I'd hoped to make that clear in the
> remainder of the email to which you've replied.
>
> > I'm sympathetic to wanting to keep both vma-merging and
> > unmap-anything-i-mapped working, but your proposal isn't going to fix it.
>
> > You need to handle the attacker writing a program which mmaps 46 bits
> > of address space and then munmaps alternate pages. That program needs
> > to be detected and stopped.
>
> Let's look at why it's bad to mmap 46 bits of address space and munmap
> alternate pages. It can't be that doing so would just use too much memory:
> you can mmap 46 bits of address space *already* and touch each page, one by
> one, until the kernel gets fed up and the OOM killer kills you.
If it's anonymous memory, sure, the kernel will kill you. If it's
file-backed memory, the kernel will page it out again. Sure, page
table consumption might also kill you, but 8 bytes per page is a lot
less memory consumption than ~200 bytes per page!
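Here is a rough sketch of that comparison for the "mmap 46 bits, munmap alternate pages" pattern. It makes these assumptions:
* 4 KiB pages, 8 bytes per page table entry and ~200 bytes per VMA, which are the figures above. The real sizeof(struct vm_area_struct) varies with kernel version and config.
* Every surviving page has been touched, so all the PTE pages covering the range exist.
* The upper page-table levels are ignored.

alternate_unmap_cost() is a made-up helper, not a kernel interface.

```python
PAGE_SIZE = 4096   # assumed 4 KiB pages
PTE_BYTES = 8      # one 64-bit page table entry per virtual page
VMA_BYTES = 200    # rough per-VMA cost from this discussion, not an exact sizeof()
GIB = 2**30


def alternate_unmap_cost(address_bits):
    """Hypothetical helper: metadata cost of mapping 2**address_bits bytes,
    unmapping every other page and touching every surviving page, so each
    surviving page is left as its own single-page VMA."""
    pages = (1 << address_bits) // PAGE_SIZE
    vmas = pages // 2  # no two surviving pages are adjacent, so nothing merges
    return {
        "vmas": vmas,
        # Each 4 KiB PTE page holds 512 slots; with every other page touched,
        # all of them are allocated, so the tables span the holes too.
        "pte_bytes": pages * PTE_BYTES,
        "vma_bytes": vmas * VMA_BYTES,
    }


cost = alternate_unmap_cost(46)
print(f"{cost['vmas']:,} VMAs")
print(f"page tables: {cost['pte_bytes'] // GIB:,} GiB")
print(f"VMAs:        {cost['vma_bytes'] // GIB:,} GiB")
print(f"ratio:       {cost['vma_bytes'] / cost['pte_bytes']:.1f}x")
```

Under those assumptions this comes to about 8.6 billion VMAs. That is 1,600 GiB of vm_area_structs against 128 GiB of page tables, roughly 12.5x. The ratio is lower than 200/8 = 25 because the page tables still cover the unmapped holes, while each VMA covers only one surviving page.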
> So it's not because we'd allocate a lot of memory that having a huge VMA
> tree is bad, because we already let processes allocate globs of memory in
> other ways. The badness comes, AIUI, from the asymptotic behavior of the
> address lookup algorithm in a tree that big.
There's an order-of-magnitude difference in memory consumption, though.
> One approach to dealing with this badness, the one I proposed earlier, is
> to prevent that giant mmap from appearing in the first place (because we'd
> cap vsize). If that giant mmap never appears, you can't generate a huge VMA
> tree by splitting it.
I have 16GB of memory in this laptop. At 200 bytes per page, even allocating
10% of my memory to vm_area_structs (a ridiculously high overhead)
restricts the total amount I can mmap (spread between all processes)
to 8 million pages, i.e. 32GB. Firefox alone is taking 3.6GB; gnome-shell
is taking another 4.4GB, even gnome-shell is taking 4GB. Your proposal
just doesn't work.
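The same arithmetic as a sketch. It assumes 4 KiB pages, ~200 bytes per VMA, and the worst case a vsize cap would have to survive: every page of every mapping ending up as its own VMA. vsize_cap() is a hypothetical helper for illustration only.

```python
GIB = 2**30
PAGE_SIZE = 4096  # assumed 4 KiB pages
VMA_BYTES = 200   # rough per-VMA cost, not an exact sizeof()


def vsize_cap(ram_bytes, overhead_percent):
    """Hypothetical helper: total bytes that could be mapped, across all
    processes, if worst-case fragmentation (one VMA per page) must keep
    vm_area_struct memory within overhead_percent of RAM."""
    budget = ram_bytes * overhead_percent // 100
    max_vmas = budget // VMA_BYTES
    return max_vmas, max_vmas * PAGE_SIZE


vmas, cap = vsize_cap(16 * GIB, 10)
print(f"{vmas:,} VMAs -> {cap / GIB:.1f} GiB of mappings in total")
```

With binary units this gives 8,589,934 VMAs and about 32.8 GiB of total mappings. That agrees with the rounded "8 million pages, 32GB" figure above.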
> Maybe that's not a good approach. Maybe processes really need mappings that
> big. If they do, then maybe the right approach is to just make 8 billion
> VMAs not "DoS the system". What actually goes wrong if we just let the VMA
> tree grow that large? So what if VMA lookup ends up taking a while --- the
> process with the pathological allocation pattern is paying the cost, right?
There's a per-inode tree of every mapping of that file, so if I mmap
libc and then munmap alternate pages, every user of libc pays the price.
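Here is a toy model of why the cost is shared. It assumes the usual Linux arrangement, where every VMA mapping a file (private or shared) is linked into that file's address_space->i_mmap interval tree.

unmap_alternate_pages() is a hypothetical simulation, not kernel code. It starts with one VMA and unmaps every other page. Each hole splits the VMA around it, the way munmap() does when a hole is punched in the middle of a mapping.

```python
def unmap_alternate_pages(first_page, npages):
    """Hypothetical model: start with one VMA covering pages
    [first_page, first_page + npages) and unmap every odd-offset page.
    Returns the surviving VMAs as (start, end) page ranges, end exclusive."""
    vmas = [(first_page, first_page + npages)]
    for offset in range(1, npages, 2):
        hole = first_page + offset
        remaining = []
        for start, end in vmas:
            if start <= hole < end:
                # Punching a one-page hole splits the VMA into up to two pieces.
                if start < hole:
                    remaining.append((start, hole))
                if hole + 1 < end:
                    remaining.append((hole + 1, end))
            else:
                remaining.append((start, end))
        vmas = remaining
    return vmas


# A hypothetical 2 MiB file mapping: 512 pages of 4 KiB.
vmas = unmap_alternate_pages(0, 512)
print(len(vmas), "VMAs, each linked into the file's shared i_mmap tree")
```

For that 2 MiB example this leaves 256 single-page VMAs. All of them sit in the same tree that every other process mapping the file also uses.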