linux-arch.vger.kernel.org archive mirror
From: Jann Horn <jannh@google.com>
To: yu-cheng.yu@intel.com, Andy Lutomirski <luto@amacapital.net>
Cc: the arch/x86 maintainers <x86@kernel.org>,
	"H . Peter Anvin" <hpa@zytor.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>,
	kernel list <linux-kernel@vger.kernel.org>,
	linux-doc@vger.kernel.org, Linux-MM <linux-mm@kvack.org>,
	linux-arch <linux-arch@vger.kernel.org>,
	Linux API <linux-api@vger.kernel.org>,
	Arnd Bergmann <arnd@arndb.de>,
	Balbir Singh <bsingharora@gmail.com>,
	Cyrill Gorcunov <gorcunov@gmail.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Eugene Syromiatnikov <esyr@redhat.com>,
	Florian Weimer <fweimer@redhat.com>,
	hjl.tools@gmail.com, Jonathan Corbet <corbet@lwn.net>,
	Kees Cook <keescook@chromium.org>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Nadav Amit <nadav.amit@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>, Pavel Machek <pavel@ucw.cz>,
	Peter Zijlstra <peterz@infradead.org>,
	rdunlap@infradead.org, ravi.v.shankar@intel.com,
	vedvyas.shanbhogue@intel.com,
	Daniel Micay <danielmicay@gmail.com>
Subject: Re: [PATCH v5 07/27] mm/mmap: Create a guard area between VMAs
Date: Thu, 11 Oct 2018 22:39:24 +0200	[thread overview]
Message-ID: <CAG48ez3R7XL8MX_sjff1FFYuARX_58wA_=ACbv2im-XJKR8tvA@mail.gmail.com> (raw)
In-Reply-To: <20181011151523.27101-8-yu-cheng.yu@intel.com>

On Thu, Oct 11, 2018 at 5:20 PM Yu-cheng Yu <yu-cheng.yu@intel.com> wrote:
> Create a guard area between VMAs to detect memory corruption.
[...]
> +config VM_AREA_GUARD
> +       bool "VM area guard"
> +       default n
> +       help
> +         Create a guard area between VM areas so that access beyond
> +         limit can be detected.
> +
>  endmenu

Sorry to bring this up so late, but Daniel Micay pointed out to me
that, because VMA guards inhibit vma_merge() and therefore raise the
number of VMAs a process needs, this change makes people more likely
to run into the /proc/sys/vm/max_map_count limit. That sysctl caps
the number of VMAs per process at ~65k by default, and it can't
easily be raised without risking an overflow of page->_mapcount on
systems with over ~800GiB of RAM, see
https://lore.kernel.org/lkml/20180208021112.GB14918@bombadil.infradead.org/
and its replies.
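
To make that concrete, here is a rough userspace sketch (illustration
only, not part of the patch; it assumes a 4 KiB page size) that keeps
creating unmergeable single-page mappings until mmap() fails. With
default settings, the failure comes from vm.max_map_count rather than
from actual memory pressure:

/* Illustration only: create single-page mappings with alternating
 * protections so the kernel can never merge adjacent VMAs, and stop
 * when mmap() fails. With the default vm.max_map_count (~65530), the
 * ENOMEM comes from the VMA limit, not from running out of memory. */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    unsigned long count = 0;

    for (;;) {
        /* Alternate protections so adjacent mappings never merge. */
        int prot = (count & 1) ? PROT_READ : PROT_READ | PROT_WRITE;
        void *p = mmap(NULL, 4096, prot,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

        if (p == MAP_FAILED) {
            printf("mmap() failed after %lu mappings: %s\n",
                   count, strerror(errno));
            return 0;
        }
        count++;
    }
}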

Playing with glibc's memory allocator, it looks like glibc will use
mmap() for 128KB allocations; so at 65530*128KB=8GB of memory usage in
128KB chunks, an application could run out of VMAs.
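
A quick way to check that premise (a sketch of mine, relying on
glibc's mallinfo() counters and the default M_MMAP_THRESHOLD of
128 KiB) is to watch the count of mmapped regions around a single
128 KiB allocation:

/* Sketch: with the default M_MMAP_THRESHOLD (128 KiB), a 128 KiB
 * malloc() should be satisfied via mmap(), so mallinfo()'s count of
 * mmapped regions (hblks) should go up by one. */
#include <malloc.h>
#include <stdio.h>

int main(void)
{
    struct mallinfo before = mallinfo();
    void *p = malloc(128 * 1024);
    struct mallinfo after = mallinfo();

    if (!p)
        return 1;
    printf("mmapped regions: %d -> %d, mmapped bytes: %d -> %d\n",
           before.hblks, after.hblks, before.hblkhd, after.hblkhd);
    free(p);
    return 0;
}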

People already run into that limit sometimes when mapping files, and
various vendor docs recommend raising it (a sketch of doing that
follows the links):

https://www.elastic.co/guide/en/elasticsearch/reference/current/vm-max-map-count.html
http://docs.actian.com/vector/4.2/User/Increase_max_map_count_Kernel_Parameter_(Linux).htm
https://www.suse.com/de-de/support/kb/doc/?id=7000830 (they actually
ran into ENOMEM on **munmap**, because you can't split VMAs once the
limit is reached): "A custom application was failing on a SLES server
with ENOMEM errors when attempting to release memory using an munmap
call. This resulted in memory failing to be released, and the system
load and swap use increasing until the SLES machine ultimately crashed
or hung."
https://access.redhat.com/solutions/99913
https://forum.manjaro.org/t/resolved-how-to-set-vm-max-map-count-during-boot/43360
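
What those guides boil down to is raising the sysctl; a minimal
sketch of that follows (262144 is just the value the Elasticsearch
docs suggest, writing the procfs file needs root, and the change
doesn't persist across reboots):

/* Sketch: read vm.max_map_count and raise it via procfs. A
 * persistent equivalent would be an /etc/sysctl.d/ entry. */
#include <stdio.h>

int main(void)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");

    if (!f || !fgets(buf, sizeof(buf), f)) {
        perror("read max_map_count");
        return 1;
    }
    fclose(f);
    printf("current vm.max_map_count: %s", buf);

    f = fopen("/proc/sys/vm/max_map_count", "w");
    if (!f || fprintf(f, "262144\n") < 0 || fclose(f)) {
        perror("raise max_map_count (needs root)");
        return 1;
    }
    return 0;
}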

Arguably the proper solution to this would be to raise the default
max_map_count to something much higher; but that first requires
fixing the mapcount overflow.


Thread overview: 160+ messages
2018-10-11 15:14 [PATCH v5 00/27] Control Flow Enforcement: Shadow Stack Yu-cheng Yu
2018-10-11 15:14 ` [PATCH v5 01/27] x86/cpufeatures: Add CPUIDs for Control Flow Enforcement Technology (CET) Yu-cheng Yu
2018-10-11 16:43   ` Borislav Petkov
2018-10-11 16:45     ` Yu-cheng Yu
2018-10-11 15:14 ` [PATCH v5 02/27] x86/fpu/xstate: Change names to separate XSAVES system and user states Yu-cheng Yu
2018-10-15 17:03   ` Borislav Petkov
2018-10-11 15:14 ` [PATCH v5 03/27] x86/fpu/xstate: Introduce XSAVES system states Yu-cheng Yu
2018-10-17 10:41   ` Borislav Petkov
2018-10-17 22:39     ` Randy Dunlap
2018-10-17 22:58       ` Borislav Petkov
2018-10-17 23:17         ` Randy Dunlap
2018-10-18  9:26           ` Borislav Petkov
2018-10-18  9:31             ` Pavel Machek
2018-10-18 12:10               ` Borislav Petkov
2018-10-18 18:33             ` Randy Dunlap
2018-10-18  9:24         ` Pavel Machek
2018-10-11 15:15 ` [PATCH v5 04/27] x86/fpu/xstate: Add XSAVES system states for shadow stack Yu-cheng Yu
2018-11-08 18:40   ` Borislav Petkov
2018-11-08 20:40     ` Yu-cheng Yu
2018-11-08 23:52       ` Borislav Petkov
2018-11-11 11:31       ` Pavel Machek
2018-11-11 11:31     ` Pavel Machek
2018-11-11 14:59       ` Andy Lutomirski
2018-11-11 19:02         ` Pavel Machek
2018-11-08 20:46   ` Andy Lutomirski
2018-11-08 21:01     ` Yu-cheng Yu
2018-11-08 21:22       ` Andy Lutomirski
2018-11-08 21:31         ` Cyrill Gorcunov
2018-11-08 22:01           ` Andy Lutomirski
2018-11-08 22:18             ` Cyrill Gorcunov
2018-11-08 21:48         ` Dave Hansen
2018-11-08 22:00           ` Matthew Wilcox
2018-11-08 23:35             ` Dave Hansen
2018-11-09  0:32               ` Matthew Wilcox
2018-11-09  0:45                 ` Andy Lutomirski
2018-11-09 17:13                 ` Dave Hansen
2018-11-09 17:17                   ` Matthew Wilcox
2018-11-09 17:20                     ` Dave Hansen
2018-11-09 17:28                       ` Dave Hansen
2018-11-11 11:31         ` Pavel Machek
2018-10-11 15:15 ` [PATCH v5 05/27] Documentation/x86: Add CET description Yu-cheng Yu
2018-11-13 18:43   ` Borislav Petkov
2018-11-13 21:02     ` Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 06/27] x86/cet: Control protection exception handler Yu-cheng Yu
2018-11-14 18:44   ` Borislav Petkov
2018-11-14 20:19     ` Yu-cheng Yu
2018-11-14 20:28       ` Borislav Petkov
2018-10-11 15:15 ` [PATCH v5 07/27] mm/mmap: Create a guard area between VMAs Yu-cheng Yu
2018-10-11 20:39   ` Jann Horn [this message]
2018-10-11 20:49     ` Yu-cheng Yu
2018-10-11 20:55     ` Andy Lutomirski
2018-10-12 21:49       ` Yu-cheng Yu
2018-10-12 13:17     ` Matthew Wilcox
2018-10-11 20:49   ` Dave Hansen
2018-10-12 10:24     ` Florian Weimer
2018-10-11 15:15 ` [PATCH v5 08/27] x86/cet/shstk: Add Kconfig option for user-mode shadow stack Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 09/27] mm: Introduce VM_SHSTK for shadow stack memory Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 10/27] mm/mmap: Prevent Shadow Stack VMA merges Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 11/27] x86/mm: Change _PAGE_DIRTY to _PAGE_DIRTY_HW Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 12/27] x86/mm: Introduce _PAGE_DIRTY_SW Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 13/27] drm/i915/gvt: Update _PAGE_DIRTY to _PAGE_DIRTY_BITS Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 14/27] x86/mm: Modify ptep_set_wrprotect and pmdp_set_wrprotect for _PAGE_DIRTY_SW Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 15/27] x86/mm: Shadow stack page fault error checking Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 16/27] mm: Handle shadow stack page fault Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 17/27] mm: Handle THP/HugeTLB " Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 18/27] mm: Update can_follow_write_pte/pmd for shadow stack Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 19/27] mm: Introduce do_mmap_locked() Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 20/27] x86/cet/shstk: User-mode shadow stack support Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 21/27] x86/cet/shstk: Introduce WRUSS instruction Yu-cheng Yu
2018-11-06 18:43   ` Dave Hansen
2018-11-06 18:55     ` Andy Lutomirski
2018-11-06 20:21     ` Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 22/27] x86/cet/shstk: Signal handling for shadow stack Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 23/27] x86/cet/shstk: ELF header parsing of Shadow Stack Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 24/27] x86/cet/shstk: Handle thread shadow stack Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 25/27] mm/mmap: Add Shadow stack pages to memory accounting Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 26/27] x86/cet/shstk: Add arch_prctl functions for Shadow Stack Yu-cheng Yu
2018-10-11 15:15 ` [PATCH v5 27/27] x86/cet/shstk: Add Shadow Stack instructions to opcode map Yu-cheng Yu
2018-10-11 19:21 ` [PATCH v5 00/27] Control Flow Enforcement: Shadow Stack Dave Hansen
2018-10-11 19:29   ` Yu-cheng Yu
