All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jann Horn <jannh@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org,
	Kernel Hardening <kernel-hardening@lists.openwall.com>,
	kernel list <linux-kernel@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC] Warn the user when they could overflow mapcount
Date: Thu, 8 Feb 2018 03:56:26 +0100	[thread overview]
Message-ID: <CAG48ez2-MTJ2YrS5fPZi19RY6P_6NWuK1U5CcQpJ25=xrGSy_A@mail.gmail.com> (raw)
In-Reply-To: <20180208021112.GB14918@bombadil.infradead.org>

On Thu, Feb 8, 2018 at 3:11 AM, Matthew Wilcox <willy@infradead.org> wrote:
> Kirill and I were talking about trying to overflow page->_mapcount
> the other day and realised that the default settings of pid_max and
> max_map_count prevent it [1].  But there isn't even documentation to
> warn a sysadmin that they've just opened themselves up to the possibility
> that they've opened their system up to a sufficiently-determined attacker.
>
> I'm not sufficiently wise in the ways of the MM to understand exactly
> what goes wrong if we do wrap mapcount.  Kirill says:
>
>   rmap depends on mapcount to decide when the page is not longer mapped.
>   If it sees page_mapcount() == 0 due to 32-bit wrap we are screwed;
>   data corruption, etc.

How much memory would you need to trigger this? You need one
vm_area_struct per increment, and those are 200 bytes? So at least
800GiB of memory for the vm_area_structs, and maybe more for other
data structures?

I wouldn't be too surprised if there are more 32-bit overflows that
start being realistic once you put something on the order of terabytes
of memory into one machine, given that refcount_t is 32 bits wide -
for example, the i_count. See
https://bugs.chromium.org/p/project-zero/issues/detail?id=809 for an
example where, given a sufficiently high RLIMIT_MEMLOCK, it was
possible to overflow a 32-bit refcounter on a system with just ~32GiB
of free memory (minimum required to store 2^32 64-bit pointers).

On systems with RAM on the order of terabytes, it's probably a good
idea to turn on refcount hardening to make issues like that
non-exploitable for now.

> That seems pretty bad.  So here's a patch which adds documentation to the
> two sysctls that a sysadmin could use to shoot themselves in the foot,
> and adds a warning if they change either of them to a dangerous value.

I have negative feelings about this patch, mostly because AFAICS:

 - It documents an issue instead of fixing it.
 - It likely only addresses a small part of the actual problem.

> It's possible to get into a dangerous situation without triggering this
> warning (already have the file mapped a lot of times, then lower pid_max,
> then raise max_map_count, then map the file a lot more times), but it's
> unlikely to happen.
>
> Comments?
>
> [1] map_count counts the number of times that a page is mapped to
> userspace; max_map_count restricts the number of times a process can
> map a page and pid_max restricts the number of processes that can exist.
> So map_count can never be larger than pid_max * max_map_count.
[...]
> +int sysctl_pid_max(struct ctl_table *table, int write,
> +                  void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> +       struct ctl_table t;
> +       int ret;
> +
> +       t = *table;
> +       t.data = &pid_max;
> +       t.extra1 = &pid_max_min;
> +       t.extra2 = &pid_max_max;
> +
> +       ret = proc_douintvec_minmax(&t, write, buffer, lenp, ppos);
> +       if (ret || !write)
> +               return ret;
> +
> +       if ((INT_MAX / max_map_count) > pid_max)
> +               pr_warn("pid_max is dangerously large\n");

This in reordered is "if (pid_max * max_map_count < INT_MAX)
pr_warn(...);", no? That doesn't make sense to me. Same thing again
further down.

[...]
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d3c922ea1a1..5b66a4a48192 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -147,7 +147,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
>         *prev = vma;
>
>         if (start != vma->vm_start) {
> -               if (unlikely(mm->map_count >= sysctl_max_map_count)) {
> +               if (unlikely(mm->map_count >= max_map_count)) {

Why the renaming?

WARNING: multiple messages have this Message-ID (diff)
From: Jann Horn <jannh@google.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org,
	Kernel Hardening <kernel-hardening@lists.openwall.com>,
	kernel list <linux-kernel@vger.kernel.org>,
	"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Subject: Re: [RFC] Warn the user when they could overflow mapcount
Date: Thu, 8 Feb 2018 03:56:26 +0100	[thread overview]
Message-ID: <CAG48ez2-MTJ2YrS5fPZi19RY6P_6NWuK1U5CcQpJ25=xrGSy_A@mail.gmail.com> (raw)
In-Reply-To: <20180208021112.GB14918@bombadil.infradead.org>

On Thu, Feb 8, 2018 at 3:11 AM, Matthew Wilcox <willy@infradead.org> wrote:
> Kirill and I were talking about trying to overflow page->_mapcount
> the other day and realised that the default settings of pid_max and
> max_map_count prevent it [1].  But there isn't even documentation to
> warn a sysadmin that they've just opened themselves up to the possibility
> that they've opened their system up to a sufficiently-determined attacker.
>
> I'm not sufficiently wise in the ways of the MM to understand exactly
> what goes wrong if we do wrap mapcount.  Kirill says:
>
>   rmap depends on mapcount to decide when the page is not longer mapped.
>   If it sees page_mapcount() == 0 due to 32-bit wrap we are screwed;
>   data corruption, etc.

How much memory would you need to trigger this? You need one
vm_area_struct per increment, and those are 200 bytes? So at least
800GiB of memory for the vm_area_structs, and maybe more for other
data structures?

I wouldn't be too surprised if there are more 32-bit overflows that
start being realistic once you put something on the order of terabytes
of memory into one machine, given that refcount_t is 32 bits wide -
for example, the i_count. See
https://bugs.chromium.org/p/project-zero/issues/detail?id=809 for an
example where, given a sufficiently high RLIMIT_MEMLOCK, it was
possible to overflow a 32-bit refcounter on a system with just ~32GiB
of free memory (minimum required to store 2^32 64-bit pointers).

On systems with RAM on the order of terabytes, it's probably a good
idea to turn on refcount hardening to make issues like that
non-exploitable for now.

> That seems pretty bad.  So here's a patch which adds documentation to the
> two sysctls that a sysadmin could use to shoot themselves in the foot,
> and adds a warning if they change either of them to a dangerous value.

I have negative feelings about this patch, mostly because AFAICS:

 - It documents an issue instead of fixing it.
 - It likely only addresses a small part of the actual problem.

> It's possible to get into a dangerous situation without triggering this
> warning (already have the file mapped a lot of times, then lower pid_max,
> then raise max_map_count, then map the file a lot more times), but it's
> unlikely to happen.
>
> Comments?
>
> [1] map_count counts the number of times that a page is mapped to
> userspace; max_map_count restricts the number of times a process can
> map a page and pid_max restricts the number of processes that can exist.
> So map_count can never be larger than pid_max * max_map_count.
[...]
> +int sysctl_pid_max(struct ctl_table *table, int write,
> +                  void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> +       struct ctl_table t;
> +       int ret;
> +
> +       t = *table;
> +       t.data = &pid_max;
> +       t.extra1 = &pid_max_min;
> +       t.extra2 = &pid_max_max;
> +
> +       ret = proc_douintvec_minmax(&t, write, buffer, lenp, ppos);
> +       if (ret || !write)
> +               return ret;
> +
> +       if ((INT_MAX / max_map_count) > pid_max)
> +               pr_warn("pid_max is dangerously large\n");

This in reordered is "if (pid_max * max_map_count < INT_MAX)
pr_warn(...);", no? That doesn't make sense to me. Same thing again
further down.

[...]
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 4d3c922ea1a1..5b66a4a48192 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -147,7 +147,7 @@ static long madvise_behavior(struct vm_area_struct *vma,
>         *prev = vma;
>
>         if (start != vma->vm_start) {
> -               if (unlikely(mm->map_count >= sysctl_max_map_count)) {
> +               if (unlikely(mm->map_count >= max_map_count)) {

Why the renaming?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2018-02-08  2:56 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-08  2:11 [RFC] Warn the user when they could overflow mapcount Matthew Wilcox
2018-02-08  2:11 ` Matthew Wilcox
2018-02-08  2:56 ` Jann Horn [this message]
2018-02-08  2:56   ` Jann Horn
2018-02-08  4:04   ` Matthew Wilcox
2018-02-08  4:04     ` Matthew Wilcox
2018-02-08 17:58   ` valdis.kletnieks
2018-02-08 18:05   ` Daniel Micay
2018-02-08 18:05     ` Daniel Micay
2018-02-08 18:56     ` Matthew Wilcox
2018-02-08 18:56       ` Matthew Wilcox
2018-02-08 19:33       ` Daniel Micay
2018-02-08 19:33         ` Daniel Micay
2018-02-08 19:42         ` Matthew Wilcox
2018-02-08 19:42           ` Matthew Wilcox
2018-02-08 19:48           ` Daniel Micay
2018-02-08 19:48             ` Daniel Micay
2018-02-08 20:21             ` Matthew Wilcox
2018-02-08 20:21               ` Matthew Wilcox
2018-02-08 21:37               ` [RFC] Limit mappings to ten per page per process Matthew Wilcox
2018-02-08 21:37                 ` Matthew Wilcox
2018-02-09  4:26                 ` Kirill A. Shutemov
2018-02-09  4:26                   ` Kirill A. Shutemov
2018-02-14 13:51                   ` Matthew Wilcox
2018-02-14 13:51                     ` Matthew Wilcox
2018-02-14 14:05                     ` Kirill A. Shutemov
2018-02-14 14:05                       ` Kirill A. Shutemov
2018-02-09  1:47               ` [RFC] Warn the user when they could overflow mapcount Daniel Micay
2018-02-09  1:47                 ` Daniel Micay
2018-02-08  3:18 ` Tobin C. Harding
2018-02-08  3:18   ` Tobin C. Harding
2018-02-08  4:06   ` Matthew Wilcox
2018-02-08  4:06     ` Matthew Wilcox
2018-03-02 21:26 ` [RFC] Handle mapcount overflows Matthew Wilcox
2018-03-02 21:26   ` Matthew Wilcox
2018-03-02 22:03   ` Matthew Wilcox
2018-03-02 22:03     ` Matthew Wilcox
2019-05-01 14:41   ` Jann Horn
2019-05-01 14:41     ` Jann Horn

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAG48ez2-MTJ2YrS5fPZi19RY6P_6NWuK1U5CcQpJ25=xrGSy_A@mail.gmail.com' \
    --to=jannh@google.com \
    --cc=kernel-hardening@lists.openwall.com \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.