linux-mm.kvack.org archive mirror
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Naresh Kamboju <naresh.kamboju@linaro.org>
Cc: linux- stable <stable@vger.kernel.org>,
	open list <linux-kernel@vger.kernel.org>,
	 linux-mm <linux-mm@kvack.org>, Arnd Bergmann <arnd@arndb.de>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Roman Gushchin <guro@fb.com>,  Michal Hocko <mhocko@kernel.org>,
	lkft-triage@lists.linaro.org,  Chris Down <chris@chrisdown.name>,
	Michel Lespinasse <walken@google.com>,
	 Fan Yang <Fan_Yang@sjtu.edu.cn>,
	Brian Geffon <bgeffon@google.com>,
	 Anshuman Khandual <anshuman.khandual@arm.com>,
	Will Deacon <will@kernel.org>,
	 Catalin Marinas <catalin.marinas@arm.com>,
	pugaowei@gmail.com,  Jerome Glisse <jglisse@redhat.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	 Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	 Hugh Dickins <hughd@google.com>,
	Al Viro <viro@zeniv.linux.org.uk>, Tejun Heo <tj@kernel.org>,
	 Sasha Levin <sashal@kernel.org>
Subject: Re: WARNING: at mm/mremap.c:211 move_page_tables in i386
Date: Thu, 9 Jul 2020 22:22:21 -0700
Message-ID: <CAHk-=wgB6Ds6yqbZZmscKNuAiNR2J0Pf3a8UrbdfewYxHE7SbA@mail.gmail.com>
In-Reply-To: <CA+G9fYuL=xJPLbQJVzDfXB8uNiCWdXpL=joDsnATEFCzyFh_1g@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 2627 bytes --]

On Thu, Jul 9, 2020 at 9:29 PM Naresh Kamboju <naresh.kamboju@linaro.org> wrote:
>
> Your patch applied and re-tested.
> warning triggered 10 times.
>
> old: bfe00000-c0000000 new: bfa00000 (val: 7d530067)

Hmm.. It's not even the overlapping case, it's literally just "move
exactly 2MB of page tables exactly one pmd down". Which should be the
nice efficient case where we can do it without modifying the lower
page tables at all, we just move the PMD entry.

There shouldn't be anything in the new address space from bfa00000-bfdfffff.

That PMD value obviously says differently, but it looks like a nice
normal PMD value, nothing bad there.

I'm starting to think that the issue might be that the stack segment
is special. Not only does it have the growsdown flag, but that whole
thing has the magic guard page logic.

So I wonder if we have installed a guard page _just_ below the old
stack, so that we have populated that pmd because of that.

We used to have an _actual_ guard page and then play nasty games with
vm_start logic. We've gotten rid of that, though, and now we have that
"stack_guard_gap" logic that _should_ mean that vm_start is always
exact and proper (and that pgtables_free() should have emptied it), but
maybe we have some case we forgot about.

> [  741.511684] WARNING: CPU: 1 PID: 15173 at mm/mremap.c:211 move_page_tables.cold+0x0/0x2b
> [  741.598159] Call Trace:
> [  741.600694]  setup_arg_pages+0x22b/0x310
> [  741.621687]  load_elf_binary+0x31e/0x10f0
> [  741.633839]  __do_execve_file+0x5a8/0xbf0
> [  741.637893]  __ia32_sys_execve+0x2a/0x40
> [  741.641875]  do_syscall_32_irqs_on+0x3d/0x2c0
> [  741.657660]  do_fast_syscall_32+0x60/0xf0
> [  741.661691]  do_SYSENTER_32+0x15/0x20
> [  741.665373]  entry_SYSENTER_32+0x9f/0xf2
> [  741.734151]  old: bfe00000-c0000000 new: bfa00000 (val: 7d530067)

Nothing looks bad, and the ELF loading phase memory map should be
really quite simple.

The only half-way unusual thing is that you have basically exactly 2MB
of stack at execve time (easy enough to tune by just setting argv/env
right), and it's moved down by exactly 2MB.

And that latter thing is just due to randomization, see
arch_align_stack() in arch/x86/kernel/process.c.

So that would explain why it doesn't happen every time.

What happens if you apply the attached patch to *always* force the 2MB
shift (rather than moving the stack by a random amount), and then run
the other program (t.c -> compiled to "a.out")?

The comment should be obvious. But it's untested, I might have gotten
the math wrong. I don't run in a 32-bit environment.

                Linus

[-- Attachment #2: patch --]
[-- Type: application/octet-stream, Size: 516 bytes --]

 arch/x86/kernel/process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index f362ce0d5ac0..9b027ec631a1 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -911,8 +911,7 @@ early_param("idle", idle_setup);
 
 unsigned long arch_align_stack(unsigned long sp)
 {
-	if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
-		sp -= get_random_int() % 8192;
+	sp -= 2*1024*1024;
 	return sp & ~0xf;
 }
 

[-- Attachment #3: t.c --]
[-- Type: text/x-csrc, Size: 1089 bytes --]

#define _GNU_SOURCE
#include <unistd.h>

static char one_kb[1024] = {
	[0 ... 1022] = 'a',
	0
};

/*
 * Each string is 1kB, so we would need 2048 strings to fill a 2MB stack.
 *
 * But we have the string pointers themselves: 4 bytes per string, so
 * that would be an additional 8kB on top of the 2MB of strings. Plus
 * we have the two NULL terminators (8 bytes) for argv/envp.
 *
 * And then we have the ELF AUX fields, which is a few hundred bytes too.
 *
 * And then we need the call stack frame etc, and only need to come within
 * 4kB of the 2MB stack target.
 *
 * So instead of using 2048 strings to fill up 2MB exactly, we want to fill up
 * basically 2MB-12kB, and let the AUX info etc go into the last page.
 *
 * So 2036 1kB strings, plus noise.
 */

static char *argv[] = {
	[0] = "/bin/echo",
	[1 ... 2036] = one_kb,
	NULL
};

static char *envp[] = {
	NULL
};

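/* The second parameter of main() is argv; name it so it does not shadow the static envp above. */
int main(int argc, char **unused_argv)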
{
	/*
	 * Don't do this recursively, and sleep so people can look at /proc/<pid>/maps
	 */
	if (argc > 1000) {
		sleep(100);
		return 0;
	}
	return execvpe("./a.out", argv, envp);
}
